Paper Plane Folder
April 2026
Building on ObjectDetector's use of Unity Sentis to run a local ML vision model on the Quest 3, I took it a step further for my final self-directed project at university: a Perceptually-enabled Task Guidance (PTG) system, inspired by DARPA's R&D in the area and, once again, by EagleEye and Cyberpunk 2077's Kiroshi Optics.
To do this, I chose a specialised task to train my own model on: the stages of folding a paper plane.
Using Roboflow, I annotated each step across ~3,000 frames extracted from POV videos of me folding paper planes, shot on the Quest's cameras, and then trained a YOLOv11 model with Ultralytics on my MacBook.
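The training step itself is small. Here is a minimal sketch, assuming a Roboflow export in YOLO format whose data.yaml lists one class per fold stage; the dataset path and hyperparameters below are illustrative placeholders, not the exact values I used:

```python
from ultralytics import YOLO

# Fine-tune a pretrained YOLO11 nano checkpoint on the fold-stage
# dataset exported from Roboflow. Path and hyperparameters are
# illustrative placeholders.
model = YOLO("yolo11n.pt")
model.train(
    data="paper-plane-stages/data.yaml",  # hypothetical Roboflow export path
    epochs=100,
    imgsz=640,
)
```

On an Apple-silicon MacBook, passing `device="mps"` to `train()` runs training on the Metal backend rather than the CPU.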
From there, I integrated it into the same base Meta-Unity Sentis sample, then designed and built a UI that guides the user through each step, one at a time.
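Sentis imports models in ONNX format, so the bridge between the training and Unity halves is a single export call on the trained weights. A minimal sketch, assuming Ultralytics' default output location (which can differ per run):

```python
from ultralytics import YOLO

# Export the fine-tuned weights to ONNX, the format Unity Sentis
# imports as a ModelAsset. The weights path is Ultralytics' default
# output location, shown here for illustration.
model = YOLO("runs/detect/train/weights/best.pt")
model.export(format="onnx")  # writes best.onnx alongside the weights
```

One natural way to drive the step-by-step UI from there is to advance only when the detector's highest-confidence class matches the stage the user is currently expected to be on.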
Whilst this task may not be particularly useful for XR in itself, it is proof that local, specialised ML vision models can be paired with XR to solve real problems. As a hypothetical, I can imagine a marketplace where people train models on their own activities using their smart glasses' cameras, then share them so other people can download their "skills".
What an idea that would be!
Download on Meta Store