Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting

Segmented light field images can serve as a powerful representation for autonomous driving tasks such as object pose tracking. Segment Anything Model 2 (SAM 2) produces semantically meaningful segments for monocular images and videos. In this work, we introduce:

• A novel light field segmentation method that adapts SAM 2 to the light field domain.
• Segmentation refinement, a two-step method of light field segmentation: disparity-based mask propagation followed by reprompting of SAM 2.
• Semantic occluding, a technique that estimates the occluded regions of segments from latent semantic features of the SAM 2 image encoder and uses them to refine the prompts provided to the model.
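
To make the disparity-based propagation step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a per-pixel disparity map for the central view and the common convention that a pixel shifts by disparity times the angular offset between views. The function names `propagate_mask` and `box_prompt`, and the choice of a bounding box as the prompt, are hypothetical and used only for illustration. The sketch does not handle occlusion by other objects, which is the case semantic occluding is meant to address.

```python
import numpy as np

def propagate_mask(mask, disparity, du, dv):
    """Forward-warp a central-view boolean mask to the view at angular offset (du, dv).

    Assumed convention: a pixel at (x, y) with disparity d appears at
    (x + d * du, y + d * dv) in the target view.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    d = disparity[ys, xs]
    # Shift each mask pixel by its own disparity, rounded to the pixel grid.
    xt = np.rint(xs + d * du).astype(int)
    yt = np.rint(ys + d * dv).astype(int)
    # Discard pixels that fall outside the target view.
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    out = np.zeros((h, w), dtype=bool)
    out[yt[inside], xt[inside]] = True
    return out

def box_prompt(mask):
    """Return an (x_min, y_min, x_max, y_max) box around the mask, or None if it is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# Toy example: a 2x2 segment with constant disparity 2, propagated one view to the right.
mask = np.zeros((8, 8), dtype=bool)
mask[2:4, 2:4] = True
disparity = np.full((8, 8), 2.0)
warped = propagate_mask(mask, disparity, du=1, dv=0)
print(box_prompt(warped))  # (4, 2, 5, 3)
```

In a pipeline of this kind, the box obtained in each view could then be passed to the segmentation model as a prompt for that view; how the prompts are actually constructed and constrained in this work is described in the paper.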

We show that our method produces semantically accurate and spatio-angularly consistent segments, avoids excessive oversegmentation of objects, and achieves higher performance than SAM 2 video tracking while being 7 times faster.

Publications

• N. Goncharov and D. G. Dansereau, “Segment anything in light fields for real-time applications via constrained prompting,” in Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025. Available here.

Citing

If you find this work useful, please cite:

@inproceedings{goncharov2025segment,
  title = {Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting},
  author = {Nikolai Goncharov and Donald G. Dansereau},
  booktitle = {Winter Conference on Applications of Computer Vision ({WACV}) Workshops},
  year = {2025}
}

This work was carried out in the Robotic Imaging Group at the Australian Centre for Robotics, University of Sydney.

Acknowledgments

This research was supported in part by funding from Ford Motor Company.

Themes

Novel Cameras, Algorithms & Architectures, Learning to See

Downloads

The code for the work is available here.

Gallery

The quality of our segmentation is similar to that of the SAM 2 video segment tracking baseline, while inference time is substantially lower.