TaCOS: Task-Specific Camera Optimization with Simulation
Designing camera payloads for robots is challenging and expensive. We introduce an end-to-end optimization approach that automatically co-designs a camera with specific robotic tasks. This work leverages recent computer graphics techniques and physical camera characteristics to prototype the camera in software simulation. The main contributions of this work are:
- An end-to-end camera design method that combines derivative-free and gradient-based optimization to co-design the camera with perception tasks, supporting both continuous and discrete camera variables
- A camera simulation that includes a physics-based noise model and procedurally generated virtual environments
- Validation through comparison of synthetic imagery to imagery captured with physical cameras
- Demonstration of camera designs with stronger performance than common off-the-shelf alternatives
This work is a key step towards simplifying the design of cameras for robots, where mobility and task performance matter and camera manufacturability is constrained.
Publications
• C. Yan and D. G. Dansereau, “TaCOS: Task-specific camera optimization with simulation,” arXiv preprint arXiv:2404.11031, Apr. 2024. Available here.
Citing
If you find this work useful please cite
@article{yan2024tacos,
  title   = {{TaCOS}: Task-Specific Camera Optimization with Simulation},
  author  = {Chengyang Yan and Donald G. Dansereau},
  journal = {arXiv preprint arXiv:2404.11031},
  url     = {https://arxiv.org/abs/2404.11031},
  year    = {2024},
  month   = apr
}
This work was carried out in the Robotic Imaging Group at the Australian Centre for Robotics, University of Sydney.
Acknowledgments
We would like to thank both ARIA Research Pty Ltd and the Australian government for their funding support via a CRC Projects Round 11 grant.
Downloads
The code is available here.
Gallery
We establish a virtual environment in Unreal Engine 5 (UE5) with a procedural generation method and obtain renders with ray tracing. We then add physics-based image noise to the renders and input them to the robotic perception tasks. In our optimization, we jointly optimize the camera against a fitness function using a genetic algorithm (derivative-free), and an object detector against an object detection loss using a gradient-based optimizer.
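Below is a minimal sketch of this joint optimization loop, assuming hypothetical helpers `render_views` (UE5 renders plus noise), `detection_loss`, and `task_fitness`; the GA operators and hyperparameters shown are illustrative and may differ from the paper's implementation.

```python
# Minimal sketch of the joint camera/detector optimization loop.
# render_views, detection_loss, and task_fitness are hypothetical stand-ins
# for the UE5 renderer + noise model, the detector training loss, and the
# combined task fitness described above.
import random
import torch

def evolve(population, fitness_fn, n_parents=4, mutation_scale=0.1):
    """One generation of a simple genetic algorithm over camera designs (dicts)."""
    scored = sorted(population, key=fitness_fn, reverse=True)
    parents = scored[:n_parents]
    children = []
    while len(children) < len(population) - n_parents:
        a, b = random.sample(parents, 2)
        child = {k: random.choice([a[k], b[k]]) for k in a}   # uniform crossover
        key = random.choice(list(child))                       # mutate one gene
        if isinstance(child[key], float):
            child[key] *= 1.0 + random.gauss(0.0, mutation_scale)
        children.append(child)
    return parents + children

def joint_optimize(population, detector, n_generations=20, n_detector_steps=100):
    opt = torch.optim.Adam(detector.parameters(), lr=1e-4)
    for _ in range(n_generations):
        for design in population:
            images, labels = render_views(design)      # renders seen by this design
            for _ in range(n_detector_steps):          # gradient-based inner loop
                loss = detection_loss(detector(images), labels)
                opt.zero_grad()
                loss.backward()
                opt.step()
        population = evolve(population, lambda d: task_fitness(d, detector))
    best = max(population, key=lambda d: task_fitness(d, detector))
    return best, detector
```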
We develop a procedural generation algorithm to randomly generate virtual environments in UE5. We use an indoor environment for camera optimization and evaluation in our experiments.
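The sketch below illustrates the flavour of such procedural placement with placeholder logic in Python; in the actual pipeline this runs inside UE5, and the asset names, room dimensions, and spacing rule here are assumptions for illustration only.

```python
# Hypothetical sketch of procedural room generation: sample poses for objects
# from an asset list while keeping a minimum gap between placed objects.
import random

def generate_room(assets, room_size=(10.0, 8.0), n_objects=15, min_gap=0.5):
    placed = []
    while len(placed) < n_objects:
        asset = random.choice(assets)                  # e.g. "chair", "table"
        x = random.uniform(0.0, room_size[0])
        y = random.uniform(0.0, room_size[1])
        yaw = random.uniform(0.0, 360.0)
        if all((x - px) ** 2 + (y - py) ** 2 >= min_gap ** 2
               for _, px, py, _ in placed):
            placed.append((asset, x, y, yaw))
    return placed
```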
An auto-agent simulating the robot navigates the virtual environment along randomly generated trajectories. The auto-agent carries a UE5 camera, capturing scene renders as it moves and recording the objects it collides with.
We employ a UE5 camera that uses ray tracing to capture scene irradiance. The UE5 camera allows configuration of parameters associated with the camera's placement, optics, image sensor, exposure settings, and multi-camera designs, as well as the algorithms in the image processing pipeline.
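As a rough illustration of the design space this exposes, the following dataclass groups the kinds of variables involved; the field names and default values are assumptions for illustration, not the engine's property names or the paper's exact parameter set.

```python
# Hypothetical grouping of the camera variables exposed by the simulation.
from dataclasses import dataclass, field

@dataclass
class CameraDesign:
    # placement (per camera, relative to the robot body)
    position_m: tuple = (0.0, 0.0, 0.3)
    rotation_deg: tuple = (0.0, 0.0, 0.0)
    # optics
    focal_length_mm: float = 4.0
    f_number: float = 2.8
    # image sensor (selected from a catalog of real sensors)
    sensor_name: str = "IMX172"
    pixel_pitch_um: float = 1.55
    resolution: tuple = (1920, 1080)
    # exposure settings
    exposure_time_ms: float = 10.0
    gain_db: float = 0.0
    # image processing pipeline toggles
    isp: dict = field(default_factory=lambda: {"denoise": True, "gamma": 2.2})
```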
Additional parameters like geometric distortion and defocus blur could be added by augmenting the renderer; our noise model serves as an example of such an augmentation.
Image noise is a fundamental limiting factor for many robotic vision tasks and is tightly coupled to camera design parameters. As the UE5 camera simulation lacks a realistic noise model, we incorporate a post-render image augmentation that introduces noise. We apply thermal noise and signal-dependent Poisson noise following the affine noise model (heteroscedastic Gaussian). The noise model is calibrated using a FLIR Flea3 camera with a Sony IMX172 image sensor and generalized to other exposure settings and image sensors.
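A minimal sketch of such a post-render augmentation is shown below, assuming renders normalized to [0, 1]; the full-well capacity, read-noise level, and gain handling are placeholder values, whereas the paper's model is calibrated against the FLIR Flea3/IMX172 measurements.

```python
# Sketch of a post-render affine (heteroscedastic Gaussian) noise model:
# signal-dependent shot noise plus signal-independent thermal/read noise.
import numpy as np

def add_affine_noise(irradiance, full_well_e=11000, read_noise_e=2.5, gain_db=0.0):
    """irradiance: clean render in [0, 1]; returns noisy image in [0, 1]."""
    gain = 10.0 ** (gain_db / 20.0)
    electrons = irradiance * full_well_e / gain              # expected photoelectrons
    shot = np.random.poisson(np.clip(electrons, 0, None))    # signal-dependent noise
    thermal = np.random.normal(0.0, read_noise_e, irradiance.shape)  # signal-independent
    noisy = (shot + thermal) * gain / full_well_e
    return np.clip(noisy, 0.0, 1.0)
```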
Our optimizer can handle the optimization of all parameters captured in the camera simulation. The genetic algorithm is designed to accommodate both continuous and discrete parameters, enhancing the generalizability of our method. In this work, we treat the image sensor as a discrete variable and select it from a catalog of existing sensors that we compile, allowing manufacturability and availability to be considered. In addition, we propose an alternative quantized continuous treatment of discrete variables that allows their interdependencies to be considered.
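The difference between the two schemes can be sketched as follows; the catalog entries and decoding rules are illustrative assumptions, not the paper's actual sensor catalog or encoding.

```python
# Sketch of two ways a discrete sensor choice can be exposed to the GA.
import numpy as np

SENSOR_CATALOG = [
    # (name, pixel_pitch_um, horizontal_resolution) -- placeholder entries
    ("IMX172", 1.55, 4000),
    ("IMX265", 3.45, 2048),
    ("IMX250", 3.45, 2448),
]

def decode_discrete(index):
    """Fully discrete scheme: the gene is simply a catalog index."""
    return SENSOR_CATALOG[int(index) % len(SENSOR_CATALOG)]

def decode_quantized(pixel_pitch_um, resolution):
    """Quantized continuous scheme: the genes are continuous sensor properties
    snapped to the nearest catalog entry, so the GA can exploit the
    interdependency between pitch and resolution while staying manufacturable."""
    costs = [abs(p - pixel_pitch_um) / p + abs(r - resolution) / r
             for _, p, r in SENSOR_CATALOG]
    return SENSOR_CATALOG[int(np.argmin(costs))]
```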
We optimize the camera design with three robotic perception tasks: obstacle avoidance to ensure sufficient field of view, object detection as a fundamental task, and feature detection as the basis for simultaneous localization and mapping.
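One way to combine these into a single GA fitness is a weighted sum of per-task scores, as in the hypothetical sketch below; `collision_rate`, `mean_average_precision`, and `mean_feature_count` are assumed helpers, and the paper's actual fitness function and weighting may differ.

```python
# Hypothetical weighted fitness combining the three task scores.
def camera_fitness(design, detector, episodes):
    collision_score = 1.0 - collision_rate(design, episodes)    # obstacle avoidance
    detection_score = mean_average_precision(detector, design)  # object detection
    feature_score = mean_feature_count(design) / 1000.0         # feature detection
    weights = (1.0, 1.0, 1.0)
    return (weights[0] * collision_score
            + weights[1] * detection_score
            + weights[2] * feature_score)
```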
Our simulator is validated by establishing equivalence of both low-level image statistics and high-level task performance between synthetic imagery and imagery captured with physical cameras.
We compare the distributions of pixel intensities of our synthetic and captured images to validate the noise model, using a colorbar test target. The image is captured with the FLIR camera used for noise calibration. The comparison shows equivalent distributions, despite differences in mean intensities caused by the manufacturing of the test target.
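The comparison itself can be as simple as a per-patch histogram distance, as in the hedged sketch below; the total-variation metric is illustrative and not necessarily the statistic used in the paper.

```python
# Sketch of the low-level validation idea: compare per-patch pixel-intensity
# histograms of synthetic and captured images of the same test target.
import numpy as np

def histogram_distance(synthetic_patch, captured_patch, bins=64):
    h_s, _ = np.histogram(synthetic_patch, bins=bins, range=(0, 1))
    h_c, _ = np.histogram(captured_patch, bins=bins, range=(0, 1))
    p, q = h_s / h_s.sum(), h_c / h_c.sum()
    return 0.5 * np.abs(p - q).sum()  # total variation distance
```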
We compare feature detection performance on our synthetic images against images captured with three robotic/machine vision cameras, using a test target from the literature. The graph shows that the ranking of the cameras' performance in our simulation aligns with the physical cameras, and that the differences in performance between captured and synthetic images are consistent.
We apply our method to design cameras under two illumination conditions: a well-illuminated daytime scenario (20 lux) and a low-light nighttime scenario (2 lux). The camera gain is set higher in the nighttime scenario to achieve brighter images.
Comparison of the FOVs and object detection performance of the camera optimized with our method against human-designed cameras, for the daytime scenario. Our method produces the camera with the largest FOV, balancing FOV against effective resolution so that the camera can detect features, obstacles, and objects.
Comparison of the parameters and performance of cameras designed with our method, using the fully discrete and quantized continuous schemes under daytime and nighttime scenarios, against three robotic/machine vision cameras. Optimized parameters are labelled with green dots and fixed parameters with gray dots. The optimized cameras achieve compelling results compared to human-designed cameras, with the quantized continuous scheme achieving higher performance due to its consideration of parameter interdependencies.