Coverage Optimization for Camera View Selection

Stanford University
🎉 Accepted to CVPR 2026 🎉
*Equal Contribution

Abstract

What makes a good viewpoint? The quality of the data used to learn 3D reconstructions is crucial for enabling efficient and accurate scene modeling. We study the active view selection problem and develop a principled analysis that yields a simple and interpretable criterion for selecting informative camera poses. Our key insight is that informative views can be obtained by maximizing a tractable approximation of the Fisher Information Gain, which reduces to favoring viewpoints that cover geometry that has been insufficiently observed by past cameras. This leads to a lightweight coverage-based view selection metric that avoids expensive transmittance estimation, is robust to noise and training dynamics, and can be rendered at 85 FPS. We call our pipeline COVER (Camera Optimization for View Exploration and Reconstruction). We integrate our method into the Nerfstudio framework and evaluate it on real datasets within fixed and embodied data acquisition scenarios. Across multiple datasets and radiance-field baselines, our method consistently improves reconstruction quality compared to state-of-the-art active view selection methods.

Key Questions

Q1

Is there a natural interpretation of information gain?

Humans naturally collect sequential image datasets that facilitate 3D reconstruction. This observation suggests that there exists a natural interpretation of the information gain problem. We make this explicit by showing close connections between Fisher Information Gain and coverage.

Q2

Does coverage outperform information-theoretic baselines?

COVER consistently outperforms FisherRF and random selection across 15 real scenes in PSNR, SSIM, and LPIPS—both on human-captured datasets that already have good natural coverage, and in embodied scenarios where the camera can only move within a limited vicinity under a fixed time budget.

Q3

Can we avoid expensive transmittance computation?

Yes. COVER abstracts away transmittance effects, relying only on Gaussian projection onto the image plane—no rasterization required.
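To make the projection-only claim concrete, here is a minimal sketch of projecting Gaussian centers onto the image plane with a standard pinhole model. The function name and interface are hypothetical illustrations, not the paper's code; the point is that this step needs only a matrix multiply and a divide, with no alpha blending or sorting.

```python
import numpy as np

def project_gaussians(means, K, R, t, width, height):
    """Project 3D Gaussian centers onto the image plane (pinhole model).

    means: (N, 3) world-space Gaussian centers
    K: (3, 3) camera intrinsics
    R: (3, 3), t: (3,) world-to-camera extrinsics
    Returns (N, 2) pixel coordinates and an (N,) visibility mask.
    """
    cam = means @ R.T + t            # world frame -> camera frame
    in_front = cam[:, 2] > 1e-6      # keep points in front of the camera
    proj = cam @ K.T                 # apply intrinsics
    px = proj[:, :2] / proj[:, 2:3].clip(min=1e-6)  # perspective divide
    in_image = (
        in_front
        & (px[:, 0] >= 0) & (px[:, 0] < width)
        & (px[:, 1] >= 0) & (px[:, 1] < height)
    )
    return px, in_image
```

This is the entire per-camera geometric cost of the metric under these assumptions, which is why iterating over the full training set remains cheap.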

Q4

Is the metric fast enough for real-time use?

The coverage metric renders at 85 FPS (including color), enabling real-time visualization. Under a fixed time budget, this speed facilitates more comprehensive search and scalable view selection.

Pipeline

COVER pipeline overview

COVER operates in three stages:

  1. Coverage View Metric: For each Gaussian primitive, we compute a lightweight coverage score measuring how well it has been observed by existing training cameras. Derived from a tractable approximation of the Fisher Information Gain, the metric maintains, for each Gaussian, a discrete grid over the unit sphere that records how often training cameras have observed that primitive from each viewing direction. Although we iterate through the entire training dataset to compute these frequencies, the process requires no rasterization—only projection of Gaussians onto the 2D image plane, which is inexpensive and allows COVER to scale to large scenes. This also makes the metric non-myopic: all previously selected views are accounted for when scoring candidates. The metric can be rendered as an image at 85 FPS (including color rendering) for real-time visualization.
  2. Next Best View Selection: Candidate viewpoints are ranked by how much under-covered geometry they observe. Views that cover large regions of the scene insufficiently seen by past cameras are selected and added to the training set.
  3. Incremental Training: The 3D Gaussian Splatting model is incrementally trained with the newly added views, progressively improving reconstruction quality.
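The bookkeeping behind stages 1 and 2 can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the sphere binning (8 azimuth × 4 elevation bins), the `min_obs` threshold, and the scoring rule are all assumed choices made here for clarity.

```python
import numpy as np

def direction_bin(dirs, n_az=8, n_el=4):
    """Map unit viewing directions (N, 3) to flat bin indices on the sphere."""
    az = np.arctan2(dirs[:, 1], dirs[:, 0])          # azimuth in [-pi, pi]
    el = np.arcsin(np.clip(dirs[:, 2], -1.0, 1.0))   # elevation in [-pi/2, pi/2]
    ai = ((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    ei = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
    return ei * n_az + ai

def update_coverage(counts, means, cam_pos, visible):
    """Stage 1: increment per-Gaussian direction histograms for one camera.

    counts: (N, n_bins) observation counts; means: (N, 3) Gaussian centers;
    visible: (N,) mask of Gaussians projecting into this camera's image.
    """
    dirs = means[visible] - cam_pos
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    idx = np.flatnonzero(visible)
    np.add.at(counts, (idx, direction_bin(dirs)), 1)  # unbuffered accumulate

def view_score(counts, means, cam_pos, visible, min_obs=1):
    """Stage 2: score a candidate view by how many visible Gaussians it
    would observe from a direction bin seen fewer than `min_obs` times."""
    dirs = means[visible] - cam_pos
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    idx = np.flatnonzero(visible)
    return int((counts[idx, direction_bin(dirs)] < min_obs).sum())
```

Candidate views would then be ranked by `view_score`, the best one added to the training set, and its observations folded back in via `update_coverage` before the next round of incremental training.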

Results

We benchmark COVER against FisherRF and a random baseline across 15 real scenes from the Tanks and Temples, MipNeRF360, and custom captured datasets. Each scene is seeded with 10 initial views, and a new view is selected every 200 gradient steps during training. COVER consistently outperforms both baselines in PSNR, SSIM, and LPIPS, and in some scenes approaches the performance of Splatfacto trained with all views. Use the sliders below to compare renders from evaluation viewpoints not seen during training.

COVER vs Ground Truth

Ground Truth
COVER

FisherRF vs Ground Truth

Ground Truth
FisherRF

Random vs Ground Truth

Ground Truth
Random

BibTeX

@inproceedings{chen2026cover,
  author    = {Timothy Chen and Adam Dai and Maximilian Adang and Grace Gao and Mac Schwager},
  title     = {Coverage Optimization for Camera View Selection},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}