CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory

Abstract

Neural radiance fields (NeRF) have achieved impressive results in high-quality 3D scene reconstruction. However, NeRF relies heavily on precise camera poses. While recent works such as BARF have introduced camera pose optimization within NeRF, their applicability is limited to scenes with simple trajectories, and existing methods struggle with complex trajectories involving large rotations. To address this limitation, we propose CT-NeRF, an incremental reconstruction and optimization pipeline that uses only RGB images, without pose or depth input. In this pipeline, we first propose a local-global bundle adjustment over a pose graph connecting neighboring frames. It enforces consistency between poses, which helps escape the local minima that arise when poses are constrained only by their consistency with the scene structure. Further, we instantiate the consistency between poses as a reprojection error constraint derived from pixel-level correspondences between input image pairs. Through incremental reconstruction, CT-NeRF recovers both camera poses and scene structure and can handle scenes with complex trajectories. We evaluate CT-NeRF on two real-world datasets that feature complex trajectories, NeRFBuster and Free-Dataset. Results show that CT-NeRF outperforms existing methods in novel view synthesis and pose estimation accuracy.

Method

1. Incremental Optimization Pipeline

CT-NeRF optimizes incrementally, processing images frame by frame. At the start, camera poses are set to the identity matrix and the 3D scene structure is initialized. As new frames are added, optimization gradually refines both the camera poses and the scene structure. This lets CT-NeRF handle complex motion trajectories without prior knowledge of camera poses.
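The frame-by-frame loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `refine_step` is a hypothetical stand-in for the joint NeRF and pose optimization, and starting each new pose from the previous frame's estimate is an assumption that the text does not state.

```python
import numpy as np

def incremental_reconstruction(frames, refine_step):
    """Add frames one at a time, refining all poses after each addition.

    refine_step(images, poses) -> poses is a hypothetical callback standing in
    for the joint scene and pose optimization; the source does not define it.
    """
    images, poses = [], []
    for image in frames:
        # Assumption: the first pose is the identity; every later pose starts
        # from the previous frame's current estimate.
        init = np.eye(4) if not poses else poses[-1].copy()
        images.append(image)
        poses.append(init)
        # Refine with everything registered so far.
        poses = refine_step(images, poses)
    return poses
```

Passing a `refine_step` that returns the poses unchanged leaves every pose at the identity, which makes the initialization easy to check before a real optimizer is plugged in.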

2. Local-Global Bundle Adjustment

To keep all camera poses consistent, CT-NeRF applies a local-global bundle adjustment strategy. Local optimization runs for each newly added image and refines its camera pose together with the scene geometry. Periodically, global optimization adjusts the entire camera pose graph, which maintains consistency and limits drift. This matters most for long, non-linear camera trajectories.
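One way to picture the local-global schedule is as a rule for which poses are refined after each new frame, together with a pose graph that links neighboring frames. The sketch below is illustrative only: the function names, the window size, the global period, and the number of neighbors per frame are all hypothetical, since the text says only that global optimization runs "periodically".

```python
def pose_graph_edges(num_frames, num_neighbors=2):
    """Edges (i, j) with i < j linking each frame to its next `num_neighbors` frames."""
    return [(i, j)
            for i in range(num_frames)
            for j in range(i + 1, min(i + 1 + num_neighbors, num_frames))]

def frames_to_optimize(new_idx, window=3, global_every=5):
    """Indices of the poses refined after frame `new_idx` (0-based) is added.

    Assumed schedule: every `global_every`-th frame triggers a global pass over
    all poses so far; otherwise only the most recent `window` poses are refined.
    """
    if (new_idx + 1) % global_every == 0:
        return list(range(new_idx + 1))        # global pass
    start = max(0, new_idx - window + 1)
    return list(range(start, new_idx + 1))     # local pass
```

With these defaults, adding the fifth frame (`new_idx=4`) selects all five poses, while adding the seventh (`new_idx=6`) selects only frames 4 to 6.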

3. Reprojection Geometric Image Distance Constraint

A key component of CT-NeRF is the reprojection constraint, which uses pixel-level correspondences between image pairs. CT-NeRF projects pixels from one image into the other using the estimated depth and camera poses, then measures the Euclidean distance between each projected point and its matched pixel in the second image. This distance gives the optimization a direct gradient signal and constrains camera poses and scene structure to align.
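The reprojection distance can be written out for a standard pinhole camera. The sketch below is a plain NumPy illustration under stated assumptions, not the paper's code: poses are taken to be 4x4 camera-to-world matrices, `K` is a shared 3x3 intrinsic matrix, and the function name and argument layout are hypothetical. A real training loop would compute the same quantity in an autodiff framework so that gradients reach the poses and depths.

```python
import numpy as np

def reprojection_error(uv_i, depth_i, uv_j, K, T_i, T_j):
    """Pixel distance between image-i pixels reprojected into image j and their matches.

    uv_i, uv_j: (N, 2) matched pixel coordinates; depth_i: (N,) depths in camera i;
    K: (3, 3) pinhole intrinsics; T_i, T_j: (4, 4) camera-to-world poses (assumed).
    """
    n = uv_i.shape[0]
    # Back-project pixels of image i to 3D points in camera i's frame.
    pix_h = np.hstack([uv_i, np.ones((n, 1))])
    pts_cam_i = (np.linalg.inv(K) @ pix_h.T).T * depth_i[:, None]
    # Move the points: camera i -> world -> camera j.
    pts_h = np.hstack([pts_cam_i, np.ones((n, 1))])
    pts_cam_j = (np.linalg.inv(T_j) @ T_i @ pts_h.T).T[:, :3]
    # Project into image j and compare with the matched pixels.
    proj = (K @ pts_cam_j.T).T
    uv_proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(uv_proj - uv_j, axis=1)
```

Averaging the returned distances over all matches of an image pair gives a scalar that could serve as the loss term for the corresponding pose-graph edge.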

Results

Qualitative comparison on Free-Dataset. Left: trajectory comparison, showing the estimated camera poses (blue) and the COLMAP poses (red); the sparse 3D points for each scene are from COLMAP. Right: rendered views (top) and depth maps (bottom). More results are in the supplemental material.

CT-NeRF

CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory
