SMORE: Simultaneous Map and Object REconstruction

3DV 2025

Carnegie Mellon University, University of Adelaide, Villanova University

Abstract

We present a method for dynamic surface reconstruction of large-scale urban scenes from LiDAR. Depth-based reconstruction methods tend to focus either on small-scale objects or on large-scale SLAM, which treats moving objects as outliers. We take a holistic perspective and optimize a compositional model of a dynamic scene that decomposes the world into rigidly-moving objects and the background. To achieve this, we take inspiration from recent novel view synthesis methods and frame the reconstruction problem as a global optimization over neural surfaces, ego poses, and object poses, which minimizes the error between composed spacetime surfaces and input LiDAR scans. In contrast to view synthesis methods, which typically minimize 2D errors with gradient descent, we minimize a 3D point-to-surface error by coordinate descent, which we decompose into registration and surface reconstruction steps. Each step can be handled well by off-the-shelf methods without any re-training. We analyze the surface reconstruction step for rolling-shutter LiDARs and show that deskewing operations common in continuous-time SLAM can be applied to dynamic objects as well, improving results over prior art by 10X. Beyond pursuing dynamic reconstruction as a goal in and of itself, we propose that such a system can be used to auto-label partially annotated sequences and produce ground-truth annotations for hard-to-label problems such as depth completion and scene flow.
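At a high level, the coordinate descent alternates between two steps that each have mature off-the-shelf solutions: fit surfaces with poses held fixed, then re-register poses with surfaces held fixed. A minimal sketch of that loop in Python, where fit_surface and register are caller-supplied placeholders for the reconstruction and registration methods (names and signatures are illustrative, not the released code):

# Illustrative coordinate-descent loop; fit_surface and register stand in
# for off-the-shelf surface reconstruction and point-to-surface registration.
def reconstruct(scans, ego_poses, obj_poses, fit_surface, register, iters=10):
    surfaces = None
    for _ in range(iters):
        # Surface step: with poses fixed, fit neural surfaces to points
        # expressed in each object's (or the background's) canonical frame.
        surfaces = fit_surface(scans, ego_poses, obj_poses)
        # Registration step: with surfaces fixed, re-estimate ego and object
        # poses by minimizing the 3D point-to-surface error.
        ego_poses, obj_poses = register(scans, surfaces, ego_poses, obj_poses)
    return surfaces, ego_poses, obj_poses

Because each step only consumes the other's latest output, either component can be swapped for a different off-the-shelf method without re-training.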

Method Overview


Optimization over reconstruction and poses



(Figure) Left: ground-truth pose annotations. Right: optimized pose annotations after our joint optimization.

We can improve over the ground-truth odometry and object annotations provided in flagship datasets such as nuScenes!



What is a LiDAR Sweep?

Conventional Notion of LiDAR Sweep

What the sensor really measures

Revolving LiDAR sensors do not have a global shutter. Instead, they rotate continuously and measure depth across 16-128 vertically arranged lasers, typically taking 100 ms to complete a 360-degree rotation. Most datasets abstract away this continuous capture and instead group the LiDAR returns into 360-degree sweeps.
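Since each return carries its own timestamp, motion compensation ("deskewing") reduces to interpolating the sensor pose at each point's capture time. A minimal sketch, assuming per-point timestamps normalized to [0, 1] and 4x4 sensor-to-world poses at the sweep boundaries; the SLERP-plus-linear interpolation here is a common continuous-time SLAM choice, not necessarily the paper's exact parameterization:

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_sweep(points, timestamps, pose_start, pose_end):
    # points: (N, 3) in sensor frame; timestamps: (N,) in [0, 1];
    # pose_start/pose_end: (4, 4) sensor-to-world poses.
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(
        np.stack([pose_start[:3, :3], pose_end[:3, :3]])))
    R = slerp(timestamps).as_matrix()                       # (N, 3, 3)
    t = ((1 - timestamps)[:, None] * pose_start[:3, 3]
         + timestamps[:, None] * pose_end[:3, 3])           # (N, 3)
    # Apply each point's own interpolated pose, mapping every return
    # into one consistent world frame.
    return np.einsum('nij,nj->ni', R, points) + t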



Issues with the Conventional Notion of LiDAR Sweeps

Dynamic Actor Motion Distortions


(Left) The ego-vehicle passes a moving car that is observed at both the start and the end of a sweep, so the driver appears twice in the aggregated points. (Right) We visualise each point according to the time it was acquired, with lighter = earlier and darker = later.
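As the abstract notes, the same deskewing idea extends to rigidly-moving actors: points observed on an object are mapped into the object's canonical frame using the object pose interpolated at each point's timestamp. A hedged sketch under the same assumptions as the ego deskewing above (interpolation scheme and names are ours):

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_object(points_world, timestamps, obj_pose_start, obj_pose_end):
    # points_world: (N, 3); obj_pose_*: (4, 4) object-to-world poses at
    # the start and end of the sweep; timestamps: (N,) in [0, 1].
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(
        np.stack([obj_pose_start[:3, :3], obj_pose_end[:3, :3]])))
    R = slerp(timestamps).as_matrix()
    t = ((1 - timestamps)[:, None] * obj_pose_start[:3, 3]
         + timestamps[:, None] * obj_pose_end[:3, 3])
    # Invert each per-point object pose: x_canonical = R^T (x_world - t),
    # so both observations of the car collapse onto one surface.
    return np.einsum('nji,nj->ni', R, points_world - t)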



Incorrect Freespace


Consider a fast-moving ego-vehicle passing by a building. In a naively motion-compensated sweep, it is possible to sample hidden surfaces of the building, leading to incorrect estimates of freespace (shown in red), which may complicate surface reconstruction.
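One way to avoid this is to cast each freespace ray from the sensor origin at that point's capture time, taken from the continuous-time trajectory, rather than from a single per-sweep origin. An illustrative sketch (function and argument names are our own):

import numpy as np

def sample_freespace(points_world, origins_world, samples_per_ray=8):
    # points_world: (N, 3) LiDAR returns; origins_world: (N, 3) per-point
    # sensor origins evaluated at each return's timestamp.
    n = points_world.shape[0]
    # Fractions strictly inside (0, 1) keep samples between the
    # time-correct origin and the measured return.
    fracs = np.random.uniform(0.05, 0.95, (n, samples_per_ray, 1))
    rays = (points_world - origins_world)[:, None, :]       # (N, 1, 3)
    return origins_world[:, None, :] + fracs * rays         # (N, S, 3)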



Solution: Optimize LiDAR slices over continuous time

Modelling LiDAR as a rolling shutter sensor

(Figure) Left: one optimized pose per LiDAR sweep. Right: one optimized pose per LiDAR slice.
Rolling shutter is no longer a bug but a feature: it turns LiDAR from a noisy 10 Hz sensor into an accurate 4000 Hz sensor!
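Concretely, this means cutting each ~100 ms sweep into short time slices and optimizing one pose per slice rather than one per sweep. A sketch of the slicing step; the slice count is our assumption (400 slices at 10 Hz would give the 4000 Hz pose rate quoted above):

import numpy as np

def split_into_slices(points, timestamps, num_slices=400):
    # points: (N, 3); timestamps: (N,) in [0, 1] within one sweep.
    # Each slice gets its own pose variable in the optimization.
    ids = np.minimum((timestamps * num_slices).astype(int), num_slices - 1)
    return [(points[ids == s], timestamps[ids == s])
            for s in range(num_slices)]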



Full Sequence Reconstructions

Depth Rendering: Extreme View Synthesis

BibTeX

@article{chodosh2024simultaneous,
  title={Simultaneous Map and Object Reconstruction},
  author={Chodosh, Nathaniel and Madan, Anish and Ramanan, Deva and Lucey, Simon},
  journal={arXiv preprint arXiv:2406.13896},
  year={2024}
}