1 Introduction

1.1 2D vs. 3D Object Detection


The overlap problem largely disappears: unlike 2D boxes in an image, objects in 3D space rarely overlap each other.


LiDAR vs camera


Compared with cameras, LiDAR provides accurate spatial information and robustness to illumination changes.

Radar


Visualizations of sensor noise in 3D object detection for autonomous driving. (a) Limited FOV: a LiDAR installed in a front-facing manner yields a limited FOV, e.g. 120°. (b) Object failure: the reflection rate of some objects (e.g. the black car) is below the LiDAR's threshold, so no points are returned from them. (c) Camera occlusion: the camera module is vulnerable to occlusion (e.g. by dust).

Key differences from 2D object detection

  1. Points within an object are sparse.


  2. Sparse foreground points / object locations

    1. Spatially pruned sparse convolution (NeurIPS 2022): the proportion of foreground points in the entire scene is extremely low (around 5%).

    2. VoxelNeXt (CVPR 2023): foreground points account for less than 1% for the Car class, averaged over the nuScenes validation set.

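To make the foreground-sparsity claim concrete, here is a minimal sketch with synthetic points and hypothetical axis-aligned ground-truth boxes (real pipelines use oriented boxes) that measures how few points land inside any box:

```python
# Sketch (synthetic data): estimate the foreground-point ratio by counting
# the points that fall inside axis-aligned ground-truth boxes.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(-75.0, 75.0, size=(100_000, 3))  # toy LiDAR scene

# Hypothetical axis-aligned GT boxes: (cx, cy, cz, dx, dy, dz) in meters.
boxes = np.array([
    [10.0, 5.0, 0.0, 4.0, 2.0, 1.5],
    [-20.0, 30.0, 0.0, 4.0, 2.0, 1.5],
])

def foreground_ratio(points, boxes):
    """Fraction of points lying inside any box (yaw ignored for simplicity)."""
    mask = np.zeros(len(points), dtype=bool)
    for cx, cy, cz, dx, dy, dz in boxes:
        lo = np.array([cx - dx / 2, cy - dy / 2, cz - dz / 2])
        hi = np.array([cx + dx / 2, cy + dy / 2, cz + dz / 2])
        mask |= np.all((points >= lo) & (points <= hi), axis=1)
    return mask.mean()

print(f"foreground ratio: {foreground_ratio(points, boxes):.4%}")
```

With two car-sized boxes in a 150 m scene the ratio is a tiny fraction of a percent, which is why dense feature maps waste most of their computation on empty space.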

  3. Sparse points & long-range scenes

    1. Argoverse 2 Dataset (200 m range) & Waymo Open Dataset (75 m range)


    Fig 2: Short-range point clouds (red, from KITTI [2]) vs. long-range point clouds (blue, from Argoverse 2 [4]). The radius of the red circle is 75 meters. Sparsity increases quickly as the range extends.
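The range sparsity in Fig 2 follows from sensor geometry: a spinning LiDAR has fixed angular resolution, so the gap between neighboring returns grows linearly with range (and areal point density falls roughly as 1/r²). A rough sketch, assuming a hypothetical 0.2° horizontal resolution:

```python
# Sketch: spacing between adjacent LiDAR returns vs. range, for a spinning
# LiDAR with fixed angular resolution (0.2 deg is a hypothetical value).
import numpy as np

angular_res_deg = 0.2
res_rad = np.radians(angular_res_deg)

for r in (10.0, 75.0, 200.0):  # short range vs. Waymo vs. Argoverse 2 ranges
    spacing = r * res_rad      # arc length between adjacent returns at range r
    print(f"range {r:5.0f} m -> point spacing ~{spacing:.2f} m")
```

At 200 m the spacing between returns is roughly 20x that at 10 m, so a car-sized object may receive only a handful of points.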

  4. Memory & computational burden

  5. ⇒ sparse methods are required, but they run into Center Feature Missing (CFM)

  6. Center Feature Missing (CFM): LiDAR points lie on object surfaces, so fully sparse features are often absent at object centers, which is exactly where center-based detection heads place their predictions.

  7. Incomplete shapes: LiDAR captures only the visible surface of an object, so observed shapes are incomplete.
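The CFM problem in items 6-7 can be illustrated with a toy sketch: sample points on the surface of a box-shaped object (LiDAR never sees the interior) and check whether the voxel at the object's center ever contains a point. Voxel size and box dimensions below are made up for illustration:

```python
# Sketch of Center Feature Missing (CFM): points lie on object surfaces,
# so the voxel containing the object's center usually holds no points and a
# fully sparse, center-based head has no feature there.
import numpy as np

rng = np.random.default_rng(0)
voxel = 0.2                              # hypothetical voxel size (m)

# Points on the front/back faces of a 4 x 2 x 1.5 m box centered at origin.
n = 2000
y = rng.uniform(-1.0, 1.0, n)
z = rng.uniform(-0.75, 0.75, n)
x = rng.choice([-2.0, 2.0], n)           # surface only, never the interior
surface_points = np.stack([x, y, z], axis=1)

def voxel_occupied(points, query, voxel_size):
    """True if any point falls in the voxel that contains `query`."""
    return bool(np.any(np.all(np.floor(points / voxel_size)
                              == np.floor(query / voxel_size), axis=1)))

print(voxel_occupied(surface_points, np.array([0.0, 0.0, 0.0]), voxel))  # center voxel
print(voxel_occupied(surface_points, np.array([2.0, 0.0, 0.0]), voxel))  # surface voxel
```

The center voxel stays empty no matter how many surface points are sampled, which motivates the shape-completion and feature-diffusion tricks used by fully sparse detectors.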