1 Introduction

1.1 2D vs. 3D Object Detection


The overlap problem largely disappears: unlike 2D boxes in an image, objects in 3D space rarely overlap each other.


LiDAR vs camera


Compared with cameras, LiDAR provides accurate spatial information and robustness to illumination changes.

Radar


Visualizations of sensor noise in 3D object detection for autonomous driving. (a) Limited FOV: a LiDAR installed in a front-facing manner yields a limited FOV, e.g. 120°. (b) Object failure: the reflection rate of some objects (e.g. the black car) is below the LiDAR's threshold, so no points are returned from them. (c) Camera occlusion: the camera module is vulnerable to occlusion (e.g. by dust).

Key differences from 2D object detection

  1. Points within an object are sparse.


  2. Sparse foreground points / object locations

    1. Spatially pruned sparse convolution (NeurIPS 2022): the proportion of foreground points in the entire scene is extremely low (around 5%).

    2. VoxelNeXt (CVPR 2023): foreground points account for less than 1% for the Car class, averaged over the nuScenes validation set.

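To make the foreground-sparsity claim concrete, here is a minimal sketch with synthetic points and hypothetical axis-aligned ground-truth boxes (real pipelines use oriented boxes) that measures how few points land inside any box:

```python
# Sketch (synthetic data): estimate the foreground-point ratio by counting
# the points that fall inside axis-aligned ground-truth boxes.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(-75.0, 75.0, size=(100_000, 3))  # toy LiDAR scene

# Hypothetical axis-aligned GT boxes: (cx, cy, cz, dx, dy, dz) in meters.
boxes = np.array([
    [10.0, 5.0, 0.0, 4.0, 2.0, 1.5],
    [-20.0, 30.0, 0.0, 4.0, 2.0, 1.5],
])

def foreground_ratio(points, boxes):
    """Fraction of points lying inside any box (yaw ignored for simplicity)."""
    mask = np.zeros(len(points), dtype=bool)
    for cx, cy, cz, dx, dy, dz in boxes:
        lo = np.array([cx - dx / 2, cy - dy / 2, cz - dz / 2])
        hi = np.array([cx + dx / 2, cy + dy / 2, cz + dz / 2])
        mask |= np.all((points >= lo) & (points <= hi), axis=1)
    return mask.mean()

print(f"foreground ratio: {foreground_ratio(points, boxes):.4%}")
```

With two car-sized boxes in a 150 m scene the ratio is a tiny fraction of a percent, which is why dense feature maps waste most of their computation on empty space.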

  3. Sparse points & long-range scenes

    1. Argoverse 2 Dataset (200 m range) & Waymo Open Dataset (75 m range)


    Fig 2: Short-range point clouds (red, from KITTI [2]) vs. long-range point clouds (blue, from Argoverse 2 [4]). The radius of the red circle is 75 meters. Sparsity increases quickly as the range extends.
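The range sparsity in Fig 2 follows from sensor geometry: a spinning LiDAR has fixed angular resolution, so the gap between neighboring returns grows linearly with range (and areal point density falls roughly as 1/r²). A rough sketch, assuming a hypothetical 0.2° horizontal resolution:

```python
# Sketch: spacing between adjacent LiDAR returns vs. range, for a spinning
# LiDAR with fixed angular resolution (0.2 deg is a hypothetical value).
import numpy as np

angular_res_deg = 0.2
res_rad = np.radians(angular_res_deg)

for r in (10.0, 75.0, 200.0):  # short range vs. Waymo vs. Argoverse 2 ranges
    spacing = r * res_rad      # arc length between adjacent returns at range r
    print(f"range {r:5.0f} m -> point spacing ~{spacing:.2f} m")
```

At 200 m the spacing between returns is roughly 20x that at 10 m, so a car-sized object may receive only a handful of points.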

  4. Memory & computational burden

  5. ⇒ sparse methods are required, but they run into Center Feature Missing (CFM)

  6. Center Feature Missing (CFM): LiDAR points lie on object surfaces, so fully sparse features are often absent at object centers, which is exactly where center-based detection heads place their predictions.

  7. Incomplete shapes: LiDAR captures only the visible surface of an object, so observed shapes are incomplete.
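The CFM problem in items 6-7 can be illustrated with a toy sketch: sample points on the surface of a box-shaped object (LiDAR never sees the interior) and check whether the voxel at the object's center ever contains a point. Voxel size and box dimensions below are made up for illustration:

```python
# Sketch of Center Feature Missing (CFM): points lie on object surfaces,
# so the voxel containing the object's center usually holds no points and a
# fully sparse, center-based head has no feature there.
import numpy as np

rng = np.random.default_rng(0)
voxel = 0.2                              # hypothetical voxel size (m)

# Points on the front/back faces of a 4 x 2 x 1.5 m box centered at origin.
n = 2000
y = rng.uniform(-1.0, 1.0, n)
z = rng.uniform(-0.75, 0.75, n)
x = rng.choice([-2.0, 2.0], n)           # surface only, never the interior
surface_points = np.stack([x, y, z], axis=1)

def voxel_occupied(points, query, voxel_size):
    """True if any point falls in the voxel that contains `query`."""
    return bool(np.any(np.all(np.floor(points / voxel_size)
                              == np.floor(query / voxel_size), axis=1)))

print(voxel_occupied(surface_points, np.array([0.0, 0.0, 0.0]), voxel))  # center voxel
print(voxel_occupied(surface_points, np.array([2.0, 0.0, 0.0]), voxel))  # surface voxel
```

The center voxel stays empty no matter how many surface points are sampled, which motivates the shape-completion and feature-diffusion tricks used by fully sparse detectors.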