1. Sparse CNNs have become mainstream backbone networks in 3D deep learning [10, 11, 23, 41] because of their efficiency. However, their representation ability is limited when used directly for prediction.
2. To remedy this, the 3D detectors in [12, 41, 49, 53] rely on dense convolutional heads for feature enhancement.

(from VoxelNeXt, CVPR 2023)
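The toy sketch below illustrates one reason sparse features can be limiting and how a dense head compensates. LiDAR points tend to land on object surfaces, so an object's center cell may be empty. This is an assumption-laden illustration, not VoxelNeXt's method or any library's API. The helper names (`submanifold_conv`, `densify`, `dense_conv`) are hypothetical. The sketch uses a 2D bird's-eye-view grid and only submanifold sparse convolutions, which never create outputs at empty sites. Real sparse backbones also use regular sparse convolutions and downsampling, and these do dilate the set of active sites. Scattering the features into a dense grid and applying dense convolutions (loosely mirroring a dense detection head) lets information reach the empty center.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]          # (x, y) cell index on a 2D BEV grid
SparseMap = Dict[Coord, float]   # only active (non-empty) cells are stored
Kernel = Dict[Coord, float]      # (dx, dy) offset -> weight


def submanifold_conv(feats: SparseMap, kernel: Kernel) -> SparseMap:
    """Submanifold sparse conv: outputs exist only at the input's active sites."""
    out: SparseMap = {}
    for (x, y) in feats:
        acc = 0.0
        for (dx, dy), w in kernel.items():
            neighbor = (x + dx, y + dy)
            if neighbor in feats:        # empty neighbors contribute nothing
                acc += w * feats[neighbor]
        out[(x, y)] = acc
    return out


def densify(feats: SparseMap, width: int, height: int) -> List[List[float]]:
    """Scatter sparse features into a dense grid, filling empty cells with 0."""
    grid = [[0.0] * width for _ in range(height)]
    for (x, y), v in feats.items():
        grid[y][x] = v
    return grid


def dense_conv(grid: List[List[float]], kernel: Kernel) -> List[List[float]]:
    """Dense conv with zero padding: every cell, empty or not, gets an output."""
    height, width = len(grid), len(grid[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for (dx, dy), w in kernel.items():
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    out[y][x] += w * grid[ny][nx]
    return out


# Toy scene: points hit the boundary of an object; its center (2, 2) is empty.
box3 = {(dx, dy): 1.0 for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
hits = {(0, 2): 1.0, (4, 2): 1.0, (2, 0): 1.0, (2, 4): 1.0}

sparse_out = submanifold_conv(submanifold_conv(hits, box3), box3)
dense_once = dense_conv(densify(hits, 5, 5), box3)
dense_twice = dense_conv(dense_once, box3)

print((2, 2) in sparse_out)   # False: the sparse path never creates the center
print(dense_once[2][2])       # 0.0: one 3x3 dense conv does not reach it yet
print(dense_twice[2][2])      # 12.0: two dense convs propagate boundary info inward
```

The sparse path stays cheap because it only visits the four occupied cells. The dense path touches all 25 cells, which is the efficiency cost that dense heads pay for a larger effective coverage.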

PV-RCNN, CVPR 2020

PV-RCNN++

CenterPoint, CVPR 2021

SST, CVPR 2022