1 purely based on points: inferior speed!

PointRCNN, cvpr19; 3DSSD, cvpr20
VoteNet, iccv19 oral, Best Paper Nominee

2 instance / voting based


VoteNet (iccv19 oral, Best Paper Nominee) points out that directly predicting bounding box parameters from surface points is challenging, because an object's center can be far from any surface point.

⇒ The Center Feature Missing (CFM) issue
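
A minimal sketch of why CFM arises, under simplifying assumptions that are not from the cited papers: a 2D bird's-eye view, an axis-aligned box, and points sampled on the whole perimeter (a real LiDAR sweep sees only the sensor-facing sides, which makes the problem worse). The helper names `surface_points` and `voxelize` are hypothetical.

```python
import math

def surface_points(half_x, half_y, step):
    """Sample 2D points on the perimeter of a box centered at the origin."""
    pts = []
    for i in range(int(round(2 * half_x / step)) + 1):
        x = -half_x + i * step
        pts += [(x, -half_y), (x, half_y)]      # bottom and top edges
    for j in range(int(round(2 * half_y / step)) + 1):
        y = -half_y + j * step
        pts += [(-half_x, y), (half_x, y)]      # left and right edges
    return pts

def voxelize(points, voxel_size):
    """Return the set of active (non-empty) voxel indices."""
    return {(math.floor(x / voxel_size), math.floor(y / voxel_size))
            for x, y in points}

points = surface_points(half_x=2.0, half_y=1.0, step=0.1)
active = voxelize(points, voxel_size=0.5)

center_voxel = (0, 0)                           # voxel containing the box center
center_is_active = center_voxel in active       # no surface point falls in it
```

Every active voxel sits on the box boundary, so a detector that reads features at the object center finds nothing there.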
FSD V1, nips22
FSD V2, 23
1. Every implementation detail is pushed to the limit
2. Why does it perform poorly at close range? Rely on the camera there?

3 Increase receptive field of submanifold sparse CNN

3.1 Submanifold sparse CNN

Submanifold sparse convolution, 2017
1. sparse convolution SC(m, n, f, s): operate on active sites
  1. a small change to the convolution operation that may bring computational benefits
  2. pooling operations are variants of SC(·, ·, f, s)
2. submanifold sparse convolution SSC(m, n, f): |active sites| of input & output of SSC are the same,
  1. i.e. an output site is active if and only if the corresponding input site is active
3. Deconvolution: also keeps |active sites| of input & output layers.
4. Too rigid: the receptive field grows only through pooling
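
The difference between SC and SSC active sets can be sketched as follows. This is a simplified 2D, stride-1 model that tracks only which sites are active (no weights or features); the function names are hypothetical and not from any sparse-convolution library.

```python
from itertools import product

def kernel_offsets(f):
    """Offsets covered by an f x f kernel centered on a site (f odd)."""
    r = f // 2
    return list(product(range(-r, r + 1), repeat=2))

def sc_output_sites(active, f):
    """Regular sparse conv: an output site is active if any active
    input site lies in its receptive field, so the active set dilates."""
    return {(x + dx, y + dy) for (x, y) in active for (dx, dy) in kernel_offsets(f)}

def ssc_output_sites(active, f):
    """Submanifold sparse conv: output active iff the same input site is active."""
    return set(active)

def ssc_influence(active, source, f, num_layers):
    """Active sites whose features depend on `source` after stacked SSC layers."""
    reached = {source}
    for _ in range(num_layers):
        reached = sc_output_sites(reached, f) & active
    return reached

active = {(0, 0), (5, 5)}                       # two isolated active sites
sc_sites = sc_output_sites(active, f=3)         # 2 sites grow to 18
ssc_sites = ssc_output_sites(active, f=3)       # stays at 2
reach = ssc_influence(active, (0, 0), f=3, num_layers=10)
```

In this toy case `reach` never leaves `(0, 0)`: with SSC alone, information cannot cross empty space, which is why the notes say the receptive field depends on pooling (or some other form of dilation).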
Spatial pruned sparse convolution, NeurIPS 22
1. SPSC only operates on foreground sites, measured by feature magnitude [sparser than SSC]
2. Dilation on important voxels instead of no dilation during downsampling [denser than SSC]
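
A rough sketch of the two points above, assuming importance is the mean absolute feature value and that a fixed fraction of sites counts as important. `keep_ratio` and the function names are hypothetical; the paper's actual selection rule may differ.

```python
def magnitude(feat):
    """Mean absolute value across channels (assumed importance measure)."""
    return sum(abs(v) for v in feat) / len(feat)

def split_by_magnitude(features, keep_ratio):
    """Split sites into (important, unimportant) by feature magnitude."""
    ranked = sorted(features, key=lambda s: magnitude(features[s]), reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return set(ranked[:k]), set(ranked[k:])

def pruned_dilation(features, keep_ratio, f=3):
    """Dilate only around important sites; unimportant sites stay in place."""
    important, unimportant = split_by_magnitude(features, keep_ratio)
    r = f // 2
    dilated = {(x + dx, y + dy)
               for (x, y) in important
               for dx in range(-r, r + 1)
               for dy in range(-r, r + 1)}
    return dilated | unimportant

features = {
    (0, 0): [2.0, -2.0],      # strong response, likely foreground
    (10, 10): [0.1, 0.1],
    (20, 20): [0.05, 0.0],
    (30, 30): [1.0, 1.0],
}
important, unimportant = split_by_magnitude(features, keep_ratio=0.5)
out_sites = pruned_dilation(features, keep_ratio=0.5)
```

With four isolated input sites, no dilation keeps 4 output sites and uniform 3 x 3 dilation would give 36; dilating only the two important sites gives 20.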
Part-A2, tpami 21: focuses on learning auxiliary information and features inside each instance.
1. intra-object part locations
2. ROI-aware Point cloud pooling
3. Also worth reading as a classic, for background and comparison.
VoxelNeXt, cvpr23
1. The center-missing issue can also simply be sidestepped by sparse networks that have large receptive fields. How?
2. 2 additional down-samplings, then combine active voxels from the last 3 feature levels.
  1. Spatial voxel pruning: moderately enlarge the receptive field during downsampling
3. Predict boxes from active voxels with large classification scores.
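
Step 3 can be sketched as a selection over the sparse set of active voxels rather than over a dense grid. This is an illustration only: the score threshold, `top_k`, and the per-voxel offset regression are assumptions, and the function names are hypothetical.

```python
def select_voxels(scores, score_thresh, top_k):
    """Keep the top_k active voxels whose classification score passes the threshold."""
    kept = sorted(((s, v) for v, s in scores.items() if s >= score_thresh),
                  reverse=True)
    return [v for _, v in kept[:top_k]]

def decode_center(voxel, offset, voxel_size):
    """Box center = geometric center of the voxel + regressed offset."""
    return tuple((i + 0.5) * voxel_size + o for i, o in zip(voxel, offset))

# Hypothetical per-voxel head outputs, keyed by active voxel index.
scores = {(1, 2): 0.9, (1, 3): 0.4, (7, 7): 0.75, (8, 1): 0.05}
offsets = {(1, 2): (0.1, -0.2), (1, 3): (0.0, 0.0),
           (7, 7): (-0.1, 0.1), (8, 1): (0.0, 0.0)}

selected = select_voxels(scores, score_thresh=0.3, top_k=2)
centers = [decode_center(v, offsets[v], voxel_size=0.5) for v in selected]
```

Because the query voxel does not have to be the object's center voxel, an empty center is no obstacle: the regressed offset carries the prediction from a surface voxel to the center.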
HEDNet, NeurIPS 23
1. hybrid detector
SAFDNet, CVPR 24 oral: simpler & better than FSD V2, 23
1. HEDNet + adaptive feature diffusion / dilation ⇒ fully sparse
  1. AFD enlarges receptive field effectively
  2. uniform dilation ⇒ good performance but inefficient
  3. no dilation (submanifold sparse CNN) ⇒ efficient but inferior performance
2. solves center feature missing via a nearest heatmap instead of a center heatmap.
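
One reading of the nearest-versus-center heatmap idea, sketched as target assignment on a 2D voxel grid. This is an assumption about the mechanism, not the paper's implementation, and the function names are hypothetical.

```python
def center_target(active, center_voxel):
    """Center heatmap: the positive location is the center voxel itself,
    which is unusable when that voxel is empty."""
    return center_voxel if center_voxel in active else None

def nearest_target(active, center_voxel):
    """Nearest heatmap: the positive location is the active voxel closest
    to the object center (ties broken by voxel index)."""
    cx, cy = center_voxel
    return min(active, key=lambda v: ((v[0] - cx) ** 2 + (v[1] - cy) ** 2, v))

# Active voxels on an object's surface; the center voxel (0, 0) is empty.
active = {(-4, 0), (-4, 1), (0, -2), (3, 2)}
center_pos = center_target(active, (0, 0))
nearest_pos = nearest_target(active, (0, 0))
```

Here `center_pos` is `None` while `nearest_pos` is `(0, -2)`, so the nearest-voxel rule always yields a positive sample on a voxel that actually carries features.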

FlatFormer [4]