https://github.com/zhanggang001/HEDNet

Introduction

  1. SAFDNet yielded 2.1% mAP gains over the previous best sparse detector, FSD V2 (23), while being 1.3× faster.
  2. It surpassed the previous best hybrid detector, HEDNet (NeurIPS 23), by 2.6% mAP while being 2.1× faster.
  3. VoxelNeXt (CVPR 23) directly predicts objects based on the features nearest to object centers, but exhibits inferior accuracy.
  4. The design is simple.

Related work

Both SWFormer and VoxelNeXt exhibit inferior accuracy compared to hybrid detectors.

Method

HEDNet (NeurIPS 23) vs SAFDNet (24, oral)

  1. Voxel feature encoder (VFE): same in both networks

  2. SSR vs SRB: same in both networks

    1. Submanifold sparse residual (SSR) block: HEDNet (NeurIPS 23)
    2. Sparse residual block (SRB): identical to SSR
      1. Most voxel-based methods [4, 7, 18] adopt sparse CNNs to extract features. These CNNs typically comprise a series of sparse residual blocks, where each block contains two submanifold sparse convolutions and a skip connection linking its input and output.
    3. SSR ⇒ results in a receptive field of limited size
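
Why the receptive field stays limited: a submanifold sparse convolution only writes outputs at voxels that are already active, so stacking blocks never bridges empty space. A minimal 2D sketch, assuming a scalar feature per voxel and no batch norm (both are simplifications; the function names are hypothetical, not from the HEDNet repo):

```python
import numpy as np

def submanifold_conv2d(feats, kernel):
    """3x3 submanifold sparse conv on a dict {(y, x): value}.

    Outputs are produced only at input-active sites, so the active set
    never grows.
    """
    out = {}
    for (y, x) in feats:
        acc = 0.0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                v = feats.get((y + dy, x + dx))
                if v is not None:  # inactive neighbors contribute nothing
                    acc += kernel[dy + 1, dx + 1] * v
        out[(y, x)] = acc
    return out

def sparse_residual_block(feats, k1, k2):
    """Two submanifold convs plus a skip connection from input to output."""
    h = submanifold_conv2d(feats, k1)
    h = {p: max(v, 0.0) for p, v in h.items()}  # ReLU
    h = submanifold_conv2d(h, k2)
    return {p: max(h[p] + feats[p], 0.0) for p in feats}

# Two voxel groups separated by empty voxels at x = 2 and x = 3.
kernel = np.ones((3, 3))
feats = {(0, 0): 1.0, (0, 1): 1.0, (0, 4): 1.0}
for _ in range(3):
    feats = sparse_residual_block(feats, kernel, kernel)
# The active set is unchanged, and (0, 0) never sees (0, 4).
```

However many blocks are stacked, the voxel at (0, 0) receives no information from (0, 4), because no active voxel lies in the gap to relay it.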
  3. SED vs 3D-EDB

    1. Sparse encoder-decoder (SED): HEDNet (NeurIPS 23); enlarges the receptive field via downsampling and upsampling
    2. 3D sparse encoder-decoder (EDB): identical to SED
  4. 2D DED vs 2D EDB

    1. 2D dense encoder-decoder (DED): HEDNet (NeurIPS 23)
    2. 2D sparse encoder-decoder (EDB):
      1. 3D EDB uses 3D submanifold sparse convolutions (2017)
      2. 2D EDB uses 2D submanifold sparse convolutions; the structure is the same.
  5. Adaptive feature diffusion (AFD): this is the only real difference; AFD requires voxel classification

    1. Assign a larger diffusion range to voxels within the bounding boxes of large objects.

    2. Assign a smaller range to voxels within the bounding boxes of small objects or in the background.

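    3. Hence voxel classification is necessary.

    4. Tab 6: diffusion is much better than no diffusion. AFD is slightly more accurate than UFD, while its efficiency is only slightly worse than no diffusion, so the overall trade-off is excellent.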

    5. AFD works well on three 3D sparse backbones: HEDNet, VoxelNet, and PillarNet.
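
A rough sketch of the AFD idea on a sparse BEV grid: each voxel activates a neighborhood whose size depends on its predicted class. The class ids (0 = background, 1 = small object, 2 = large object), the per-class half-window sizes, and the zero initialization of newly activated voxels are all assumptions made for illustration, not values taken from the paper:

```python
import numpy as np

def adaptive_feature_diffusion(coords, feats, voxel_class, ranges, grid_shape):
    """Activate neighbors of each voxel inside a class-dependent window.

    coords:      (N, 2) int array of BEV voxel indices (y, x)
    feats:       (N, C) feature array
    voxel_class: (N,) int array from the voxel classification step
    ranges:      dict class id -> half window size r (window is (2r+1)^2)
    grid_shape:  (H, W) of the BEV grid, used to drop out-of-range voxels
    Returns the enlarged coordinate set and features (new voxels get zeros).
    """
    H, W = grid_shape
    C = feats.shape[1]
    # Start from the original voxels so their features are never overwritten.
    out = {tuple(c): f.copy() for c, f in zip(coords.tolist(), feats)}
    for (y, x), cls in zip(coords.tolist(), voxel_class.tolist()):
        r = ranges[cls]
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W:
                    out.setdefault((ny, nx), np.zeros(C, dtype=feats.dtype))
    keys = sorted(out)
    return np.array(keys, dtype=np.int64), np.stack([out[k] for k in keys])

# Hypothetical example: one large-object voxel, one small-object voxel,
# one background voxel.
coords = np.array([[5, 5], [20, 20], [10, 10]])
feats = np.arange(6, dtype=np.float32).reshape(3, 2)
voxel_class = np.array([2, 1, 0])
new_coords, new_feats = adaptive_feature_diffusion(
    coords, feats, voxel_class, ranges={0: 0, 1: 1, 2: 2}, grid_shape=(32, 32)
)
```

With these ranges the large-object voxel activates a 5×5 window, the small-object voxel a 3×3 window, and the background voxel only itself. That is why the extra cost stays small compared with diffusing every voxel uniformly.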

  6. Sparse detection head

    1. Center heatmap: computing the Gaussian heatmap from the voxels in which object centers fall, as done in CenterPoint, caused the classification loss to converge rapidly to zero during training.
    2. Nearest heatmap: use the nonempty voxel nearest to each object center as the peak when generating the Gaussian heatmap.
    3. Tab 6: nearest heatmap is far better than center heatmap: 55.2 ⇒ 71.0% mAPH
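
A small sketch of the two target-assignment rules, evaluated only at nonempty voxels. The fixed `sigma` is an assumption for brevity (CenterPoint-style heads derive the Gaussian radius from the box size), and the function name is hypothetical:

```python
import numpy as np

def gaussian_heatmap_on_voxels(voxel_coords, center, sigma=1.0, use_nearest=True):
    """Gaussian classification targets for nonempty voxels only.

    voxel_coords: (N, 2) int array of nonempty BEV voxel indices (y, x)
    center:       (2,) float object center in voxel units
    use_nearest:  True  -> peak at the nonempty voxel nearest to the center
                  False -> peak at the voxel containing the center,
                           which may be empty
    """
    coords = voxel_coords.astype(np.float64)
    center = np.asarray(center, dtype=np.float64)
    if use_nearest:
        dist2_to_center = np.sum((coords - center) ** 2, axis=1)
        peak = coords[np.argmin(dist2_to_center)]
    else:
        peak = np.floor(center)
    d2 = np.sum((coords - peak) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# LiDAR points lie on the object surface, so the center voxel is empty here.
voxels = np.array([[10, 10], [10, 11], [11, 10]])
center = (13.4, 13.2)
hm_nearest = gaussian_heatmap_on_voxels(voxels, center, use_nearest=True)
hm_center = gaussian_heatmap_on_voxels(voxels, center, use_nearest=False)
```

In this toy case the nearest rule gives one voxel a target of exactly 1, while the center rule leaves every nonempty voxel with a near-zero target. That is one plausible reading of why the classification loss collapses to zero with center heatmaps in a fully sparse head.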