https://github.com/tusen-ai/sst

Understanding this paper requires familiarity with the details of VoteNet and FSD v1.

Abstract

  1. use of virtual voxels as an alternative to clustering in FSD v1
    1. ⇒ eliminates the inductive bias of FSD v1's instance-level representation ⇒ better general applicability.
    2. simplify FSD v1: a more elegant and streamlined approach
      1. FSD v1: sparse CNN on voxels + point cloud network
      2. FSD v2: operate only on voxels?
        1. no: it still performs point-wise classification & center voting.
  2. SOTA performance on Waymo Open, Argoverse 2 and nuScenes

Introduction

FSD v1, NeurIPS 2022

FSD v1 employs an instance-level representation (clusters, SIR), which introduces a strong inductive bias and impedes general applicability.

  1. Point feature extraction via a sparse CNN on voxels
  2. Point-wise classification and center voting via an MLP
    1. foreground points ⇒ voted centers
  3. Clustering.
    1. Connected Component Labeling (CCL) is applied to the voted centers to cluster points into instances.
  4. SIR: Instance feature extraction and box prediction via “PointNet”
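Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not FSD v1's implementation. It assumes the per-point offsets are already predicted, and that CCL links two voted centers whenever their Euclidean distance is below a fixed radius. The names vote_centers and ccl_cluster are hypothetical, and the O(N²) pair loop is used only for clarity.

```python
from math import dist


def vote_centers(points, offsets):
    """Shift each foreground point by its predicted offset toward its object center."""
    return [tuple(p + o for p, o in zip(pt, off)) for pt, off in zip(points, offsets)]


def ccl_cluster(centers, radius):
    """Connected component labeling over voted centers (union-find).

    Two centers are linked if their distance is below `radius`; each
    connected component becomes one instance (cluster id).
    """
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if dist(centers[i], centers[j]) < radius:
                parent[find(i)] = find(j)

    # Relabel component roots to consecutive ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(centers))]


# Two objects: the first two and the last two foreground points.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
offsets = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]
centers = vote_centers(points, offsets)
labels = ccl_cluster(centers, radius=1.0)
print(labels)  # [0, 0, 1, 1]
```

Voting pulls points of the same object together, so a simple distance rule on the voted centers separates instances far better than the same rule applied to the raw points. This is the explicit clustering step that FSD v2 sets out to remove.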

Treatments for Center Feature Missing

Figure: FSD v2 replaces the clusters of FSD v1 with virtual voxels (red voxels) generated from the voted centers (red points).

  1. virtual voxels are derived by voxelizing the voted centers.
    1. virtual: the voted centers are artificial points, not real points captured by the sensor
  2. discards FSD v1's instance-level representation in pursuit of better general applicability
  3. predict a box directly from each virtual voxel? no
    1. a virtual voxel may contain only part of an object's voted centers
  4. ⇒ a lightweight sparse Virtual Voxel Mixer (VVM)
    1. aggregates the features of the different virtual voxels belonging to the same object, yielding features that cover the whole instance
    2. VVM intuitively mimics the behavior of SIR in FSD v1, but does not depend on explicitly generated instances
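The idea of virtual voxelization followed by mixing can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's method. A virtual voxel index is taken as floor(center / voxel_size), and features inside one voxel are mean-pooled. The mixer is replaced by a fixed average over occupied neighbors in a 3×3×3 window. The real VVM is a learned lightweight sparse network, and voxelize_centers and mix_virtual_voxels are hypothetical names.

```python
from collections import defaultdict
from math import floor


def voxelize_centers(centers, feats, voxel_size):
    """Group voted centers into virtual voxels and mean-pool their features.

    Returns {voxel_index: pooled_feature}, where voxel_index is the floor of
    each coordinate divided by voxel_size. Only occupied voxels are stored.
    """
    buckets = defaultdict(list)
    for c, f in zip(centers, feats):
        key = tuple(floor(x / voxel_size) for x in c)
        buckets[key].append(f)
    return {k: [sum(col) / len(fs) for col in zip(*fs)] for k, fs in buckets.items()}


def mix_virtual_voxels(voxels, radius=1):
    """Hypothetical stand-in for VVM: each occupied virtual voxel averages the
    features of all occupied voxels in its (2*radius+1)^3 neighborhood.

    The real VVM is learned; this fixed average only shows how neighboring
    virtual voxels can exchange features without explicit clustering.
    """
    steps = range(-radius, radius + 1)
    mixed = {}
    for x, y, z in voxels:
        neigh = [
            voxels[(x + dx, y + dy, z + dz)]
            for dx in steps for dy in steps for dz in steps
            if (x + dx, y + dy, z + dz) in voxels
        ]
        mixed[(x, y, z)] = [sum(col) / len(neigh) for col in zip(*neigh)]
    return mixed


# One object whose voted centers straddle two virtual voxels, plus a distant one.
centers = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.1, 0.1, 0.1), (5.1, 0.1, 0.1)]
feats = [[1.0], [3.0], [6.0], [10.0]]
voxels = voxelize_centers(centers, feats, voxel_size=1.0)
mixed = mix_virtual_voxels(voxels)
print(voxels)  # {(0, 0, 0): [2.0], (1, 0, 0): [6.0], (5, 0, 0): [10.0]}
print(mixed)   # {(0, 0, 0): [4.0], (1, 0, 0): [4.0], (5, 0, 0): [10.0]}
```

In the example, one object's voted centers fall into two virtual voxels, (0, 0, 0) and (1, 0, 0), so neither voxel holds the whole object. This is why predicting a box from a single virtual voxel is unreliable. After mixing, both voxels carry the same object-level feature. The distant voxel (5, 0, 0) is unaffected.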