Abstract

A preliminary version of this work was published as PointRCNN (CVPR 2019)

  1. Two stages: part-aware + part-aggregation
    1. part-aware stage ⇒ 3D proposals and intra-object part locations
      1. intra-object part locations + RoI-aware point cloud pooling ⇒ an effective representation that encodes the geometry-specific features of each 3D proposal
    2. part-aggregation stage ⇒ re-scores each box and refines its location
  2. outperforms all previous methods on the KITTI 3D detection benchmark (as of August 15, 2019)
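The RoI-aware pooling mentioned above can be sketched in numpy. This is a minimal illustration, not the paper's implementation: the function name is made up, points are assumed to be pre-normalized into the proposal's frame, and max-pooling with zero-initialized empty voxels is a simplification. The key idea it shows is that empty voxels stay zero, so the pooled grid encodes the proposal's geometry:

```python
import numpy as np

def roi_aware_max_pool(points, feats, roi_size=(14, 14, 14)):
    """Sketch of RoI-aware pooling (hypothetical simplification).

    points: (N, 3) coordinates already normalized to [0, 1]^3 inside
            the proposal box; feats: (N, C) point-wise features.
    Empty voxels of the output grid remain zero, which is what lets
    the pooled representation encode box geometry.
    """
    C = feats.shape[1]
    pooled = np.zeros(roi_size + (C,), dtype=feats.dtype)
    # map each normalized point to its voxel index inside the RoI grid
    idx = np.minimum((points * roi_size).astype(int), np.array(roi_size) - 1)
    for (i, j, k), f in zip(idx, feats):
        pooled[i, j, k] = np.maximum(pooled[i, j, k], f)
    return pooled
```
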

1 Key components

1.1 intra-object part locations


  1. Observation: The relative locations of foreground points provide strong cues for box scoring and localization
  2. the relative locations of the 3D foreground points w.r.t. their corresponding boxes are named the intra-object part locations
  3. Illustration of intra-object part locations for foreground points
  4. Does learning this help?
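A target intra-object part location can be computed as a sketch: transform each foreground point into its box's canonical frame (rotation about the z axis by the box's yaw) and normalize by the box size so the target lies in [0, 1]^3. The function name and axis conventions here are assumptions:

```python
import numpy as np

def intra_object_part_location(point, box_center, box_size, box_yaw):
    """Relative location of a foreground point inside its ground-truth
    box, normalized to [0, 1]^3 (the box center maps to (0.5, 0.5, 0.5)).
    Axis conventions (yaw about z) are an assumption for illustration."""
    c, s = np.cos(-box_yaw), np.sin(-box_yaw)
    local = point - box_center
    # rotate into the box's canonical frame (rotation about the z axis)
    x = c * local[0] - s * local[1]
    y = s * local[0] + c * local[1]
    z = local[2]
    return np.array([x, y, z]) / box_size + 0.5
```
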


1.2 Point-wise feature learning via sparse convolution

  1. utilize an encoder-decoder network with sparse convolution and sparse deconvolution [30], [31] to learn discriminative point-wise features
    1. more efficient and effective than the earlier PointNet++ backbone (see Table 1), as later sparse-convolution-based detection methods have also confirmed


  1. The voxelized point cloud is approximately equivalent to the raw point cloud.

    1. voxel size 5cm × 5cm × 10cm, small compared to the whole 3D space of KITTI (∼70m × 80m × 4m)
      1. ⇒ about 16,000 nonempty voxels per scene
      2. the center of each nonempty voxel is treated as a point ⇒ voxelized point cloud
      3. the initial feature of each voxel is simply the mean of the point coordinates within that voxel
  2. The spatial resolution of the input feature volume is downsampled 8× by a series of stride-2 sparse convolution layers, then gradually upsampled back to the original resolution by sparse deconvolutions for voxel-wise feature learning.


The encoder-decoder uses four scales with feature dimensions 16-32-64-64.
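The voxelization described above (keep only nonempty voxels, with the mean of the contained point coordinates as each voxel's initial feature) can be sketched in numpy; the function name is an illustrative assumption:

```python
import numpy as np

def voxelize_mean(points, voxel_size=(0.05, 0.05, 0.1)):
    """Keep only nonempty voxels; each voxel's initial feature is the
    mean of the point coordinates falling inside it. The default
    5cm x 5cm x 10cm voxel size matches the KITTI setting above."""
    # integer voxel index of every point
    idx = np.floor(points / voxel_size).astype(np.int64)
    # unique nonempty voxels, plus which voxel each point belongs to
    keys, inverse = np.unique(idx, axis=0, return_inverse=True)
    counts = np.bincount(inverse, minlength=len(keys))
    # per-voxel mean of point coordinates, one axis at a time
    feats = np.stack(
        [np.bincount(inverse, weights=points[:, d], minlength=len(keys)) / counts
         for d in range(3)], axis=1)
    return keys, feats
```
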

  1. Submanifold sparse convolution (Graham & van der Maaten, 2017): keeps the set of active (nonempty) sites unchanged, unlike regular sparse convolution, which dilates it
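To make the 8× downsampling concrete: each stride-2 step merges active voxels whose coordinates agree after integer division by 2, while submanifold layers leave the active-site pattern untouched. A small numpy sketch counting active sites only (no features are computed; the function name is an assumption):

```python
import numpy as np

def downsample_active_sites(voxel_idx, times=3):
    """Track how stride-2 downsampling shrinks the set of active voxels.

    Each step maps every active coordinate to coordinate // 2 and merges
    duplicates; three steps give the 8x resolution reduction described
    in the notes above.
    """
    sites = [np.unique(voxel_idx, axis=0)]
    for _ in range(times):
        sites.append(np.unique(sites[-1] // 2, axis=0))
    return sites
```
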

2 Method