Abstract
A preliminary version of this work was published as PointRCNN (CVPR 2019)
- 2 stages: part-aware + part-aggregation
- part-aware stage ⇒ 3D proposals and intra-object part locations
- intra-object part locations + RoI-aware pooling ⇒ effective representation encoding the geometry-specific features of each 3D proposal
- part-aggregation stage ⇒ re-scores the box and refines the box location
- outperforms all prior methods on the KITTI benchmark (as of August 15, 2019)
1 Key components
1.1 intra-object part locations

- Observation: The relative locations of foreground points provide strong cues for box scoring and localization
- the relative locations of the 3D foreground points w.r.t. their corresponding boxes are named the intra-object part locations
- Illustration of intra-object part locations for foreground points
- Does learning this help?
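The definition above can be made concrete with a small sketch: transform each foreground point into its box's canonical frame and normalize by the box size, so the box interior maps to [0, 1]³. The function name, box parameterization, and axis conventions below are illustrative assumptions, not the paper's exact code.

```python
import numpy as np

def intra_object_part_location(points, box):
    """Relative location of points inside a 3D box, normalized to [0, 1]^3.

    points: (N, 3) array of xyz coordinates.
    box: (cx, cy, cz, l, w, h, ry) -- center, size, yaw about the vertical
    axis (axis conventions here are an illustrative assumption).
    """
    cx, cy, cz, l, w, h, ry = box
    shifted = points - np.array([cx, cy, cz])   # translate to the box center
    c, s = np.cos(-ry), np.sin(-ry)             # rotate world -> box frame
    local = shifted.copy()
    local[:, 0] = c * shifted[:, 0] - s * shifted[:, 1]
    local[:, 1] = s * shifted[:, 0] + c * shifted[:, 1]
    # normalize so the box spans [0, 1]^3; the box center maps to (0.5, 0.5, 0.5)
    return local / np.array([l, w, h]) + 0.5
```

With this convention, a point at the box center gets part location (0.5, 0.5, 0.5), and points on the box surface land on the boundary of the unit cube, which is what makes these coordinates a geometry-aware supervision signal.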

1.2 Point-wise feature learning via sparse convolution
- utilize an encoder-decoder network with sparse convolution and deconvolution [30], [31] to learn discriminative point-wise features
- more efficient and effective than the previously used PointNet++ backbone (see Table 1); this is also borne out by subsequent sparse-convolution-based detectors

- The voxelized point cloud is approximately equivalent to the raw point cloud.
- voxel size 5cm×5cm×10cm, small compared to the whole 3D space (∼70m×80m×4m) of a KITTI scene
- ⇒ about 16,000 non-empty voxels per scene
- the center of each non-empty voxel is treated as a point ⇒ voxelized point cloud
- the initial feature of each voxel is simply the mean of the coordinates of the points within that voxel
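The voxelization step above can be sketched in a few lines of NumPy: bucket points into integer voxel indices, then average the coordinates per non-empty voxel. This is a minimal sketch (function name and return values are my own); real pipelines use optimized scatter kernels.

```python
import numpy as np

def voxelize(points, voxel_size=(0.05, 0.05, 0.1)):
    """Voxelize a point cloud: each non-empty voxel becomes one point whose
    initial feature is the mean of the coordinates of the points it contains.
    """
    vs = np.asarray(voxel_size)
    idx = np.floor(points / vs).astype(np.int64)   # integer voxel index per point
    # group points by voxel index and average their coordinates
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    feats = np.zeros((len(uniq), 3))
    counts = np.zeros(len(uniq))
    np.add.at(feats, inverse, points)              # unbuffered scatter-add
    np.add.at(counts, inverse, 1)
    feats /= counts[:, None]
    centers = (uniq + 0.5) * vs                    # voxel centers act as the "points"
    return uniq, centers, feats
```

With the 5cm×5cm×10cm voxel size from the notes, nearby points collapse into one voxel whose feature is their mean coordinate, which is why the voxelized cloud stays approximately equivalent to the raw one.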
- The spatial resolution of the input feature volume is downsampled 8× by a series of stride-2 sparse convolution layers, then gradually upsampled back to the original resolution by sparse deconvolutions for voxel-wise feature learning.

- four scales with feature dimensions 16-32-64-64
- Submanifold sparse convolution, 2017
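To make the 8× downsampling concrete, the per-scale grid resolutions can be computed directly: three stride-2 halvings of the initial voxel grid, paired with the 16-32-64-64 feature widths noted above. The helper below is a back-of-the-envelope sketch; the use of ceil division for each halving is an assumption (implementations differ on padding).

```python
def encoder_scales(grid_size, num_down=3, dims=(16, 32, 64, 64)):
    """Spatial resolution at each encoder scale: stride-2 sparse convs halve
    the grid num_down times (8x total downsampling for num_down=3).
    Returns a list of (feature_dim, (x, y, z) grid size) pairs.
    """
    sizes = [tuple(grid_size)]
    for _ in range(num_down):
        grid_size = tuple((s + 1) // 2 for s in grid_size)  # ceil halving
        sizes.append(grid_size)
    return list(zip(dims, sizes))

# KITTI space ~70m x 80m x 4m with 5cm x 5cm x 10cm voxels -> 1400 x 1600 x 40 grid
scales = encoder_scales((1400, 1600, 40))
```

For the KITTI grid this yields 1400×1600×40 at the first scale down to 175×200×5 at the last, after which the sparse deconvolutions upsample back to full resolution.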
2 Method