Abstract
A preliminary version of this work was published as PointRCNN (CVPR 2019)
- 2 stages: part-aware + part-aggregation
- part-aware stage ⇒ 3D proposals and intra-object part locations
- intra-object part locations + RoI-aware pooling ⇒ effective representation encoding the geometry-specific features of each 3D proposal
- part-aggregation stage ⇒ re-scores the box and refines the box location
- outperforms all prior methods on the KITTI benchmark (as of August 15, 2019)
1 Key components
1.1 intra-object part locations

- Observation: The relative locations of foreground points provide strong cues for box scoring and localization
- the relative locations of the 3D foreground points w.r.t. their corresponding boxes are named the intra-object part locations
- Illustration of intra-object part locations for foreground points
- Does learning this help?
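The definition above can be made concrete with a small sketch: transform each foreground point into its box's canonical frame and normalize by the box size, so the box interior maps to [0, 1]³. The function name, box parameterization, and axis conventions below are illustrative assumptions, not the paper's exact code.

```python
import numpy as np

def intra_object_part_location(points, box):
    """Relative location of points inside a 3D box, normalized to [0, 1]^3.

    points: (N, 3) array of xyz coordinates.
    box: (cx, cy, cz, l, w, h, ry) -- center, size, yaw about the vertical
    axis (axis conventions here are an illustrative assumption).
    """
    cx, cy, cz, l, w, h, ry = box
    shifted = points - np.array([cx, cy, cz])   # translate to the box center
    c, s = np.cos(-ry), np.sin(-ry)             # rotate world -> box frame
    local = shifted.copy()
    local[:, 0] = c * shifted[:, 0] - s * shifted[:, 1]
    local[:, 1] = s * shifted[:, 0] + c * shifted[:, 1]
    # normalize so the box spans [0, 1]^3; the box center maps to (0.5, 0.5, 0.5)
    return local / np.array([l, w, h]) + 0.5
```

With this convention, a point at the box center gets part location (0.5, 0.5, 0.5), and points on the box surface land on the boundary of the unit cube, which is what makes these coordinates a geometry-aware supervision signal.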

1.2 Point-wise feature learning via sparse convolution
- utilize an encoder-decoder network with sparse convolution and deconvolution [30], [31] to learn discriminative point-wise features
- more efficient and effective than the previously used PointNet++ backbone (see Table 1); this is also borne out by subsequent sparse-convolution-based detectors

- The voxelized point cloud is approximately equivalent to the raw point cloud.
- voxel size 5cm×5cm×10cm, small compared to the whole 3D space (∼70m×80m×4m) of a KITTI scene
- ⇒ about 16,000 non-empty voxels per scene
- the center of each non-empty voxel is treated as a point ⇒ voxelized point cloud
- the initial feature of each voxel is simply the mean of the coordinates of the points within that voxel
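The voxelization step above can be sketched in a few lines of NumPy: bucket points into integer voxel indices, then average the coordinates per non-empty voxel. This is a minimal sketch (function name and return values are my own); real pipelines use optimized scatter kernels.

```python
import numpy as np

def voxelize(points, voxel_size=(0.05, 0.05, 0.1)):
    """Voxelize a point cloud: each non-empty voxel becomes one point whose
    initial feature is the mean of the coordinates of the points it contains.
    """
    vs = np.asarray(voxel_size)
    idx = np.floor(points / vs).astype(np.int64)   # integer voxel index per point
    # group points by voxel index and average their coordinates
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    feats = np.zeros((len(uniq), 3))
    counts = np.zeros(len(uniq))
    np.add.at(feats, inverse, points)              # unbuffered scatter-add
    np.add.at(counts, inverse, 1)
    feats /= counts[:, None]
    centers = (uniq + 0.5) * vs                    # voxel centers act as the "points"
    return uniq, centers, feats
```

With the 5cm×5cm×10cm voxel size from the notes, nearby points collapse into one voxel whose feature is their mean coordinate, which is why the voxelized cloud stays approximately equivalent to the raw one.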
- The spatial resolution of the input feature volume is downsampled 8× by a series of stride-2 sparse convolution layers, then gradually upsampled back to the original resolution by sparse deconvolutions for voxel-wise feature learning.

- four scales with feature dimensions 16-32-64-64
- Submanifold sparse convolution, 2017
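To make the 8× downsampling concrete, the per-scale grid resolutions can be computed directly: three stride-2 halvings of the initial voxel grid, paired with the 16-32-64-64 feature widths noted above. The helper below is a back-of-the-envelope sketch; the use of ceil division for each halving is an assumption (implementations differ on padding).

```python
def encoder_scales(grid_size, num_down=3, dims=(16, 32, 64, 64)):
    """Spatial resolution at each encoder scale: stride-2 sparse convs halve
    the grid num_down times (8x total downsampling for num_down=3).
    Returns a list of (feature_dim, (x, y, z) grid size) pairs.
    """
    sizes = [tuple(grid_size)]
    for _ in range(num_down):
        grid_size = tuple((s + 1) // 2 for s in grid_size)  # ceil halving
        sizes.append(grid_size)
    return list(zip(dims, sizes))

# KITTI space ~70m x 80m x 4m with 5cm x 5cm x 10cm voxels -> 1400 x 1600 x 40 grid
scales = encoder_scales((1400, 1600, 40))
```

For the KITTI grid this yields 1400×1600×40 at the first scale down to 175×200×5 at the last, after which the sparse deconvolutions upsample back to full resolution.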
2 Method