
PV: combines Point-based and Voxel-based feature learning methods.
- voxel-based networks: efficiently encode multi-scale feature representations.
- PointNet-based networks: preserve accurate location information with flexible receptive fields.
- a two-step strategy: voxel-to-keypoint 3D scene encoding + keypoint-to-grid RoI feature abstraction

Predicted Keypoint Weighting
Keypoints are sampled with the Furthest Point Sampling (FPS) strategy.
Keypoints belonging to foreground objects should contribute more to accurate proposal refinement, while those from background regions should contribute less.
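A minimal NumPy sketch of the two ideas above, under stated assumptions: FPS starts from point index 0 (an arbitrary choice), and the weighting simply scales each keypoint feature by a sigmoid foreground probability. In the paper the foreground score is predicted by a learned network; here the raw scores (`fg_logits`) are passed in directly, and both function names are hypothetical.

```python
import numpy as np


def furthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Return indices of n_samples points chosen by furthest point sampling.

    points: (N, 3) xyz coordinates. Starts from index 0 (assumption) and
    repeatedly picks the point farthest from the already-selected set.
    """
    n = points.shape[0]
    if n_samples > n:
        raise ValueError("n_samples must not exceed the number of points")
    selected = np.zeros(n_samples, dtype=np.int64)
    # Squared distance from each point to its nearest selected point so far.
    min_dist = np.full(n, np.inf)
    current = 0
    for i in range(n_samples):
        selected[i] = current
        d = np.sum((points - points[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        current = int(np.argmax(min_dist))
    return selected


def weight_keypoint_features(features: np.ndarray, fg_logits: np.ndarray) -> np.ndarray:
    """Scale each keypoint feature by its foreground probability.

    features: (K, C) keypoint features; fg_logits: (K,) raw foreground scores
    (hypothetical stand-in for the learned predictor).
    """
    fg_prob = 1.0 / (1.0 + np.exp(-fg_logits))  # sigmoid -> probability in (0, 1)
    return features * fg_prob[:, None]  # background keypoints are damped
```

For example, on the 1-D points `[[0,0,0], [1,0,0], [10,0,0], [5,0,0]]`, sampling 3 keypoints returns indices `[0, 2, 3]`: the far endpoint is picked first, then the point that fills the largest gap.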

Keypoint-to-grid RoI Feature Abstraction for Proposal Refinement
Uniformly sample 6 × 6 × 6 = 216 grid points within each 3D proposal.
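One way to realize this uniform sampling is sketched below, assuming a proposal is parameterized by center (x, y, z), size (l, w, h) and a yaw angle around the z axis, and that the grid points sit at the centers of the 6 × 6 × 6 equal cells. These conventions and the function name `roi_grid_points` are assumptions for illustration.

```python
import numpy as np


def roi_grid_points(center, size, yaw, grid_size=6):
    """Return (grid_size**3, 3) grid points inside a yaw-rotated 3D box.

    center: (x, y, z) box center; size: (l, w, h) extents along the box's
    local x, y, z axes; yaw: rotation around z in radians (assumed convention).
    """
    center = np.asarray(center, dtype=float)
    size = np.asarray(size, dtype=float)
    # Cell-center offsets in [0, 1]: (i + 0.5) / grid_size.
    ticks = (np.arange(grid_size) + 0.5) / grid_size
    gx, gy, gz = np.meshgrid(ticks, ticks, ticks, indexing="ij")
    unit = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    # Scale to the box extents in its local frame, centered at the origin.
    local = (unit - 0.5) * size
    # Rotate around z by yaw, then translate to the box center.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ rot.T + center
```

For a 4 × 2 × 2 box at the origin with yaw 0, this yields 216 points whose x coordinates span ±5/3 (the outermost cell centers), so no point lies on the box boundary.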

Experiments
- KITTI: batch size 24, learning rate 0.01, 80 epochs on 8 GTX 1080 Ti GPUs
- Waymo Open Dataset: batch size 64, learning rate 0.01, 50 epochs on 32 GTX 1080 Ti GPUs
- Tables 6, 7, and 8 analyze the contributions of the voxel CNN, the keypoints, RoI-grid pooling, and each individual feature component.
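The training settings listed above can be captured as a small config object. This is a sketch only: the `TrainSetup` class is hypothetical, and it assumes the listed batch size is the total across all GPUs, split evenly (the notes do not say this explicitly).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSetup:
    """Training hyperparameters as listed in these notes (hypothetical class)."""
    dataset: str
    batch_size: int  # assumed: total batch across all GPUs
    learning_rate: float
    epochs: int
    num_gpus: int

    @property
    def batch_per_gpu(self) -> int:
        # Assumes the total batch is split evenly across GPUs.
        if self.batch_size % self.num_gpus:
            raise ValueError("batch size is not divisible by the GPU count")
        return self.batch_size // self.num_gpus


KITTI = TrainSetup("KITTI", 24, 0.01, 80, 8)
WAYMO = TrainSetup("Waymo Open Dataset", 64, 0.01, 50, 32)
```

Under that assumption, KITTI training uses 3 samples per GPU and Waymo training uses 2.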
References
Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. CVPR 2020.