The fastest until 2022, and the highest mAP on KITTI until 2019

All runtimes are measured on a desktop with an Intel i7 CPU and a 1080ti GPU.

70m × 80m detection region on the KITTI dataset

1. Method

1.1 Encoding / Feature Net

The set of pillars will be mostly empty due to the sparsity of the point cloud, and the non-empty pillars will in general have few points in them. This sparsity is exploited by imposing a limit both on the number of non-empty pillars per sample (P) and on the number of points per pillar (N) to create a dense tensor of size (D, P, N).

If a sample or pillar holds too much data to fit in this tensor the data is randomly sampled. Conversely, if a sample or pillar has too little data to populate the tensor, zero padding is applied.
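The sampling and padding rule can be sketched in NumPy as below. This is a minimal illustration, not the reference implementation: the function name `build_pillar_tensor`, its parameters, and the default pillar size are assumptions made for the example, and the feature dimension here is simply whatever the input points carry.

```python
import numpy as np

def build_pillar_tensor(points, pillar_size=0.16, max_pillars=4, max_points=3, seed=0):
    """Group points (M, F) into x-y pillars and return a dense (F, P, N) tensor.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Integer grid cell of every point (x and y only; pillars span all of z).
    cells = np.floor(points[:, :2] / pillar_size).astype(np.int64)
    unique_cells, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep it flat regardless of NumPy version
    pillar_ids = np.arange(len(unique_cells))
    # Too many non-empty pillars: keep a random subset of size P.
    if len(pillar_ids) > max_pillars:
        pillar_ids = rng.choice(pillar_ids, size=max_pillars, replace=False)
    num_features = points.shape[1]
    # Slots that are never written stay zero, which is the zero padding.
    tensor = np.zeros((num_features, max_pillars, max_points), dtype=points.dtype)
    counts = np.zeros(max_pillars, dtype=np.int64)
    for slot, pid in enumerate(pillar_ids):
        idx = np.flatnonzero(inverse == pid)
        # Too many points in this pillar: keep a random subset of size N.
        if len(idx) > max_points:
            idx = rng.choice(idx, size=max_points, replace=False)
        tensor[:, slot, :len(idx)] = points[idx].T
        counts[slot] = len(idx)
    return tensor, unique_cells[pillar_ids], counts
```

The returned `counts` array records how many real points each pillar slot holds, which is useful for masking out the padded entries later.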

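Each point is encoded as a D = 9 dimensional vector [x, y, z, r, Xc, Yc, Zc, Xp, Yp], where x, y, z are the point coordinates, r is the reflectance, Xc, Yc, Zc are the offsets from the arithmetic mean of all points in the pillar, and Xp, Yp are the offsets from the pillar's x, y center.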
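As a rough sketch of how the nine features could be computed for the points of a single pillar, assuming Xc, Yc, Zc are offsets from the mean of the pillar's points and Xp, Yp are offsets from the pillar's x, y center. The function name `decorate_pillar` and its arguments are hypothetical.

```python
import numpy as np

def decorate_pillar(points, pillar_center_xy):
    """Turn (K, 4) points [x, y, z, r] of one pillar into (K, 9) features.

    Hypothetical helper: assumes every row is a real (non-padded) point.
    """
    xyz = points[:, :3]
    # Xc, Yc, Zc: offset of each point from the mean of the pillar's points.
    offset_from_mean = xyz - xyz.mean(axis=0)
    # Xp, Yp: offset of each point from the pillar's x-y center.
    offset_from_center = points[:, :2] - np.asarray(pillar_center_xy)
    return np.concatenate([points, offset_from_mean, offset_from_center], axis=1)
```

Stacking the decorated pillars and transposing yields the (D, P, N) layout with D = 9.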

Encoding: PointNet

A simplified PointNet applies a linear layer followed by BatchNorm and ReLU to each point, producing a feature tensor of size (C, P, N). A max operation over the N points of each pillar then reduces this (C, P, N) tensor to a (C, P) tensor.
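The shape flow can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: the function name `pillar_feature_net` is hypothetical, the linear layer has no bias, BatchNorm is written in training mode without a learned scale and shift, and padded zero points are not masked out.

```python
import numpy as np

def pillar_feature_net(x, weight, eps=1e-5):
    """Map a (D, P, N) pillar tensor to (C, P) features.

    Hypothetical sketch: weight has shape (C, D).
    """
    # Linear layer applied to every point independently: (C, D) x (D, P, N) -> (C, P, N).
    h = np.einsum("cd,dpn->cpn", weight, x)
    # BatchNorm (training mode, no affine): normalise each channel over all points.
    mean = h.mean(axis=(1, 2), keepdims=True)
    var = h.var(axis=(1, 2), keepdims=True)
    h = (h - mean) / np.sqrt(var + eps)
    # ReLU.
    h = np.maximum(h, 0.0)
    # Max over the N points of each pillar: (C, P, N) -> (C, P).
    return h.max(axis=2)
```

Because the same weights are applied to every point, the linear layer behaves like a 1x1 convolution over the (P, N) grid.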

1.2 Backbone

1.3 SSD Head
