3D object detection and tracking using center points in the bird's-eye view (BEV).
1. It converts the sparse output of a backbone network into a dense bird's-eye-view feature map and predicts a dense heatmap of object center locations.
CenterPoint outperforms all previous single-model methods by a large margin and ranks first among all LiDAR-only submissions.
1. https://paperswithcode.com/paper/center-based-3d-object-detection-and-tracking
2. 2024: BEVFusion is the best.
https://github.com/tianweiy/CenterPoint

1 Introduction

1.1 Motivation

Both the anchor-based and our center-based methods are able to detect objects accurately (top).
However, during a safety-critical left turn (bottom), anchor-based methods have difficulty fitting axis-aligned bounding boxes to rotated objects.
Our center-based model accurately detects objects through rotationally invariant points.
1. Why? Both of them predict boxes from local features.
Key advantages:
1. Unlike bounding boxes, points have no intrinsic orientation ⇒ this dramatically reduces the object detector’s search space while allowing the backbone to focus on …
2. a center-based representation simplifies downstream tasks such as tracking. If objects are points, tracklets are paths in space and time
3. point-based feature extraction enables us to design an effective two-stage refinement module that is much faster than previous approaches [44–46]
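
To make the tracking point concrete: once objects are points, frame-to-frame association can be reduced to matching centers by distance. The following is a minimal sketch under that assumption, using greedy nearest-center matching. The function name `associate` and the `max_dist` gate are hypothetical names chosen for illustration, and this is not necessarily the exact procedure used in the paper.

```python
import math

def associate(tracks, detections, max_dist=2.0):
    """Greedily match existing track centers to new detection centers.

    tracks: dict mapping track id -> (x, y) last known center
    detections: list of (x, y) centers in the current frame
    Returns a dict mapping detection index -> track id. Detections
    without a close enough track start new tracks with fresh ids.
    """
    # All candidate (distance, track id, detection index) pairs, closest first.
    pairs = sorted(
        (math.dist(center, det), tid, j)
        for tid, center in tracks.items()
        for j, det in enumerate(detections)
    )
    assigned, used_tracks = {}, set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # every remaining pair is even farther apart
        if tid in used_tracks or j in assigned:
            continue  # each track and each detection is matched at most once
        assigned[j] = tid
        used_tracks.add(tid)
    # Unmatched detections become new tracks.
    next_id = max(tracks, default=-1) + 1
    for j in range(len(detections)):
        if j not in assigned:
            assigned[j] = next_id
            next_id += 1
    return assigned
```

Calling `associate({0: (0.0, 0.0), 1: (10.0, 0.0)}, [(10.5, 0.2), (0.3, -0.1), (50.0, 50.0)])` links the first two detections to tracks 1 and 0 and opens a new track for the third. No box overlap computation is needed, which is the simplification the point above refers to.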

It performs better on cases where the yaw angle deviates a lot, the aspect ratio is large, or the object size is extreme.
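
A small worked example of why axis-aligned boxes fit rotated objects poorly: the tightest axis-aligned box around a rotated rectangle grows with yaw, so the object covers a shrinking fraction of it. This is only an illustrative sketch, not an analysis from the paper. The helper names and the object dimensions used below are assumptions.

```python
import math

def axis_aligned_extent(length, width, yaw):
    """Side lengths of the tightest axis-aligned box around a rotated rectangle."""
    c, s = abs(math.cos(yaw)), abs(math.sin(yaw))
    return length * c + width * s, length * s + width * c

def fill_ratio(length, width, yaw):
    """Fraction of the axis-aligned enclosing box that the object covers."""
    ex, ey = axis_aligned_extent(length, width, yaw)
    return (length * width) / (ex * ey)

# Hypothetical sizes in meters: a car-like and a bus-like object.
for name, (length, width) in {"car": (4.5, 1.8), "bus": (12.0, 2.5)}.items():
    for deg in (0, 15, 45):
        ratio = fill_ratio(length, width, math.radians(deg))
        print(f"{name} yaw={deg:2d} deg fill={ratio:.2f}")
```

At zero yaw the ratio is 1. At 45° the car-like object covers about 41% of its enclosing box and the bus-like object about 29%, so the mismatch is worst for large yaw and elongated objects.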

3D Backbone: VoxelNet [56, 66] or PointPillars [28].
1. The output of the 3D backbone is called BEV features.
image-based keypoint detector to find object centers: Objects as Points, 19
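
Keypoint detectors in the Objects as Points style read object centers off the heatmap as local peaks. A minimal pure-Python sketch of that step follows, assuming a 3x3 neighborhood and a score threshold. The name `find_peaks` and the default threshold are hypothetical, and a real implementation would typically run this on tensors with max pooling.

```python
def find_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for each cell that scores at least
    `threshold` and is no smaller than any of its 3x3 neighbors."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the up-to-8 neighbors, clipping at the map border.
            neighbors = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbors):
                peaks.append((r, c, score))
    return peaks
```

Each returned peak is a candidate object center in the BEV grid. Other box attributes would then be read from regression maps at the same cell, which is not shown here.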