  1. 3D object detection and tracking using center points in the bird's-eye view (BEV).
    1. It converts the sparse output of a backbone network into a dense BEV feature map and predicts a dense heatmap of object center locations.
  2. CenterPoint outperforms all previous single-model methods by a large margin and ranks first among all Lidar-only submissions.
    1. https://paperswithcode.com/paper/center-based-3d-object-detection-and-tracking
    2. 2024: BEVFusion performs best.
  3. https://github.com/tianweiy/CenterPoint
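To make the "dense heatmap of center locations" concrete, here is a minimal sketch of what such a heatmap can look like for two objects on a small BEV grid. It is an illustration, not the repository's code: the function name `draw_center_heatmap`, the grid parameters, and the fixed `sigma` are all assumptions (detectors of this kind usually tie the Gaussian spread to object size).

```python
import math


def draw_center_heatmap(centers_xy, grid_size, voxel_size, origin, sigma=1.0):
    """Render a BEV heatmap with one Gaussian peak per object center.

    centers_xy: list of (x, y) object centers in metres
    grid_size:  (width, height) of the BEV grid in cells
    voxel_size: edge length of one BEV cell in metres
    origin:     (x_min, y_min) of the grid in metres
    sigma:      Gaussian spread in cells (hypothetical fixed value)
    """
    width, height = grid_size
    heatmap = [[0.0] * width for _ in range(height)]
    for x, y in centers_xy:
        # Metric coordinates -> continuous grid coordinates.
        cx = (x - origin[0]) / voxel_size
        cy = (y - origin[1]) / voxel_size
        for row in range(height):
            for col in range(width):
                d2 = (col - cx) ** 2 + (row - cy) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                # Overlapping objects keep the stronger response.
                heatmap[row][col] = max(heatmap[row][col], value)
    return heatmap


heatmap = draw_center_heatmap(
    [(2.0, 3.0), (6.0, 1.0)],
    grid_size=(8, 5),
    voxel_size=1.0,
    origin=(0.0, 0.0),
)
for row in heatmap:
    print(" ".join(f"{v:.2f}" for v in row))
```

Each object contributes a single peak regardless of its heading, which is the property the notes below rely on.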

1 Introduction

1.1 Motivation

1.1.1 Rotation misalignment

  1. On straight roads, both anchor-based and our center-based methods detect objects accurately (top).
  2. However, during a safety-critical left turn (bottom), anchor-based methods have difficulty fitting axis-aligned bounding boxes to rotated objects.
  3. Our center-based model accurately detects objects through rotationally invariant points.
    1. Why? Both approaches predict boxes from local features.
  4. Key advantages
    1. Unlike bounding boxes, points have no intrinsic orientation ⇒ this dramatically reduces the object detector's search space while allowing the backbone to focus on …
    2. A center-based representation simplifies downstream tasks such as tracking. If objects are points, tracklets are paths in space and time.
    3. Point-based feature extraction enables us to design an effective two-stage refinement module that is much faster than previous approaches [44–46].
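The tracking advantage can be sketched as follows: if every object is a point with a velocity estimate, association reduces to projecting each current center back by one time step and matching it to the closest previous center. This is a simplified sketch under assumptions, not the repository's tracker; `greedy_match`, `dt`, and `max_dist` are hypothetical names and values.

```python
import math


def greedy_match(prev_centers, curr_centers, curr_velocities, dt, max_dist):
    """Greedily associate current detections with previous-frame centers.

    Each current center is projected back in time with its velocity
    (p_back = p - v * dt) and paired with the closest unused previous
    center within max_dist. Returns {current_index: previous_index}.
    """
    candidates = []
    for i, ((x, y), (vx, vy)) in enumerate(zip(curr_centers, curr_velocities)):
        bx, by = x - vx * dt, y - vy * dt
        for j, (px, py) in enumerate(prev_centers):
            dist = math.hypot(bx - px, by - py)
            if dist <= max_dist:
                candidates.append((dist, i, j))

    # Closest pairs are committed first; each index is used at most once.
    candidates.sort()
    matches, used_prev = {}, set()
    for _, i, j in candidates:
        if i in matches or j in used_prev:
            continue
        matches[i] = j
        used_prev.add(j)
    return matches


prev_centers = [(0.0, 0.0), (10.0, 0.0)]
curr_centers = [(10.5, 0.0), (1.0, 0.0), (50.0, 50.0)]
curr_velocities = [(1.0, 0.0), (2.0, 0.0), (0.0, 0.0)]
matches = greedy_match(prev_centers, curr_centers, curr_velocities, dt=0.5, max_dist=2.0)
print(matches)
```

Detections that end up without a match (index 2 here) would start new tracklets in a full tracker; track ageing and deletion are left out of this sketch.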

Advantages

Performs better on cases with a large yaw deviation, a large length-to-width ratio, or extreme object sizes.
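A small geometric illustration of why yaw and aspect ratio matter for axis-aligned boxes: the tightest axis-aligned box around a rotated rectangle contains it entirely, so their IoU is the ratio of the two areas. This is an illustrative calculation, not an experiment from the paper; the function name `enclosing_aabb_iou` and the car and bus dimensions are assumptions.

```python
import math


def enclosing_aabb_iou(length, width, yaw):
    """IoU between a rotated rectangle and its tight axis-aligned bounding box.

    The rotated rectangle lies fully inside the enclosing box, so the IoU
    equals rectangle area divided by enclosing-box area.
    """
    ext_x = length * abs(math.cos(yaw)) + width * abs(math.sin(yaw))
    ext_y = length * abs(math.sin(yaw)) + width * abs(math.cos(yaw))
    return (length * width) / (ext_x * ext_y)


# Hypothetical object sizes in metres: (length, width).
car = (4.5, 2.0)
bus = (12.0, 2.5)

for name, (length, width) in (("car", car), ("bus", bus)):
    for yaw_deg in (0, 15, 45):
        iou = enclosing_aabb_iou(length, width, math.radians(yaw_deg))
        print(f"{name} yaw={yaw_deg:2d} deg  IoU with enclosing box = {iou:.3f}")
```

The overlap is perfect at zero yaw and drops as the object rotates, and it drops faster for the elongated bus than for the car. A center point is unaffected by either factor.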

2 Method

2.1 Detection

  1. 3D Backbone: VoxelNet [56, 66] or PointPillars [28].

    1. The output of the 3D backbone is called the BEV features.

  2. image-based keypoint detector to find object centers: Objects as Points, 19
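A keypoint detector of this kind typically reads object centers off the predicted heatmap as local maxima. The sketch below assumes a 3x3 neighbourhood test and a score threshold; `find_center_peaks` and the threshold value are hypothetical, and the toy heatmap is made up for illustration.

```python
def find_center_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for cells that are 3x3 local maxima.

    heatmap:   2D list of per-cell center scores in [0, 1]
    threshold: minimum score for a cell to count as a center (assumed value)
    """
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for row in range(height):
        for col in range(width):
            score = heatmap[row][col]
            if score < threshold:
                continue
            # Scores in the 3x3 window around the cell, clipped at the borders.
            neighbours = [
                heatmap[r][c]
                for r in range(max(0, row - 1), min(height, row + 2))
                for c in range(max(0, col - 1), min(width, col + 2))
            ]
            if score >= max(neighbours):
                peaks.append((row, col, score))
    # Strongest detections first.
    return sorted(peaks, key=lambda p: -p[2])


toy_heatmap = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.0, 0.1],
    [0.1, 0.3, 0.2, 0.4, 0.7],
    [0.0, 0.0, 0.1, 0.5, 0.4],
]
peaks = find_center_peaks(toy_heatmap)
print(peaks)
```

In a full detector, the remaining box attributes (size, height, orientation, velocity) would be regressed from the feature at each peak location; that step is omitted here.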