1 Introduction
The goal is to continuously estimate the position and orientation of the object,
even in the presence of occlusions, camera motion, and changing lighting conditions.

1.1 Two approaches
- Separate trackers — tracking by detection: an object detector runs first, and its outputs are then associated frame by frame.
- Joint trackers — detection and 3D tracking are performed jointly by feeding two consecutive images (or point clouds) to a single deep learning model.
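The separate-tracker (tracking-by-detection) idea can be sketched with a greedy IoU association step; this is a minimal illustration, not a method from the text, and the helper names are made up for the example:

```python
# Minimal tracking-by-detection sketch (illustrative): detections from the
# current frame are greedily associated to existing tracks by the IoU of
# their axis-aligned boxes (x1, y1, x2, y2).

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(tracks, detections, iou_thresh=0.3):
    """Greedy IoU matching: returns {track_index: detection_index}."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = {}, set(), set()
    for score, ti, di in pairs:
        if score < iou_thresh:
            break
        if ti not in used_t and di not in used_d:
            matches[ti] = di
            used_t.add(ti)
            used_d.add(di)
    return matches
```

In practice the track boxes are usually first propagated by a motion model (e.g. a Kalman filter) before matching, and unmatched detections spawn new tracks.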
1.2 Three related tasks
- Feature tracking
- Multi-object tracking (2D or 3D)
- Optical flow
1.3 Two settings
- Online tracking — causal: only past and current frames are available.
- Auto-labeling — offline: future frames can also be used to refine trajectories.
1.4 Two paradigms for multi-object tracking
Matching-based vs. motion-based methods:
- Matching-based
  - Extract template and search-proposal features in the same embedding space, then predict the target state by measuring feature similarity.
  - Siamese paradigm: takes as input the target template cropped from the previous frame and a search area in the current frame.
- Motion-based
  - Explicitly model the relative motion between the template and the search point cloud.
  - Motion cues act as a reference, enhancing current features with past features for prediction.
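The core of the matching-based paradigm — scoring proposals by feature similarity to the template — can be sketched with cosine similarity in a shared embedding space. The embeddings below are random stand-ins for real network features:

```python
import numpy as np

# Matching-based sketch: the template and each search proposal are embedded
# in the same feature space; the proposal most similar to the template is
# predicted as the target. Embeddings here are synthetic stand-ins.

def cosine_similarity(template, proposals):
    """Cosine similarity between one template vector and N proposal vectors."""
    t = template / np.linalg.norm(template)
    p = proposals / np.linalg.norm(proposals, axis=1, keepdims=True)
    return p @ t

rng = np.random.default_rng(0)
template = rng.normal(size=8)
proposals = rng.normal(size=(5, 8))
proposals[3] = template + 0.05 * rng.normal(size=8)  # plant a near-duplicate

scores = cosine_similarity(template, proposals)
best = int(np.argmax(scores))  # index of the proposal predicted as the target
```

A real Siamese tracker replaces the random vectors with learned features and typically regresses the target box from the best-matching region rather than just picking an index.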
1.5 2D tracking pipeline
Given detections at two consecutive timesteps...
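One common instantiation of this step (an assumption, since the text leaves the pipeline unfinished) builds a pairwise IoU cost matrix between the two detection sets and solves the assignment optimally with the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sketch of a common 2D association step: cost = 1 - IoU between detections
# at time t-1 and time t, solved with the Hungarian algorithm.

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    ious = np.zeros((len(boxes_a), len(boxes_b)))
    for i, a in enumerate(boxes_a):
        for j, b in enumerate(boxes_b):
            x1, y1 = max(a[0], b[0]), max(a[1], b[1])
            x2, y2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = ((a[2] - a[0]) * (a[3] - a[1])
                     + (b[2] - b[0]) * (b[3] - b[1]) - inter)
            ious[i, j] = inter / union
    return ious

prev = [(0, 0, 10, 10), (50, 50, 60, 60)]   # detections at t-1
curr = [(51, 51, 61, 61), (1, 0, 11, 10)]   # detections at t
cost = 1.0 - iou_matrix(prev, curr)
rows, cols = linear_sum_assignment(cost)     # optimal one-to-one assignment
matches = list(zip(rows.tolist(), cols.tolist()))
```

Unlike the greedy variant, the Hungarian solver minimizes the total cost over all pairs; a gating threshold on IoU is usually applied afterwards to reject low-overlap matches.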