
Predict N candidate boxes;
ARP predicts a weight for each candidate; the weights are used in Destination Feature Reconstruction to refine FC, which is then cross-attended with FT again ⇒ FDev.
1. The coarse prediction is the weighted average of the N proposals.
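The weighted average above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the (x, y, z, yaw) box layout, the function name `coarse_box`, and the choice to average yaw on the unit circle are all assumptions made for the example; the notes only say that the coarse prediction is a weighted average of the N proposals.

```python
import numpy as np

def coarse_box(proposals, weights):
    """Weighted average of N candidate boxes.

    proposals: (N, 4) array of hypothetical (x, y, z, yaw) box parameters.
    weights:   (N,) non-negative per-candidate weights (e.g. produced by ARP).
    """
    proposals = np.asarray(proposals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize so the weights sum to 1
    center = w @ proposals[:, :3]        # weighted mean of the box centers
    # Assumption: average yaw on the unit circle so angles near +/-pi do not cancel.
    yaw = np.arctan2(w @ np.sin(proposals[:, 3]), w @ np.cos(proposals[:, 3]))
    return np.concatenate([center, [yaw]])

boxes = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
print(coarse_box(boxes, [1.0, 3.0]))     # center x lands at 1.5
```

With weights 1 and 3, the second box contributes three quarters of the result, so the averaged center sits closer to it.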
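Fsrc: the feature of the point cloud cropped by the coarse predicted 3D box of the current frame and by the refined predicted 3D box of the previous frame (template).
1. Isn't this the same as completing the template and then encoding a feature from it?
2. No, there is a difference. Earlier approaches rerun the whole pipeline, which makes them two-stage methods: prediction 1 + prediction 2. The main difference between stage 2 and stage 1 is that the template has been completed, and the template no longer needs to be updated at the end.
3. Here, the extra information is added into FDev instead, so the method stays single-stage (coarse + refine).
Destination Feature Reconstruction: another cross attention between the weighted FC and FT, except that the coordinates of the coarse prediction center and of the template center are also concatenated onto them, respectively.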
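A rough sketch of this step, under several assumptions that the notes do not confirm: FC acts as the query and FT as key/value, attention is single-head with no learned projections, and the center coordinates are simply tiled and appended to every feature row. The names `reconstruct` and `cross_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value):
    """Single-head scaled dot-product cross attention without learned projections."""
    d = query.shape[-1]
    attn = softmax(query @ key_value.T / np.sqrt(d))   # (Nq, Nk) attention map
    return attn @ key_value                             # (Nq, d)

def reconstruct(f_c, f_t, weights, coarse_center, template_center):
    """f_c: (N, C) candidate features, f_t: (M, C) template features."""
    f_c = f_c * weights[:, None]                                    # weight the candidate features
    q = np.hstack([f_c, np.tile(coarse_center, (len(f_c), 1))])     # append coarse-prediction center
    kv = np.hstack([f_t, np.tile(template_center, (len(f_t), 1))])  # append template center
    return cross_attention(q, kv)                                   # stand-in for FDev, (N, C + 3)

rng = np.random.default_rng(0)
f_dev = reconstruct(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)),
                    np.array([0.1, 0.2, 0.3, 0.4]),
                    np.zeros(3), np.ones(3))
print(f_dev.shape)   # (4, 11)
```

A real model would add learned query/key/value projections and likely several heads; the sketch only shows where the weights and the two center coordinates enter.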
Target Knowledge Transfer (TKT), only used in training.
1. Make FDev resemble Fsrc via a KL-divergence loss.
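One possible form of such a loss is sketched below. The notes do not say how the features are turned into distributions or which direction of the divergence is used, so the softmax over the channel axis, the KL(src ‖ dev) direction, and the name `kl_transfer_loss` are all assumptions.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def kl_transfer_loss(f_dev, f_src):
    """Mean KL(p_src || p_dev) over rows, with softmax taken along the channel axis."""
    log_p_dev = log_softmax(f_dev)
    log_p_src = log_softmax(f_src)
    p_src = np.exp(log_p_src)
    return float((p_src * (log_p_src - log_p_dev)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
f_src = rng.normal(size=(4, 8))
f_dev = rng.normal(size=(4, 8))
print(kl_transfer_loss(f_src, f_src))         # 0.0: identical features give zero loss
print(kl_transfer_loss(f_dev, f_src) > 0.0)   # True
```

Because TKT is used only in training, Fsrc and this loss are needed only then; at inference only FDev has to be computed.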

Why

Due to occlusion and sensor quality, the point cloud is often sparse. How should sparse points be handled?
1. Use point clouds from multiple frames:
  1. MM-Track [10] alleviates the sparsity issue and builds a strong template feature for matching. However, its inference speed is a bottleneck.
    1. ⇒ explicit point cloud completion for the template
  2. This paper ⇒ TKT: an efficient implicit point cloud completion method with only slight overhead.
    1. ⇒ what "implicit" actually does is ⇒ stronger template features.
Adaptive Refine Prediction (ARP): fig.3 & 4
1. Weight all predictions by their scores? No.
2. Weight all predictions by “reweighted” scores, computed from the original scores and the predicted logits distances.
target and proposals

Employ an attention mechanism to structure the matching procedure between the target and the proposals, which leverages an implicit similarity operation for better matching.

References

Implicit and Efficient Point Cloud Completion for 3D Single Object Tracking, IEEE RA-L, 2023
MM-Track [10]: Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds, CVPR 2022