https://github.com/Ghostish/Open3DSOT

Introduction

Appearance vs motion

  1. Due to self-occlusion, significant appearance changes may occur in consecutive LiDAR views.
  2. Negative samples grow significantly in dense traffic scenes, making it hard to locate a target based on its appearance alone (even for humans).


Due to the Siamese paradigm, previous methods have to transform the target template from the world coordinate system into the target's own object coordinate system. This transformation breaks the motion continuity between consecutive frames.
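A minimal sketch of the canonicalization step described above, assuming an axis-aligned box parameterized by a world-frame center and a yaw angle (function name and parameterization are illustrative, not the repo's API):

```python
import numpy as np

def world_to_object(points, box_center, box_yaw):
    """Transform world-frame points into a box's object coordinate system.

    This is the per-template canonicalization used by Siamese trackers;
    once applied, the world-frame displacement between frames is lost.
    points: (N, 3) xyz in world coordinates.
    box_center: (3,) box center in world coordinates.
    box_yaw: heading angle (rad) about the z axis.
    """
    c, s = np.cos(box_yaw), np.sin(box_yaw)
    # Inverse (transpose) of the yaw rotation, applied after removing
    # the translation: x_obj = R^T (p - t).
    rot_inv = np.array([[c, s, 0.0],
                        [-s, c, 0.0],
                        [0.0, 0.0, 1.0]])
    return (points - box_center) @ rot_inv.T
```

Applying this independently to frames t-1 and t maps both templates near the origin, which is exactly why the relative motion between them can no longer be read off the canonicalized coordinates.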


M_{t-1,t} ∈ R^4 is the relative target motion between frames t−1 and t (a 3D translation plus a yaw rotation).
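A sketch of how a 4-DoF motion state could update the previous box; the (dx, dy, dz, dtheta) layout and the world-frame translation are assumptions for illustration, not the paper's exact convention:

```python
import numpy as np

def apply_motion(box_prev, motion):
    """Apply a relative motion M_{t-1,t} = (dx, dy, dz, dtheta) to the
    previous box state (x, y, z, yaw) to obtain the current box.

    Assumes the translation is expressed in the world frame; the paper
    may instead define it in the previous box's local frame.
    """
    x, y, z, yaw = box_prev
    dx, dy, dz, dtheta = motion
    return np.array([x + dx, y + dy, z + dz, yaw + dtheta])
```

The key point of the motion-centric paradigm is that this 4-vector is predicted directly from the two frames, rather than recovered by matching appearance features.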


Targetness Prediction

Segment the target points from their surroundings.

  1. Similar to [45], [57], we construct a spatial-temporal point cloud P_{t−1,t} ∈ R^((N_{t−1}+N_t)×4) from P_{t−1} and P_t by adding a temporal channel to each point and then merging them.

    1. 4D: (x, y, z, t)
    2. Note how association is established early and without semantics here, replacing the late-stage matching found in typical tracking pipelines.
  2. A prior-targetness map S_{t−1,t} ∈ R^(N_{t−1}+N_t) (one targetness score per point).

    1. 5D: (x, y, z, t, s)
  3. PointNet ⇒ spatial-temporal target point cloud P^~_{t−1,t} ∈ R^((M_{t−1}+M_t)×4), where M_t is the number of target points in frame t.

    1. P^~_{t−1,t} is an intermediate result; is there a direct loss constraining it?
      1. Supervised with GT target points from frame t? Yes.
      2. How are the pseudo points inside B_{t−1} handled? Just ignored outright?

Stage I