https://github.com/BraveGroup/FullySparseFusion
Abstract
- no feature fusion
- point features are used for image instances.
- no box fusion
- fuses two modalities at the instance level
- LiDAR instances are not derived from image instances; the 2 branches stay almost separate, interacting only through instance-level feature interaction (self-attention) before box prediction.
- How is the conflict between the 2 kinds of boxes handled? e.g., Fig 2?
- Done implicitly?
- Could image features help 3d box regression? or just help instance identification?
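The notes above say the two branches only interact through instance-level self-attention over the pooled camera and LiDAR instance features before box prediction. A minimal numpy sketch of that interaction, with identity projections and illustrative dimensions (none of the names are from the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def instance_self_attention(cam_feats, lidar_feats):
    """Mix camera and LiDAR instance features with one self-attention step.

    cam_feats: (m_C, d), lidar_feats: (m_L, d). Q/K/V projection weights are
    omitted (identity) to keep the sketch minimal.
    """
    x = np.concatenate([cam_feats, lidar_feats], axis=0)  # (m_C + m_L, d)
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d))  # (m, m) cross-modal attention weights
    return x + attn @ x                   # residual, as in a transformer layer

rng = np.random.default_rng(0)
fused = instance_self_attention(rng.normal(size=(5, 16)),   # m_C = 5 camera instances
                                rng.normal(size=(7, 16)))   # m_L = 7 LiDAR instances
# fused.shape == (12, 16): one mixed feature per instance, both modalities
```

Each of the m_C + m_L instances attends to all others, so a camera instance can absorb LiDAR evidence (and vice versa) without any dense feature-map fusion.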
1 Introduction
1.1 Why sparse fusion?

Fig 2
2 Method
2.1 Bi-modal Instance Generation


- Camera instance Pj, |{Pj}|=m^C
- ⇒ {Mj} instance masks by [63]
- the 3D point cloud P is projected onto the 2D image plane via the camera matrix to obtain 2D points U
- Uj (2d points in Mj)
- may contain noisy background points
- LiDAR instance Fi, |{Fi}|=m^L
- by Connected Components Labeling (CCL), as in FSD v1 (NeurIPS 2022)
- may miss some foreground
- note: there are m = m^C + m^L instances in total.
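The camera branch above (project P, then collect the 2D points U_j inside each mask M_j) can be sketched as follows; the function names, and the split of the camera matrix into intrinsics K and a LiDAR-to-camera transform T, are my assumptions, not the paper's code:

```python
import numpy as np

def project_points(points_3d, K, T):
    """Project LiDAR points into the image plane.

    points_3d: (N, 3) in the LiDAR frame; T: (4, 4) LiDAR-to-camera transform;
    K: (3, 3) camera intrinsics. Returns (N, 2) pixel coordinates and a bool
    mask of points in front of the camera.
    """
    homog = np.concatenate([points_3d, np.ones((len(points_3d), 1))], axis=1)
    cam = (T @ homog.T).T[:, :3]      # points in the camera frame
    in_front = cam[:, 2] > 0
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]       # perspective divide
    return uv, in_front

def points_in_mask(uv, valid, mask):
    """Indices of projected points falling inside a binary instance mask M_j.

    As noted above, this set may still contain background points that happen
    to project inside the mask (e.g., at occlusion boundaries).
    """
    h, w = mask.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    ok = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.where(ok)[0]
    return idx[mask[v[idx], u[idx]]]
```

Running `points_in_mask` once per predicted mask yields the m^C camera-generated instances, each a subset of the raw point cloud.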
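For the LiDAR branch, FSD v1's CCL groups points whose voxels touch into one instance. A simplified stand-in (voxel size and the 26-neighborhood connectivity are illustrative assumptions, not FSD's exact settings):

```python
from collections import deque

import numpy as np

def ccl_cluster(points, voxel=0.5):
    """Label points by connected components over occupied voxels.

    points: (N, 3). Points whose voxels are 26-connected share an instance id.
    Returns an (N,) int array of labels. A crude sketch of CCL, not FSD's code.
    """
    keys = np.floor(points / voxel).astype(int)
    vox = {}                                  # voxel key -> point indices
    for i, k in enumerate(map(tuple, keys)):
        vox.setdefault(k, []).append(i)
    labels = np.full(len(points), -1)
    nbrs = [(dx, dy, dz) for dx in (-1, 0, 1)
            for dy in (-1, 0, 1) for dz in (-1, 0, 1)]
    cur = 0
    for seed in vox:
        if labels[vox[seed][0]] != -1:        # voxel already labeled
            continue
        q = deque([seed])                     # BFS flood fill over voxels
        while q:
            k = q.popleft()
            if labels[vox[k][0]] != -1:
                continue
            for i in vox[k]:
                labels[i] = cur
            for d in nbrs:
                nk = (k[0] + d[0], k[1] + d[1], k[2] + d[2])
                if nk in vox and labels[vox[nk][0]] == -1:
                    q.append(nk)
        cur += 1
    return labels
```

Because CCL only connects nearby points, sparsely scanned or distant foreground can fail to form a component, which is exactly the "may miss some foreground" weakness the camera branch compensates for.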
2.2 Bi-modal Instance-based Prediction
