seems a little heavy.
MSCE is implemented as a single-layer Transformer encoder.
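To make "single-layer Transformer encoder" concrete, here is a minimal sketch in plain Python. It is not the paper's MSCE code. The class name `SingleLayerEncoder`, the sizes (`d_model=8`, `d_ff=16`), the single attention head, post-norm ordering, random weights, and the lack of biases are all assumptions chosen to keep the example short.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(row, eps=1e-5):
    """Normalize one token vector to zero mean, unit variance (no learned scale/shift)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def rand_matrix(rows, cols, rng):
    """Random weights, a stand-in for trained parameters."""
    bound = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


class SingleLayerEncoder:
    """One encoder layer: self-attention, then feed-forward, each with residual + LayerNorm."""

    def __init__(self, d_model=8, d_ff=16, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.w_q = rand_matrix(d_model, d_model, rng)
        self.w_k = rand_matrix(d_model, d_model, rng)
        self.w_v = rand_matrix(d_model, d_model, rng)
        self.w_o = rand_matrix(d_model, d_model, rng)
        self.w_1 = rand_matrix(d_model, d_ff, rng)
        self.w_2 = rand_matrix(d_ff, d_model, rng)

    def __call__(self, tokens):
        q, k, v = (matmul(tokens, w) for w in (self.w_q, self.w_k, self.w_v))
        scale = math.sqrt(self.d_model)
        # Scaled dot-product attention: every token attends to every token.
        attn = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]) for qi in q]
        context = matmul(matmul(attn, v), self.w_o)
        # Residual connection + LayerNorm around the attention block.
        x = [layer_norm([t + c for t, c in zip(tok, ctx)]) for tok, ctx in zip(tokens, context)]
        # Position-wise feed-forward block with ReLU.
        hidden = [[max(0.0, h) for h in row] for row in matmul(x, self.w_1)]
        ff = matmul(hidden, self.w_2)
        # Residual connection + LayerNorm around the feed-forward block.
        return [layer_norm([a + f for a, f in zip(xr, fr)]) for xr, fr in zip(x, ff)]


rng = random.Random(1)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 tokens, 8 dims each
encoded = SingleLayerEncoder()(tokens)
print(len(encoded), len(encoded[0]))
```

The output keeps the input shape (one vector per token), so a layer like this can be dropped into a pipeline without changing what comes after it. In a real model one would normally use a framework's encoder layer with multiple heads and trained weights.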
Good choice of name.
ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion, ICCV 2023