MulSen-AD, cvpr25
Related work
- Datasets: MVTec 3D-AD 22 [5] and Eyecandies 22 [6].
- Both provide RGB images alongside pixel-registered 3D information for every sample, thereby fostering the development of new multimodal AD approaches.
- MVTec 3D-AD: each sample is a high-resolution point cloud with the corresponding RGB image.
- BTF23 and M3DM23 rely on large memory banks of multimodal features, which leads to extensive memory requirements and slow inference.
- AST23 follows a teacher-student paradigm, which is conducive to a faster architecture.
- Yet AST does not exploit the spatial structure of the 3D data; it uses this information only as an additional input channel in a 2D network architecture.
- This results in inferior performance compared to M3DM and BTF.
- The comparison above refers to Figure 1 of CFM-cvpr24.
- BTF23
- Inspired by PatchCore, BTF23 investigates the use of memory banks for 3D anomaly detection.
- The authors propose to add 3D features to the 2D features provided by a frozen convolutional model (Wide ResNet-50) to enhance anomaly detection performance.
- They test several 3D features and achieve the best results with hand-crafted descriptors extracted from point clouds [37].
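A minimal sketch of the memory-bank idea described for BTF above, assuming per-patch 2D and 3D features are simply concatenated and that a test patch is scored by the distance to its nearest neighbour in the bank. The function names (`build_memory_bank`, `anomaly_scores`), the feature sizes, and the random stand-in features are hypothetical and not taken from the paper; real features would come from the frozen 2D backbone and the 3D descriptors.

```python
import numpy as np

def build_memory_bank(rgb_feats, geo_feats):
    """Concatenate per-patch 2D and 3D features of anomaly-free samples."""
    return np.concatenate([rgb_feats, geo_feats], axis=1)

def anomaly_scores(bank, query):
    """Euclidean distance from each query patch to its nearest bank entry."""
    # Pairwise squared distances, shape (num_query, num_bank)
    d2 = ((query[:, None, :] - bank[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1))

rng = np.random.default_rng(0)
# Random stand-ins for features of nominal training patches (assumption)
normal_rgb = rng.normal(size=(200, 8))
normal_geo = rng.normal(size=(200, 4))
bank = build_memory_bank(normal_rgb, normal_geo)

# Query patches: copies of nominal patches, with patch 0 pushed far away
query = bank[:5].copy()
query[0] += 10.0
scores = anomaly_scores(bank, query)
```

The bank grows with the number of training patches and every query is compared against all of it, which is where the memory and inference cost noted above comes from.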
- M3DM, cvpr23
- Improves over BTF by employing rich and distinctive 2D and 3D features extracted by frozen Transformer-based foundation models trained with self-supervision on large datasets: DINO [8] for 2D features and Point-MAE [26] for 3D features.
- The authors also propose a learned function to fuse 2D and 3D features into multimodal features stored in memory banks alongside those computed from the individual modalities.
- CFM-cvpr24
- Uses the same feature extractors as M3DM, yet does not employ any memory bank; instead, the authors propose a novel crossmodal feature mapping paradigm realized by two lightweight neural networks.
- The authors report better performance on MVTec 3D-AD while requiring far less memory and running remarkably faster.
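A rough sketch of the crossmodal feature mapping idea noted for CFM above, under several assumptions: the two lightweight networks are replaced here by linear least-squares maps, the discrepancy is measured with the Euclidean norm, and the two per-direction errors are combined by a product. None of these choices is stated in these notes, and all names (`fit_map`, `crossmodal_score`, `W_true`) and the synthetic features are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic nominal data (assumption): 3D features are a fixed
# linear function of the 2D features, so each can predict the other.
W_true = rng.normal(size=(4, 4))
f2d_train = rng.normal(size=(500, 4))
f3d_train = f2d_train @ W_true

def fit_map(src, dst):
    """Least-squares linear map src -> dst (stand-in for a small network)."""
    W, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return W

# One mapping per direction, fitted on nominal samples only
W_2d_to_3d = fit_map(f2d_train, f3d_train)
W_3d_to_2d = fit_map(f3d_train, f2d_train)

def crossmodal_score(f2d, f3d):
    """Product of the prediction errors in both mapping directions."""
    err_3d = np.linalg.norm(f2d @ W_2d_to_3d - f3d, axis=1)
    err_2d = np.linalg.norm(f3d @ W_3d_to_2d - f2d, axis=1)
    return err_3d * err_2d

# Test patches: nominal 2D/3D pairs, with the 3D feature of patch 0 perturbed
f2d_test = rng.normal(size=(5, 4))
f3d_test = f2d_test @ W_true
f3d_test[0] += 5.0
scores = crossmodal_score(f2d_test, f3d_test)
```

Only the two mapping functions are stored after training, so no bank of nominal features has to be kept or searched at inference time.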
References
- [MVTec 3D-AD]: The MVTec 3D-AD Dataset for Unsupervised 3D Anomaly Detection and Localization, 22
- [Eyecandies]: The Eyecandies Dataset for Unsupervised Multimodal Anomaly Detection and Localization, accv22
- Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network, cvpr24
- Real3D-AD: A Dataset of Point Cloud Anomaly Detection, nips23
- Image-Point Cloud Fusion based Anomaly Detection using PD-REAL Dataset, 23
- CFM-cvpr24
- Shape-Guided Dual-Memory, ICML23 [memory bank]
- [BTF23]: Back to the Feature: Classical 3D Features Are (Almost) All You Need for 3D Anomaly Detection, cvpr23w [memory bank]
- M3DM: Multimodal Industrial Anomaly Detection via Hybrid Fusion, cvpr23 [memory bank]
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection, 24 [memory bank]
- DINO: Emerging properties in self-supervised vision transformers, iccv21
- (Self-DIstillation with NO Labels)
- Point-MAE: Masked autoencoders for point cloud self-supervised learning, eccv22