MulSen-AD, cvpr25

Related work

  1. Datasets: MVTec 3D-AD 22 [5] and Eyecandies 22 [6].
    1. Both provide RGB images alongside pixel-registered 3D information for all data samples, thereby fostering the development of new multimodal AD approaches.
    2. MVTec 3D-AD: a high-resolution point cloud and the corresponding RGB image.
    3. BTF23 and M3DM23 rely on large memory banks of multimodal features, which entails extensive memory requirements and slow inference.
    4. AST23 follows a teacher-student paradigm, which is conducive to a faster architecture.
      1. Yet, AST does not exploit the spatial structure of the 3D data but employs this information just as an additional input channel in a 2D network architecture.
      2. This results in inferior performance compared to M3DM and BTF.
    5. The points above refer to Figure 1 of CFM-cvpr24.
  2. BTF23
    1. Inspired by PatchCore, BTF23 investigates the use of memory banks for 3D anomaly detection.
    2. The authors propose to add 3D features to the 2D features provided by a frozen convolutional model (Wide ResNet-50) to enhance anomaly detection performance.
    3. They test several 3D features and achieve the best results using hand-crafted descriptors extracted from Point Clouds [37].
  3. M3DM, cvpr23
    1. Improves over BTF by employing rich and distinctive features extracted by frozen Transformer-based foundation models trained by self-supervision on large datasets: DINO [8] for 2D features and Point-MAE [26] for 3D features.
    2. The authors also propose a learned function to fuse 2D and 3D features into multimodal features stored in memory banks alongside those computed from the individual modalities.
  4. CFM-cvpr24
    1. Uses the same feature extractors as M3DM but does not employ any memory bank; instead, it proposes a novel cross-modal feature mapping paradigm realized by two lightweight neural networks.
    2. The authors report better performance on MVTec 3D-AD while requiring far less memory and running considerably faster.
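
The two paradigms contrasted in these notes, memory-bank scoring (BTF, M3DM) and cross-modal feature mapping (CFM), can be illustrated with a small sketch. This is a toy under stated assumptions, not any paper's implementation: the features are synthetic stand-ins rather than DINO / Point-MAE outputs, the multimodal feature is a plain concatenation, a least-squares linear map stands in for CFM's two lightweight networks, and combining the two prediction errors by a product is a choice made for this sketch. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-patch features (NOT real DINO / Point-MAE outputs):
# both modalities are noisy linear views of a shared 4-D latent.
A = rng.normal(size=(4, 8))  # latent -> "2D" features
B = rng.normal(size=(4, 6))  # latent -> "3D" features


def make_features(n):
    """Return paired (2D, 3D) features for n nominal patches."""
    z = rng.normal(size=(n, 4))
    f2d = z @ A + 0.01 * rng.normal(size=(n, 8))
    f3d = z @ B + 0.01 * rng.normal(size=(n, 6))
    return f2d, f3d


train_2d, train_3d = make_features(200)  # nominal training patches
test_2d, test_3d = make_features(40)
is_anomalous = np.arange(40) >= 20
test_3d[is_anomalous] += 3.0  # break the 2D/3D correspondence for half the patches


# --- Memory-bank paradigm: store nominal features, score by nearest neighbour.
def memory_bank_scores(bank, feats):
    """Euclidean distance from each feature to its nearest bank entry."""
    diffs = feats[:, None, :] - bank[None, :, :]  # (n_test, n_bank, dim)
    return np.linalg.norm(diffs, axis=-1).min(axis=1)


# Assumption: concatenation as the multimodal feature (no learned fusion here).
bank = np.hstack([train_2d, train_3d])
bank_scores = memory_bank_scores(bank, np.hstack([test_2d, test_3d]))


# --- Cross-modal mapping paradigm: predict one modality from the other.
def fit_linear_map(src, dst):
    """Least-squares linear map src -> dst (stand-in for a small network)."""
    weights, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return weights


w_2d_to_3d = fit_linear_map(train_2d, train_3d)
w_3d_to_2d = fit_linear_map(train_3d, train_2d)


def crossmodal_scores(f2d, f3d):
    """Product of the two prediction errors (aggregation is an assumption)."""
    err_3d = np.linalg.norm(f2d @ w_2d_to_3d - f3d, axis=1)
    err_2d = np.linalg.norm(f3d @ w_3d_to_2d - f2d, axis=1)
    return err_2d * err_3d


cfm_scores = crossmodal_scores(test_2d, test_3d)

print("memory bank entries kept at inference:", bank.shape[0])
print("mapping parameters kept at inference:", w_2d_to_3d.size + w_3d_to_2d.size)
```

The memory bank grows with the number of stored nominal patches and every test patch is compared against all of them, whereas the mapping approach keeps only a fixed set of weights. This is the trade-off the notes describe between memory and inference speed.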

References

  1. [MVTec 3D-AD]: The MVTec 3D-AD Dataset for Unsupervised 3D Anomaly Detection and Localization, 22
  2. [Eyecandies]: The Eyecandies Dataset for Unsupervised Multimodal Anomaly Detection and Localization, accv22
  3. Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network, cvpr24
  4. Real3D-AD: A Dataset of Point Cloud Anomaly Detection, nips23
  5. Image-Point cloud Fusion based Anomaly Detection using PD-REAL Dataset, 23
  6. CFM-cvpr24
  7. Shape-Guided Dual-Memory, ICML23 [memory bank]
  8. [BTF23]: Back to the Feature: Classical 3D Features are (Almost) All You Need for 3D Anomaly Detection, cvpr23w [memory bank]
  9. M3DM: Multimodal Industrial Anomaly Detection via Hybrid Fusion, cvpr23 [memory bank]
    1. Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection, 24 [memory bank]
  10. DINO: Emerging Properties in Self-Supervised Vision Transformers, iccv21
    1. (Self-DIstillation with NO Labels)
  11. Point-MAE: Masked Autoencoders for Point Cloud Self-supervised Learning, eccv22