Future work
how to handle misalignment?
How large is each memory bank? Isn't inference slow?
Abstract
- unsupervised multimodal AD
- but large memory and slow inference
- rely on large memory banks of multimodal features
- Hybrid fusion: feature fusion + decision fusion
- classifiers: One-Class SVM

- Point Feature Alignment (PFA)
- Fpt is a pretrained Point Transformer from PointMAE [21]
- projection: by camera parameters
- to better align the point cloud and RGB features (where is this actually demonstrated?)
- Frgb is a Vision Transformer from DINO [6] to extract color info
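The "projection by camera parameters" step above can be sketched as a standard pinhole projection of camera-frame points onto the image plane. This is a minimal sketch with my own names (`project_points`, `K`), not code from the paper:

```python
import numpy as np

def project_points(points, K):
    """Project (N, 3) camera-frame points onto the image plane
    using a 3x3 pinhole intrinsic matrix K."""
    uvw = points @ K.T              # (N, 3) homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3] # perspective divide -> (N, 2) pixel coords

# toy example: focal length 500, principal point (320, 240)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
pts = np.array([[0.0, 0.0, 2.0]])   # a point on the optical axis, 2 m away
print(project_points(pts, K))       # -> [[320. 240.]] (the principal point)
```

Each projected 2D coordinate can then be used to pair a point feature with the RGB patch it lands in.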
- Unsupervised Feature Fusion (UFF)
- with patch-wise contrastive learning
- the loss is computed only over positive pairs, which seems reasonable
- χrgb, χpt are MLP layers and σr, σp are single fully connected layers;
- Decision Layer Fusion
- ϕ, ψ are score functions for single-memory-bank detection and segmentation
- introduced by PatchCore [26]
- P is the memory bank building algorithm: PatchCore [26]
- Da, Ds: two learnable One-Class Support Vector Machines (OCSVM) [29] for anomaly detection and segmentation;
- so no synthetic anomaly data is needed?
- two-stage training procedure
- construct memory banks
- train the decision layer.
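A rough sketch of the decision-layer idea: score each modality against its memory bank with a PatchCore-style nearest-neighbor distance, then fit an OCSVM (Da) on the concatenated nominal scores. All names and the toy data below are mine, and the paper fuses more feature streams than the two shown here:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def memory_bank_score(feats, bank):
    """PatchCore-style score: distance to the nearest memory-bank entry."""
    d = np.linalg.norm(feats[:, None, :] - bank[None, :, :], axis=-1)
    return d.min(axis=1)

# stand-ins for memory banks built from nominal features (stage 1)
bank_rgb = rng.normal(size=(200, 8))
bank_pt  = rng.normal(size=(200, 8))

# stage 2: score nominal samples against each bank, fit the OCSVM on the scores
train_rgb = rng.normal(size=(100, 8))
train_pt  = rng.normal(size=(100, 8))
scores = np.stack([memory_bank_score(train_rgb, bank_rgb),
                   memory_bank_score(train_pt, bank_pt)], axis=1)
Da = OneClassSVM(kernel="rbf", nu=0.1).fit(scores)

# inference: an off-distribution sample gets large bank distances,
# hence a lower OCSVM decision value (more anomalous)
test = np.stack([memory_bank_score(rng.normal(3.0, 1.0, size=(1, 8)), bank_rgb),
                 memory_bank_score(rng.normal(3.0, 1.0, size=(1, 8)), bank_pt)], axis=1)
print(Da.decision_function(test))
```

This matches the two-stage split in the notes: the memory banks are frozen first, and only the OCSVM decision layer is trained on nominal scores afterwards.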
Method
Unsupervised Feature Fusion (UFF)

Encourage features from different modalities at the same position to carry more mutual information, while features at different positions carry less.
But does this actually provide robustness to misalignment? Worth testing with an experiment.
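My guess at the UFF objective, written as an InfoNCE-style patch-wise contrastive loss: features at the same position across modalities are positives, features at other positions in the batch are negatives. Note this includes negatives, which differs from the "loss only for positive pairs" remark above, so treat it as an assumption rather than the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def patchwise_contrastive_loss(f_rgb, f_pt, tau=0.07):
    """InfoNCE over patch positions. Inputs are (N, D) projected
    features, N = number of patches; position i in f_rgb and f_pt
    is a positive pair, all other positions are negatives."""
    z_r = F.normalize(f_rgb, dim=1)
    z_p = F.normalize(f_pt, dim=1)
    logits = z_r @ z_p.T / tau           # (N, N) cosine-similarity logits
    target = torch.arange(len(z_r))      # positives sit on the diagonal
    # symmetric: rgb -> point and point -> rgb
    return 0.5 * (F.cross_entropy(logits, target) +
                  F.cross_entropy(logits.T, target))

f_rgb = torch.randn(16, 32)  # e.g. outputs of χrgb
f_pt = torch.randn(16, 32)   # e.g. outputs of χpt
print(patchwise_contrastive_loss(f_rgb, f_pt))
```

Minimizing this pulls same-position features together and pushes different-position features apart, which is exactly the "more corresponding information at the same position, less elsewhere" behavior described above.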