1 What

- The student learns to mimic a pretrained teacher on normal images.
- Because the teacher network has been pre-trained on a large dataset, it generates discriminative feature representations for both normal and anomalous regions.
- The student network is trained only on normal samples, i.e. it inherits the teacher's ability only in normal regions; the teacher-student feature discrepancy in anomalous regions is what serves as the anomaly signal.
- An exception: DeSTSeg, cvpr23, whose student is also trained on synthetically corrupted (pseudo-anomalous) images, not only clean normal samples.
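The scoring step behind this paradigm can be sketched in a few lines (a generic numpy sketch, not any specific paper's implementation): the per-pixel anomaly score is the cosine distance between teacher and student feature maps, near zero where the student imitates the teacher well and large where it cannot.

```python
import numpy as np

def anomaly_map(t_feat: np.ndarray, s_feat: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-pixel anomaly score from teacher/student features of shape (C, H, W).

    Score = 1 - cosine similarity along the channel axis: ~0 where the
    student reproduces the teacher, larger where it cannot (anomalies).
    """
    dot = (t_feat * s_feat).sum(axis=0)
    norm = np.linalg.norm(t_feat, axis=0) * np.linalg.norm(s_feat, axis=0)
    return 1.0 - dot / (norm + eps)

# Toy check: identical features -> score ~0; flipping the sign of the
# student's features in one region -> high score in that region.
rng = np.random.default_rng(0)
t = rng.standard_normal((8, 4, 4))
s = t.copy()
s[:, 2:, 2:] *= -1.0               # simulate an "anomalous" corner
amap = anomaly_map(t, s)
print(amap[:2, :2].max())          # ~0 in the untouched region
print(amap[2:, 2:].min())          # ~2 where features are flipped
```

The same map can be thresholded for pixel-level segmentation or max-pooled into a single image-level score.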
2 Related work
- [10], cvpr20, is the first work to use a teacher-student architecture for anomaly detection.
- STPM, BMVC21 and MKD, cvpr21 both distill multi-scale features from different network layers, but in different ways:
- MKD, cvpr21 distills the multi-scale features into a lighter (smaller) student network.
- STPM, BMVC21 matches feature pyramids between a teacher and a student of the same architecture.
- Based on STPM, RSTPM[13, 17] adds a pair of teacher-student networks.
- IKD [15], 2022, introduces some new techniques, namely the context similarity loss (CSL) and adaptive hard sample mining (AHSM) modules, but was not published in a strong venue.
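The exact AHSM formulation is in [15]; generically, hard sample mining keeps only the hardest fraction of per-pixel distillation losses so training focuses on poorly imitated pixels. A minimal sketch under that generic assumption (the function name and `keep_ratio` parameter are illustrative, not IKD's API):

```python
import numpy as np

def hard_mined_loss(pixel_losses: np.ndarray, keep_ratio: float = 0.1) -> float:
    """Average only the top `keep_ratio` fraction of per-pixel losses.

    Generic hard-sample mining: easy (well-imitated) pixels are dropped so
    the student focuses on the pixels it currently fits worst. This is an
    illustrative sketch, not IKD's exact AHSM module.
    """
    flat = np.sort(pixel_losses.ravel())[::-1]     # hardest pixels first
    k = max(1, int(keep_ratio * flat.size))        # how many to keep
    return float(flat[:k].mean())

losses = np.array([[0.9, 0.1],
                   [0.1, 0.1]])
print(hard_mined_loss(losses, keep_ratio=0.25))    # only the 0.9 pixel survives
```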
3 New paradigm: reverse distillation
Is distillation enough?

Figure: (a) Distillation. (b) Reverse Distillation.
Reverse-distillation methods:
- Reverse Distillation, cvpr22
- DeSTSeg, cvpr23
- RD++, cvpr23
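Both the forward and reverse variants typically produce one discrepancy map per feature scale and then aggregate them at input resolution. A minimal numpy sketch of that aggregation (nearest-neighbor upsampling is used here for simplicity; real implementations usually use bilinear interpolation, and the function names are illustrative):

```python
import numpy as np

def upsample_nearest(m: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbor upsample of a (H, W) map to (size, size)."""
    fy, fx = size // m.shape[0], size // m.shape[1]
    return np.repeat(np.repeat(m, fy, axis=0), fx, axis=1)

def aggregate_maps(maps: list, size: int) -> np.ndarray:
    """Upsample per-scale anomaly maps to full resolution and average them."""
    return np.mean([upsample_nearest(m, size) for m in maps], axis=0)

# Per-scale discrepancy maps, e.g. from three teacher/student comparison points.
m1 = np.zeros((4, 4)); m1[3, 3] = 1.0     # coarse scale flags bottom-right
m2 = np.zeros((8, 8)); m2[7, 7] = 1.0     # finer scale agrees
m3 = np.zeros((16, 16))                   # finest scale is silent
full = aggregate_maps([m1, m2, m3], size=16)
print(full.shape, full.max())             # (16, 16), peak in the bottom-right
```

Averaging (rather than summing) keeps the score range comparable regardless of how many scales are compared.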
4 Multimodal
- AST, wacv23 uses an asymmetric student-teacher pair and extends to multimodal (RGB + 3D) inputs.
References