1 What

- The student learns to mimic a pretrained teacher on normal images.
- Because the teacher network has been pre-trained on a large dataset, it generates discriminative feature representations for both normal and anomalous regions.
- The student network is trained only on normal samples, i.e. it inherits the teacher's ability only in normal regions; the teacher-student feature discrepancy in anomalous regions is what serves as the anomaly signal.
- An exception: DeSTSeg, cvpr23, whose student is also trained on synthetically corrupted (pseudo-anomalous) images, not only clean normal samples.
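The scoring step behind this paradigm can be sketched in a few lines (a generic numpy sketch, not any specific paper's implementation): the per-pixel anomaly score is the cosine distance between teacher and student feature maps, near zero where the student imitates the teacher well and large where it cannot.

```python
import numpy as np

def anomaly_map(t_feat: np.ndarray, s_feat: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-pixel anomaly score from teacher/student features of shape (C, H, W).

    Score = 1 - cosine similarity along the channel axis: ~0 where the
    student reproduces the teacher, larger where it cannot (anomalies).
    """
    dot = (t_feat * s_feat).sum(axis=0)
    norm = np.linalg.norm(t_feat, axis=0) * np.linalg.norm(s_feat, axis=0)
    return 1.0 - dot / (norm + eps)

# Toy check: identical features -> score ~0; flipping the sign of the
# student's features in one region -> high score in that region.
rng = np.random.default_rng(0)
t = rng.standard_normal((8, 4, 4))
s = t.copy()
s[:, 2:, 2:] *= -1.0               # simulate an "anomalous" corner
amap = anomaly_map(t, s)
print(amap[:2, :2].max())          # ~0 in the untouched region
print(amap[2:, 2:].min())          # ~2 where features are flipped
```

The same map can be thresholded for pixel-level segmentation or max-pooled into a single image-level score.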
2 Related work
- [10], cvpr20, is the first work to use a teacher-student architecture for anomaly detection.
- STPM, BMVC21 and MKD, cvpr21 both distill multi-scale features from different network layers, but in different ways:
- MKD, cvpr21 distills the multi-scale features into a lighter (smaller) student network.
- STPM, BMVC21 matches feature pyramids between a teacher and a student of the same architecture.
- Based on STPM, RSTPM[13, 17] adds a pair of teacher-student networks.
- IKD [15], 2022, introduces some new techniques, namely the context similarity loss (CSL) and adaptive hard sample mining (AHSM) modules, but was not published in a strong venue.
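The exact AHSM formulation is in [15]; generically, hard sample mining keeps only the hardest fraction of per-pixel distillation losses so training focuses on poorly imitated pixels. A minimal sketch under that generic assumption (the function name and `keep_ratio` parameter are illustrative, not IKD's API):

```python
import numpy as np

def hard_mined_loss(pixel_losses: np.ndarray, keep_ratio: float = 0.1) -> float:
    """Average only the top `keep_ratio` fraction of per-pixel losses.

    Generic hard-sample mining: easy (well-imitated) pixels are dropped so
    the student focuses on the pixels it currently fits worst. This is an
    illustrative sketch, not IKD's exact AHSM module.
    """
    flat = np.sort(pixel_losses.ravel())[::-1]     # hardest pixels first
    k = max(1, int(keep_ratio * flat.size))        # how many to keep
    return float(flat[:k].mean())

losses = np.array([[0.9, 0.1],
                   [0.1, 0.1]])
print(hard_mined_loss(losses, keep_ratio=0.25))    # only the 0.9 pixel survives
```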
3 New paradigm: reverse distillation
Is distillation enough?

Figure: (a) Distillation. (b) Reverse Distillation.
Reverse-distillation methods:
- Reverse Distillation, cvpr22
- DeSTSeg, cvpr23
- RD++, cvpr23
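Both the forward and reverse variants typically produce one discrepancy map per feature scale and then aggregate them at input resolution. A minimal numpy sketch of that aggregation (nearest-neighbor upsampling is used here for simplicity; real implementations usually use bilinear interpolation, and the function names are illustrative):

```python
import numpy as np

def upsample_nearest(m: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbor upsample of a (H, W) map to (size, size)."""
    fy, fx = size // m.shape[0], size // m.shape[1]
    return np.repeat(np.repeat(m, fy, axis=0), fx, axis=1)

def aggregate_maps(maps: list, size: int) -> np.ndarray:
    """Upsample per-scale anomaly maps to full resolution and average them."""
    return np.mean([upsample_nearest(m, size) for m in maps], axis=0)

# Per-scale discrepancy maps, e.g. from three teacher/student comparison points.
m1 = np.zeros((4, 4)); m1[3, 3] = 1.0     # coarse scale flags bottom-right
m2 = np.zeros((8, 8)); m2[7, 7] = 1.0     # finer scale agrees
m3 = np.zeros((16, 16))                   # finest scale is silent
full = aggregate_maps([m1, m2, m3], size=16)
print(full.shape, full.max())             # (16, 16), peak in the bottom-right
```

Averaging (rather than summing) keeps the score range comparable regardless of how many scales are compared.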
4 Multimodal
- AST, wacv23 uses an asymmetric student-teacher pair and extends to multimodal (RGB + 3D) inputs.
References