1 What


  1. Students learn to handle normal images from pretrained teachers.
    1. Because the teacher network is pre-trained on a large dataset, it generates discriminative feature representations in both normal and anomalous regions.
    2. The student network is trained only on normal samples, i.e. it inherits the teacher's behaviour only for normal regions; teacher and student then disagree on anomalies.
    3. An exception? DeSTSeg, iccv23.
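The disagreement between teacher and student can be turned directly into a per-pixel anomaly map. A minimal numpy sketch (not the exact formulation of any one paper; shapes and the choice of cosine distance are illustrative assumptions):

```python
import numpy as np

def anomaly_map(teacher_feat, student_feat, eps=1e-8):
    """Per-location anomaly score: 1 - cosine similarity between
    teacher and student feature vectors. Inputs have shape (C, H, W)."""
    t = teacher_feat / (np.linalg.norm(teacher_feat, axis=0, keepdims=True) + eps)
    s = student_feat / (np.linalg.norm(student_feat, axis=0, keepdims=True) + eps)
    return 1.0 - (t * s).sum(axis=0)  # shape (H, W)

rng = np.random.default_rng(0)
t = rng.normal(size=(8, 4, 4))
# A well-trained student mimics the teacher on normal regions...
s = t.copy()
# ...but diverges where the input is anomalous (here: location (2, 2)).
s[:, 2, 2] = rng.normal(size=8)
amap = anomaly_map(t, s)  # near zero everywhere except (2, 2)
```

Thresholding such a map gives a segmentation of anomalous regions; averaging or taking its maximum gives an image-level score.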

2 Related work

  1. [10], cvpr20, is the first work to use a teacher-student architecture for anomaly detection.
  2. STPM, BMVC21 and MKD, cvpr21 both distill multi-scale features from different network layers, but in different ways.
    1. MKD, cvpr21 distills multi-scale features into a lighter (smaller) student network.
    2. STPM, BMVC21 matches feature pyramids between the teacher and a student with the same architecture.
  3. Building on STPM, RSTPM [13, 17] adds a pair of teacher-student networks.
  4. IKD [15], 2022, introduces some new techniques, but was not published in a strong venue.
    1. Context similarity loss (CSL) and adaptive hard sample mining (AHSM) modules.
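The multi-scale distillation shared by the works above reduces to one loss: at each chosen layer, normalize teacher and student features channel-wise and penalize their difference, then sum over layers. A hedged STPM-style sketch in numpy (the 0.5 factor and L2 normalization follow the common formulation; exact layer choices vary per paper):

```python
import numpy as np

def multiscale_distill_loss(teacher_feats, student_feats, eps=1e-8):
    """Sum over layers of the mean (over spatial locations) squared
    difference between channel-wise L2-normalized teacher and student
    feature maps. Each array has shape (C, H, W)."""
    total = 0.0
    for t, s in zip(teacher_feats, student_feats):
        t = t / (np.linalg.norm(t, axis=0, keepdims=True) + eps)
        s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
        total += 0.5 * np.mean(np.sum((t - s) ** 2, axis=0))
    return total
```

Because features are normalized before comparison, the loss at each location is equivalent (up to a constant) to one minus their cosine similarity, which is why the same quantity doubles as the test-time anomaly map.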

3 New paradigm: reverse distillation

Is distillation enough?


(a) Distillation. (b) Reverse Distillation.


Reverse Distillation, cvpr22

DeSTSeg, iccv23

RD++, cvpr23
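The reverse-distillation data flow can be sketched end to end: the frozen teacher encodes the image into multi-scale features, a one-class bottleneck compresses the deepest feature, and the student is a decoder that reconstructs the teacher's features from that embedding, never seeing the raw image. The sketch below uses random, untrained linear maps as stand-ins for the real conv blocks, so all weights, shapes, and function names are hypothetical; it illustrates the data flow only, not a working detector:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frozen teacher: image -> multi-scale features
# (random linear projections stand in for pretrained conv blocks).
W1, W2 = rng.normal(size=(16, 64)), rng.normal(size=(8, 16))
def teacher(x):
    f1 = np.tanh(W1 @ x)
    f2 = np.tanh(W2 @ f1)
    return [f1, f2]

# One-class bottleneck: compress the deepest teacher feature.
Wb = rng.normal(size=(4, 8))

# Student *decoder*: reconstructs the teacher features from the
# embedding, deep-to-shallow -- it never receives the raw image.
V2, V1 = rng.normal(size=(8, 4)), rng.normal(size=(16, 8))
def student(z):
    g2 = np.tanh(V2 @ z)
    g1 = np.tanh(V1 @ g2)
    return [g1, g2]

x = rng.normal(size=64)        # flattened input image (hypothetical size)
t_feats = teacher(x)
z = Wb @ t_feats[1]            # bottleneck embedding
s_feats = student(z)

# Anomaly score: accumulated cosine distance across scales.
score = sum(1 - (t @ s) / (np.linalg.norm(t) * np.linalg.norm(s))
            for t, s in zip(t_feats, s_feats))
```

The asymmetry is the point: since the student only ever sees compact embeddings of normal data, an anomalous input leaves it unable to reconstruct the teacher's features, so the discrepancy is large exactly where it should be.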

4 Multimodal

AST, wacv23

References