1 Introduction
1.1 Background
Unsupervised representation learning has proved to be a critical component of anomaly detection/localization in images. The challenges in learning such a representation are two-fold.
- Firstly, the sample size is often not large enough to learn a rich, generalizable representation through conventional techniques.
- Secondly, while only normal samples are available at training time, the learned features should discriminate between normal and anomalous samples.
1.2 Why
- To tackle the above two issues by distilling the knowledge of a pre-trained expert network into a cloner network.
- This is especially helpful when the sample size is small and the normal class shows significant variations.
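- Neither network has seen anomalous data, so why does this work?
- The assumption is that ImageNet already contains the anomalies (so the pre-trained expert knows about them).
- The cloner only learns from the normal data used for training.
- There is some previous work on distillation + AD (anomaly detection).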

1.3 How
- detect and localize anomalies using the discrepancy between the expert and cloner networks’ intermediate activation values given an input sample.
- expert network: pre-trained VGG-16 on ImageNet
- cloner network: a simplified VGG-16?
- train distillation solely on the normal training data
- Multiresolution: distill the comprehensive knowledge
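- Each layer of the network extracts different features: shallow layers focus more on details such as edges and textures, while deep layers focus more on semantic information. Several intermediate layers are therefore chosen for knowledge distillation.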
- discrepancy between intermediate activation values
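- For anomaly localization, the method looks for regions where the gradient of the loss is large. At test time, if the input is anomalous, the features produced by the teacher and student networks differ, so backpropagating the loss yields large gradients there.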
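
The scoring and localization steps above can be sketched in a few lines of plain Python. This is only a rough illustration under several assumptions that are not in these notes: the per-layer loss form (MSE plus a weighted cosine term, weight `lam`), the two toy networks `toy_expert` and `toy_cloner` (hypothetical stand-ins, not VGG-16), and central finite differences used in place of backpropagation to get the gradient with respect to the input.

```python
import math

def layer_loss(a_expert, a_cloner, lam=0.5):
    """Discrepancy for one layer: MSE plus lam * (1 - cosine similarity).

    This loss form is an assumption made for illustration.
    """
    n = len(a_expert)
    mse = sum((e - c) ** 2 for e, c in zip(a_expert, a_cloner)) / n
    dot = sum(e * c for e, c in zip(a_expert, a_cloner))
    norms = (math.sqrt(sum(e * e for e in a_expert))
             * math.sqrt(sum(c * c for c in a_cloner)))
    cos = dot / (norms + 1e-12)  # epsilon avoids division by zero
    return mse + lam * (1.0 - cos)

def total_discrepancy(expert_acts, cloner_acts, lam=0.5):
    """Sum the per-layer discrepancy over every chosen intermediate layer."""
    return sum(layer_loss(e, c, lam) for e, c in zip(expert_acts, cloner_acts))

# Hypothetical stand-ins for the two networks (NOT VGG-16). Each returns the
# activations of two "layers". The toy cloner agrees with the toy expert only
# for inputs near 0, which plays the role of the normal data.
def toy_expert(x):
    h1 = [math.tanh(v) for v in x]
    h2 = [math.tanh(2.0 * v) for v in h1]
    return [h1, h2]

def toy_cloner(x):
    h1 = list(x)
    h2 = [2.0 * v for v in h1]
    return [h1, h2]

def anomaly_score(x):
    """Detection: one scalar discrepancy for the whole input."""
    return total_discrepancy(toy_expert(x), toy_cloner(x))

def localization_map(x, h=1e-5):
    """Localization: |d score / d x_i| per input element (finite differences)."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grads.append(abs(anomaly_score(up) - anomaly_score(down)) / (2 * h))
    return grads

normal = [0.05, -0.02, 0.03, 0.01]
anomalous = [0.05, -0.02, 2.5, 0.01]  # element 2 is far from the normal range

print(anomaly_score(normal), anomaly_score(anomalous))
print(localization_map(anomalous))
```

In this toy setting the score is near zero for the normal input and much larger for the anomalous one, and the gradient map peaks at the element that was pushed out of the normal range. A real implementation would take the activations from the actual expert and cloner layers and obtain the input gradient by backpropagation.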
1.3.1 get bitterness from sweetness???
Why does distillation end up providing a different capability here, i.e. get bitterness from sweetness????
The cloner's intermediate embeddings of normal training data at several critical layers are forced to conform to those of the source. Consequently,
- the cloner learns the normal data manifold thoroughly, and
- yet gains no knowledge from the source about other possible input data.
- ??? Distilling the knowledge into a more compact network also helps to concentrate solely on the features that distinguish normal from anomalous samples.
- Hence, the cloner will behave differently from the source when fed anomalous data.
- Furthermore, a simpler cloner architecture avoids distraction by non-distinguishing features and enhances the discrepancy in the behavior of the two networks on anomalies.
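
A toy one-dimensional analogy may help with the "bitterness from sweetness" question. It is not the paper's setup: `expert` is a hypothetical rich model (here simply x squared), the cloner is a deliberately simpler model (a straight line), and it is fitted by least squares to the expert's outputs on a narrow "normal" range only. The cloner matches the expert closely on that range and diverges away from it, and that divergence is what serves as the anomaly signal.

```python
def expert(x):
    # Hypothetical "rich" source model: it knows the true curve everywhere.
    return x * x

def fit_cloner(xs):
    """Least-squares fit of a line a*x + b to the expert's outputs on xs."""
    ys = [expert(x) for x in xs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

normal_xs = [0.9, 0.95, 1.0, 1.05, 1.1]  # the only data the cloner ever sees
cloner = fit_cloner(normal_xs)

def discrepancy(x):
    """Squared difference between expert and cloner outputs."""
    return (expert(x) - cloner(x)) ** 2

print(discrepancy(1.0))  # input inside the normal range: small discrepancy
print(discrepancy(3.0))  # input far outside it: large discrepancy
```

The simpler the cloner, the less it can extrapolate the expert's behavior beyond the normal range, which matches the last bullet above about a simpler architecture enhancing the discrepancy on anomalies.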