[1] vs [2]

Same idea, same architecture, different networks:
- expert / teacher network: VGG-16 vs ResNet-18, both pre-trained on ImageNet
- cloner / student network: a simplified VGG-16 vs the same ResNet-18, both randomly initialized
- both use a multiresolution scheme, i.e., transfer knowledge from intermediate layers
- same loss: cosine loss between corresponding layers
- different anomaly localization mechanisms:
  - [1] computes the cosine loss per pixel, so the distance map itself localizes anomalies; [1] is simpler and performs better
  - [2] computes the cosine over the whole layer, so localization requires the gradient of the loss function
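The gradient-based localization of [2] can be sketched as below. This is a minimal illustration, assuming both networks return lists of intermediate feature maps and that the loss is the layer-wise cosine distance; the aggregation of the input gradient is illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def gradient_localization(teacher, student, image):
    """Backprop the distillation loss to the input; gradient magnitude
    highlights pixels that contribute most to the discrepancy."""
    image = image.clone().requires_grad_(True)
    loss = 0.0
    for ft, fs in zip(teacher(image), student(image)):
        # cosine distance between whole (flattened) layers, as in [2]
        loss = loss + (1 - F.cosine_similarity(
            ft.flatten(1), fs.flatten(1), dim=1)).mean()
    loss.backward()
    # aggregate gradient magnitude over channels -> (N, H, W) anomaly map
    return image.grad.abs().max(dim=1).values
```

Because the cosine is taken over the flattened layer, a single scalar per layer gives no spatial map directly, hence the need for this backward pass, which is what makes [2]'s localization more involved than [1]'s.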
AUC-ROC / AR for anomaly detection on MVTecAD: 0.955 vs 0.8774
AUC-ROC / AR for anomaly localization on MVTecAD: 0.970 vs 0.9071
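The per-pixel scheme of [1] can be sketched as follows: a cosine distance per spatial position in each layer, upsampled to input resolution and fused across layers. Layer fusion by summation here is an assumption for illustration; the paper's exact aggregation may differ.

```python
import torch
import torch.nn.functional as F

def anomaly_map(teacher_feats, student_feats, out_size):
    """Per-pixel cosine distance between teacher/student feature maps,
    upsampled to the input size and summed over layers."""
    n = teacher_feats[0].shape[0]
    amap = torch.zeros(n, 1, *out_size)
    for ft, fs in zip(teacher_feats, student_feats):
        # cosine distance along the channel dim -> one score per pixel
        d = 1 - F.cosine_similarity(ft, fs, dim=1, eps=1e-6)  # (N, H, W)
        d = F.interpolate(d.unsqueeze(1), size=out_size,
                          mode='bilinear', align_corners=False)
        amap = amap + d
    return amap.squeeze(1)  # (N, H_out, W_out)
```

An image-level anomaly score can then be taken as the maximum of the map, which is a common choice for MVTecAD-style detection.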
References
[1] Student-Teacher Feature Pyramid Matching for Anomaly Detection, BMVC21
- https://github.com/hcw-00/STPM_anomaly_detection
- Unofficial PyTorch implementation; results match the paper.
- 张浚然
- open question: how to determine the thresholds for anomaly detection and localization.
- Fig. 4 shows that pre-training on ImageNet is much better than using other small datasets.
[2] MKD: Multiresolution Knowledge Distillation for Anomaly Detection, CVPR21