https://arxiv.org/pdf/2211.14307.pdf
Code: https://github.com/EliSchwartz/MAEDAY (coming soon).
What does this paper do?
The method is simple, the paper is concise, and the results are quite good compared with the SOTA 1-shot AD method, PatchCore [20] (CVPR 2022).
- Few-shot AD (FSAD)
- MAEDAY
- image-reconstruction-based, via a Masked Autoencoder (MAE)
- performs well by pre-training on ImageNet and fine-tuning only on a small set of normal images
- Embedding-based methods on their own have demonstrated higher performance than MAEDAY,
- but the ensemble of the two approaches achieves very strong SOTA results.
- Zero-Shot AD (ZSAD)
- zero-shot MAEDAY outperforms the SOTA one-shot results on the proposed dataset.
- Zero-Shot Foreign Object Detection (ZSFOD)
- the same method as ZSAD, just applied to a different task.
- a new dataset for ZSFOD
Related work
- embedding-based: compare the embedding vectors of queries to a set of reference embeddings
- image-reconstruction-based: the category this paper belongs to
Method

- Given a query image, repeat N=32 times:
    - a small random subset of its patches (25%) is kept ⇒ each patch is flattened into a single token and given a positional encoding ⇒ the tokens are fed to the MAE encoder.
    - output tokens of the encoder + ‘empty’ tokens ⇒ fed into the MAE decoder ⇒ recovered patches, i.e. the reconstructed image
        - ‘empty’ tokens are used to replace the masked-out tokens, see MAE.
        - ‘empty’ tokens carry just the positional encoding.
- The anomaly scores are averaged across the N reconstructions with different random masks.
    - the per-pixel score is the reconstruction error, i.e. the color difference between the input and the reconstruction.
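The masking-and-averaging loop above can be sketched as follows. This is a hedged NumPy sketch, not the authors' code: `reconstruct` stands in for the full MAE encoder/decoder pass, and the patch/mask bookkeeping is simplified.

```python
import numpy as np

def anomaly_map(image, reconstruct, n_repeats=32, mask_ratio=0.75, patch=16):
    """Average per-pixel reconstruction error over random masks.

    `reconstruct(image, visible_idx)` is a hypothetical stand-in for the
    MAE encoder/decoder pass: it should return a full reconstructed image
    given which patch indices were visible.
    """
    h, w, _ = image.shape
    n_patches = (h // patch) * (w // patch)
    n_visible = int(n_patches * (1 - mask_ratio))  # keep 25% of the patches
    rng = np.random.default_rng(0)
    err = np.zeros((h, w))
    for _ in range(n_repeats):
        # draw a fresh random subset of visible patches each repetition
        visible = rng.choice(n_patches, size=n_visible, replace=False)
        recon = reconstruct(image, visible)
        # per-pixel anomaly score: color difference, averaged over channels
        err += np.abs(image - recon).mean(axis=-1)
    return err / n_repeats
```

Averaging over N random masks smooths out the variance from any single masking pattern, so a region only scores high if the MAE consistently fails to reconstruct it.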
- FSAD
- use LoRA [11], a method originally introduced for fine-tuning large language models (transformers) without overfitting on a small dataset.
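The core idea of LoRA can be sketched in a few lines: freeze the pretrained weight and learn only a low-rank additive update. This is a minimal illustrative sketch (plain NumPy, forward pass only), not the paper's or the LoRA library's implementation; the class and parameter names are my own.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r).

    Only r * (d_in + d_out) parameters are tuned instead of d_in * d_out,
    which is what keeps fine-tuning on a few normal images from overfitting.
    """
    def __init__(self, W, r=4, alpha=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                      # frozen, shape (d_out, d_in)
        self.A = rng.normal(0, 0.01, (r, W.shape[1]))   # trainable down-projection
        self.B = np.zeros((W.shape[0], r))              # trainable up-projection, init 0
        self.scale = alpha / r

    def __call__(self, x):
        # y = x W^T + (alpha/r) * x A^T B^T; with B = 0 at init,
        # the layer exactly matches the frozen pretrained weights
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)
```

Initializing B to zero means fine-tuning starts exactly from the pretrained model and only gradually departs from it.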
Experiments
The 15 object/texture categories of MVTec-AD [1], the most popular and the main AD benchmark.