https://arxiv.org/pdf/2211.14307.pdf
Code: https://github.com/EliSchwartz/MAEDAY (coming soon).
What does this paper do?
The method is simple, the paper is concise, and the results are quite good compared with the SOTA 1-shot AD method, PatchCore [20] (CVPR 2022).
- Few-shot AD (FSAD)
- MAEDAY
- image-reconstruction-based, via a Masked Autoencoder (MAE)
- performs well by pre-training on ImageNet and fine-tuning only on a small set of normal images
- Embedding-based methods on their own have demonstrated higher performance than MAEDAY,
- but the ensemble of the two approaches achieves very strong SOTA results.
- Zero-Shot AD (ZSAD)
- zero-shot MAEDAY outperforms the SOTA one-shot results on the proposed dataset.
- Zero-Shot Foreign Object Detection (ZSFOD)
- the same method as ZSAD, just applied to a different task.
- a new dataset for ZSFOD
Related work
- embedding-based: compare the embedding vectors of queries to a set of reference embeddings
- image-reconstruction-based: the category this paper belongs to
Method

- Given a query image, repeat N=32 times:
    - a small random subset of its patches (25%) is kept ⇒ each patch is flattened into a single token and given a positional encoding ⇒ the tokens are fed to the MAE encoder.
    - output tokens of the encoder + ‘empty’ tokens ⇒ fed into the MAE decoder ⇒ recovered patches, i.e. the reconstructed image
        - ‘empty’ tokens are used to replace the masked-out tokens, see MAE.
        - ‘empty’ tokens carry just the positional encoding.
- The anomaly scores are averaged across the N reconstructions with different random masks.
    - the per-pixel score is the reconstruction error, i.e. the color difference between the input and the reconstruction.
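The masking-and-averaging loop above can be sketched as follows. This is a hedged NumPy sketch, not the authors' code: `reconstruct` stands in for the full MAE encoder/decoder pass, and the patch/mask bookkeeping is simplified.

```python
import numpy as np

def anomaly_map(image, reconstruct, n_repeats=32, mask_ratio=0.75, patch=16):
    """Average per-pixel reconstruction error over random masks.

    `reconstruct(image, visible_idx)` is a hypothetical stand-in for the
    MAE encoder/decoder pass: it should return a full reconstructed image
    given which patch indices were visible.
    """
    h, w, _ = image.shape
    n_patches = (h // patch) * (w // patch)
    n_visible = int(n_patches * (1 - mask_ratio))  # keep 25% of the patches
    rng = np.random.default_rng(0)
    err = np.zeros((h, w))
    for _ in range(n_repeats):
        # draw a fresh random subset of visible patches each repetition
        visible = rng.choice(n_patches, size=n_visible, replace=False)
        recon = reconstruct(image, visible)
        # per-pixel anomaly score: color difference, averaged over channels
        err += np.abs(image - recon).mean(axis=-1)
    return err / n_repeats
```

Averaging over N random masks smooths out the variance from any single masking pattern, so a region only scores high if the MAE consistently fails to reconstruct it.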
- FSAD
- use LoRA [11], a method originally introduced for fine-tuning large language models (transformers) without overfitting on a small dataset.
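The core idea of LoRA can be sketched in a few lines: freeze the pretrained weight and learn only a low-rank additive update. This is a minimal illustrative sketch (plain NumPy, forward pass only), not the paper's or the LoRA library's implementation; the class and parameter names are my own.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r).

    Only r * (d_in + d_out) parameters are tuned instead of d_in * d_out,
    which is what keeps fine-tuning on a few normal images from overfitting.
    """
    def __init__(self, W, r=4, alpha=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                      # frozen, shape (d_out, d_in)
        self.A = rng.normal(0, 0.01, (r, W.shape[1]))   # trainable down-projection
        self.B = np.zeros((W.shape[0], r))              # trainable up-projection, init 0
        self.scale = alpha / r

    def __call__(self, x):
        # y = x W^T + (alpha/r) * x A^T B^T; with B = 0 at init,
        # the layer exactly matches the frozen pretrained weights
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)
```

Initializing B to zero means fine-tuning starts exactly from the pretrained model and only gradually departs from it.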
Experiments
The 15 object/texture categories of MVTec-AD [1], the most popular and the main AD benchmark.