https://arxiv.org/pdf/2211.14307.pdf

https://github.com/EliSchwartz/MAEDAY (code coming soon)

What does this paper do?

The method is simple, the paper is concisely written, and the results are quite good compared with the SOTA 1-shot AD method, PatchCore [20] (CVPR 2022).

  1. Few-shot AD (FSAD)
    1. MAEDAY
      1. image-reconstruction based via MAE
      2. performs well by pre-training on ImageNet and only fine-tuning on a small set of normal images
      3. Embedding-based methods still achieve higher performance than MAEDAY on its own.
    2. An ensemble of the two approaches (reconstruction-based and embedding-based) achieves very strong SOTA results.
  2. Zero-Shot AD (ZSAD)
    1. outperforms the SOTA one-shot results on the proposed dataset.
  3. Zero-Shot Foreign Object Detection (ZSFOD)
    1. same method as ZSAD, applied to a different task.
  4. a new dataset for ZSFOD

Related work

  1. embedding-based: compare the embedding vectors of queries to a set of reference embeddings
  2. image-reconstruction-based: this paper (MAEDAY)
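
A minimal sketch of the embedding-based idea, for contrast with reconstruction. It assumes the comparison is a nearest-neighbor Euclidean distance, which is one common choice and not necessarily what any specific method such as PatchCore does in detail. The function name and the toy 2-D embeddings are made up for illustration.

```python
import math

def nearest_reference_distance(query, references):
    """Anomaly score: Euclidean distance from `query` to its closest reference embedding."""
    if not references:
        raise ValueError("need at least one reference embedding")
    return min(math.dist(query, ref) for ref in references)

# Toy 2-D embeddings taken from normal images (made-up values).
normal_refs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# A query close to the references gets a low score, a distant one a high score.
normal_score = nearest_reference_distance((0.9, 0.1), normal_refs)
anomalous_score = nearest_reference_distance((4.0, 4.0), normal_refs)
```

In practice the embeddings would come from a pre-trained network and the reference set from the few available normal images; the higher the score, the more anomalous the query.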

Method

  1. Given a query image, repeat N=32 times:
    1. keep a random small subset (25%) of its patches ⇒ flatten each kept patch into a single token and add its positional encoding ⇒ feed the tokens to the MAE encoder.
    2. encoder output tokens + ‘empty’ tokens ⇒ MAE decoder ⇒ recovered patches, i.e. the reconstructed image
      1. ‘empty’ tokens stand in for the masked-out patches, as in MAE.
      2. each ‘empty’ token carries only its positional encoding.
  2. The anomaly scores are averaged across N reconstructions with different random masks.
    1. each score is the reconstruction error, i.e. the color difference between the query image and its reconstruction.
  3. FSAD
    1. fine-tune with LoRA [11], a method originally introduced for fine-tuning large language models (transformers) without overfitting to a small dataset.
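
The scoring loop in steps 1 and 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions: `mean_fill_reconstruct` is a hypothetical stand-in for the MAE encoder and decoder (it predicts every patch as the mean of the visible patches), the error is taken as mean absolute pixel difference, and the toy image is a list of flat patches. Only N=32 and the 25% keep ratio come from the notes.

```python
import random

def mean_fill_reconstruct(patches, visible_idx):
    """Hypothetical stand-in for the MAE: predict every patch as the
    element-wise mean of the visible patches."""
    n_pixels = len(patches[0])
    mean_patch = [
        sum(patches[i][p] for i in visible_idx) / len(visible_idx)
        for p in range(n_pixels)
    ]
    return [list(mean_patch) for _ in patches]

def anomaly_scores(patches, reconstruct, n_repeats=32, keep_ratio=0.25, seed=0):
    """Per-patch reconstruction error, averaged over n_repeats random masks."""
    rng = random.Random(seed)
    n_keep = max(1, round(keep_ratio * len(patches)))
    totals = [0.0] * len(patches)
    for _ in range(n_repeats):
        # Only a random 25% of the patches are shown to the model.
        visible_idx = rng.sample(range(len(patches)), n_keep)
        recon = reconstruct(patches, visible_idx)
        for i, (orig, pred) in enumerate(zip(patches, recon)):
            # Mean absolute pixel difference for this patch (assumed error measure).
            totals[i] += sum(abs(o - r) for o, r in zip(orig, pred)) / len(orig)
    return [t / n_repeats for t in totals]

# Toy "image": 16 uniform gray patches of 4 pixels each, plus one bright outlier.
patches = [[0.5] * 4 for _ in range(16)]
patches[5] = [1.0] * 4

scores = anomaly_scores(patches, mean_fill_reconstruct)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Averaging over many random masks matters because a single mask may leave the anomalous region visible, in which case the model can simply copy it; across repeats, every patch is hidden in most runs and has to be predicted from its surroundings.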

Experiments

Evaluated on the 15 datasets in MVTec-AD [1], the most popular and the main AD benchmark.