Introduction

DifferNet is the first work to bring Normalizing Flows (NF) to industrial image anomaly detection (AD).

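An NF uses a sequence of invertible transformations (a bijective mapping) to map a complex image feature distribution onto a **multivariate Gaussian with independent components (standard normal)**.
1. The features of all normal samples are mapped toward the center of the Gaussian.
2. Anomalous samples, passed through the same transformation, land clearly away from the Gaussian formed during training.
3. Normalizing flows capture multimodal distributions better than a simple unimodal Gaussian model.
4. A pure normalizing flow applied directly to high-resolution images suffers from a severe "curse of dimensionality". DifferNet's core solution: first extract low-dimensional, high-level features with a pre-trained CNN, then use the normalizing flow to accurately estimate the probability density (likelihood) of these normal features.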
Why model image features rather than raw images?
1. Processing full-resolution images directly is computationally very expensive.
2. On raw image data, the network mainly focuses on local pixel correlations and does not take semantics into account.
Detection is image-level, but anomalies can be localized by backpropagating the loss to the input image.
1. Bijective mapping ⇒ improved localization accuracy.
2. However, the NF only maps between the image feature y and the latent z; the feature extractor f_ex() in y=f_ex(x) is not bijective.
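A toy sketch of the localization idea, under heavy assumptions: `extract_features` is a hypothetical non-bijective stand-in for f_ex (pairwise averaging), the flow is replaced by the identity, and the gradient is approximated with central finite differences rather than true backpropagation. It only illustrates that pixels feeding an unlikely feature receive a larger gradient magnitude.

```python
def extract_features(x):
    # Hypothetical stand-in for f_ex: average each pair of pixels (4 values -> 2 features).
    # Many inputs map to the same features, so this step is not bijective.
    return [(x[0] + x[1]) / 2.0, (x[2] + x[3]) / 2.0]

def nll(x):
    y = extract_features(x)
    z = y  # identity "flow" for illustration, so log|det dz/dy| = 0
    return 0.5 * sum(v * v for v in z)  # NLL under N(0, I), constant dropped

def gradient_map(x, eps=1e-5):
    # Central finite differences of the NLL with respect to each input pixel
    # (a simple substitute for backpropagation in this sketch).
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append(abs(nll(hi) - nll(lo)) / (2 * eps))
    return grads

x = [0.1, -0.1, 2.0, 2.2]      # the last two pixels produce an unlikely feature
anomaly_map = gradient_map(x)  # larger values mark pixels that drive the NLL up
```

Because the averaging step discards information, both pixels of a pair get the same gradient here, which hints at why a non-bijective extractor limits localization sharpness.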

Method

$y=f_{ex}(x)$
1. y follows an unknown distribution.
2. f_ex(): a pre-trained feature extractor, such as VGG.
$z=f_{NF}(y)$
1. Z is a latent space; the data z∈Z follows a standard multivariate Gaussian distribution, i.e. z∼N(0,I), or p_Z(z)=N(0,I).
2. f_NF: built from Real-NVP [8] modules.
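To make the invertibility of f_NF concrete, here is a minimal sketch of one plain Real-NVP-style affine coupling step in pure Python. It is not necessarily the exact block used in DifferNet; `toy_scale` and `toy_shift` are hypothetical stand-ins for the learned subnetworks.

```python
import math

def coupling_forward(y, scale_fn, shift_fn):
    """Map y -> z with one affine coupling step; return (z, log|det dz/dy|)."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s = scale_fn(y1)  # log-scales, computed from the untouched half
    t = shift_fn(y1)  # shifts, computed from the untouched half
    z2 = [v * math.exp(si) + ti for v, si, ti in zip(y2, s, t)]
    # The Jacobian is triangular, so log|det| is just the sum of the log-scales.
    return y1 + z2, sum(s)

def coupling_inverse(z, scale_fn, shift_fn):
    """Exact inverse of coupling_forward."""
    half = len(z) // 2
    z1, z2 = z[:half], z[half:]
    s = scale_fn(z1)  # z1 equals y1, so s and t can be recomputed exactly
    t = shift_fn(z1)
    y2 = [(v - ti) * math.exp(-si) for v, si, ti in zip(z2, s, t)]
    return z1 + y2

# Hypothetical stand-ins for the learned scale and shift networks.
def toy_scale(h):
    return [math.tanh(0.5 * v) for v in h]

def toy_shift(h):
    return [0.1 * v - 0.2 for v in h]

y = [0.3, -1.2, 0.8, 2.0]
z, log_det = coupling_forward(y, toy_scale, toy_shift)
y_back = coupling_inverse(z, toy_scale, toy_shift)  # recovers y up to rounding
```

The scale and shift functions never need to be inverted themselves, which is why they can be arbitrary networks while the overall step stays bijective.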

Loss:

By the change-of-variables theorem from probability theory:

$$ p_Y(y) = p_Z(z) \left| \det \frac{\partial z}{\partial y} \right| $$
Maximize the likelihood p_Y(y) of normal-sample features, i.e. normal samples should receive high probability
⇒ minimize the negative log-likelihood (NLL), i.e. min -log(p_Y(y)).
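As a worked step, substituting the standard normal p_Z(z)=N(0,I) into the change-of-variables formula gives an explicit form of the NLL. Here d denotes the dimension of z, a symbol introduced only for this derivation:

$$ -\log p_Y(y) = -\log p_Z(z) - \log \left| \det \frac{\partial z}{\partial y} \right| = \frac{\lVert z \rVert_2^2}{2} + \frac{d}{2}\log(2\pi) - \log \left| \det \frac{\partial z}{\partial y} \right| $$

The term $\frac{d}{2}\log(2\pi)$ does not depend on the model parameters, so it can be dropped during training: the objective pulls z toward the origin while the log-determinant term prevents the flow from simply shrinking every input to zero.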

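Although p_Y(y) is mathematically a probability density function, in the context of statistical inference and model training it is called the likelihood, for the following reason: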

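Concept	Symbol	Perspective	What is held fixed?
Probability	p(y∣θ)	Given parameters θ, the probability of observing data y	Parameters θ
Likelihood	L(θ∣y)=p(y∣θ)	Given data y, how plausible the parameters θ are	Data y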
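The distinction in the table can be illustrated with a small sketch, assuming a one-dimensional Gaussian with unit variance (a toy model, not the DifferNet density): the same density function is evaluated, but the data point is held fixed and the parameter is varied.

```python
import math

def gaussian_pdf(y, mu, sigma=1.0):
    # Density of N(mu, sigma^2) evaluated at y.
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

y_observed = 1.0                      # the data is fixed
candidate_mus = [-1.0, 0.0, 1.0, 2.0]  # the parameter is what varies

# Likelihood view: L(mu | y) = p(y | mu), read as a function of mu.
likelihoods = {mu: gaussian_pdf(y_observed, mu) for mu in candidate_mus}
best_mu = max(likelihoods, key=likelihoods.get)  # the most plausible candidate
```

Training the flow follows the likelihood view: the normal training features are fixed and the network parameters are adjusted to make them as plausible as possible.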

The feature distributions of single or multiple categories
1. Table 2: a large improvement at the time, from 83.9 to 94.9 detection AUROC on MVTec.
2. Multimodality: training on all 15 MVTec categories together gives a detection AUROC of 90.2%.
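For reference, image-level detection AUROC can be computed from anomaly scores as sketched below. This is a generic pairwise formulation with made-up scores and labels, not the paper's evaluation code: it estimates the probability that a randomly chosen anomalous image scores higher than a randomly chosen normal one.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (anomalous, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]  # anomalous images
    neg = [s for s, l in zip(scores, labels) if l == 0]  # normal images
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Illustrative values only: one anomalous image (score 0.3) is ranked below two normal ones.
scores = [0.2, 0.4, 0.35, 0.9, 0.8, 0.3]
labels = [0, 0, 0, 1, 1, 1]
value = auroc(scores, labels)
```

The metric depends only on the ranking of the scores, so no anomaly threshold has to be chosen.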
16 training images / 16 shots: 87.3
1. Table 1 and Fig. 7 are worth a look.
2. "16 shots" denotes a model trained on only 16 images.
No quantitative results for localization, only Fig. 8.
So there are still quite a few false detections.
1. Normalized histogram of DifferNet’s anomaly scores for the test images of MTD.
