Introduction

Conditional normalizing flows => very fast inference and a low memory footprint, enough to meet industrial real-time detection requirements.

Method


patch-based

Why is it proposed?

The anomaly detection (AD) task can be reformulated as out-of-distribution (OOD) detection.

OOD for low-dimensional data: clustering is sufficient [10], but the task is less trivial for high-resolution images.

A simple two-dimensional example. It illustrates global anomalies (x1, x2), a local anomaly x3 and a micro-cluster c3.
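1. Low-dimensional data comes from low-dimensional industrial sensors (e.g. power-line or acoustic).
  1. Traditional industrial sensors (e.g. sensors that measure power-line voltage and current, or acoustic sensors that measure machine vibration) usually collect a few simple scalar values or a one-dimensional time series. The "dimensionality" of such data is low.
  2. [10] only discusses the "few simple values" case, not time series.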
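
To make the low-dimensional case concrete, here is a minimal sketch of clustering-style OOD scoring in 2-D. The data points, the two-cluster split, and the max-training-score threshold are all hypothetical choices for illustration; they are not taken from [10].

```python
import math

def centroid(points):
    """Mean of a list of equally sized tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def ood_score(x, centroids):
    """Distance from x to the nearest cluster centroid (larger = more anomalous)."""
    return min(math.dist(x, c) for c in centroids)

# Hypothetical normal data that forms two clusters (assignment assumed known).
c1_points = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1), (0.1, 0.2)]
c2_points = [(5.0, 5.0), (5.1, 4.9), (4.8, 5.2), (5.2, 5.1)]
centroids = [centroid(c1_points), centroid(c2_points)]

# Assumed threshold: the largest score observed on the normal data.
threshold = max(ood_score(p, centroids) for p in c1_points + c2_points)

def is_anomaly(x):
    return ood_score(x, centroids) > threshold

print(is_anomaly((2.5, 2.5)))    # point far from both clusters
print(is_anomaly((0.05, 0.05)))  # point inside the first cluster
```

A score like this only separates points that are far from every cluster, which is why it stops being sufficient once the input is a high-resolution image rather than a handful of sensor values.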
OOD for high-dimensional data: images
1. convolutional neural networks (CNNs) can encode images into low-dimensional feature maps
2. How can the feature maps be processed at high speed and with low memory?
Memory bank + k-NN is large and slow, e.g. SPADE, 20.
1. SPADE allocates memory for a train gallery G used in k-nearest-neighbors.
Distribution modeling + Mahalanobis distance is still far from real-time processing, even in state-of-the-art unsupervised AD methods, e.g. PaDiM, 21.
1. PaDiM keeps large matrices (Σ_i^k)^{-1}, i ∈ {H_k × W_k}, for the Mahalanobis distance.
2. The normalizing flow framework can estimate the exact likelihoods of any arbitrary distribution with density p_Z, while the Mahalanobis distance is limited to the multivariate Gaussian (MVG) distribution only. So the features are first transformed to an MVG by a normalizing flow.
  1. For example, CNNs trained with L1 regularization would have a Laplace prior [11], and no particular prior in the absence of regularization.
Our model employs trained decoders g_k(θ_k) for post-processing.
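
The two baseline post-processing styles above can be contrasted in a small sketch. This is not SPADE's or PaDiM's actual code: the 2-D features, `k`, and the helper names are assumptions chosen so the example runs with the standard library only.

```python
import math

def knn_score(query, gallery, k=2):
    """Memory-bank style: mean distance to the k nearest stored train features."""
    dists = sorted(math.dist(query, g) for g in gallery)
    return sum(dists[:k]) / k

def fit_gaussian_2d(feats):
    """Gaussian style: keep only the mean and the inverse 2x2 covariance."""
    n = len(feats)
    mx = sum(f[0] for f in feats) / n
    my = sum(f[1] for f in feats) / n
    sxx = sum((f[0] - mx) ** 2 for f in feats) / (n - 1)
    syy = sum((f[1] - my) ** 2 for f in feats) / (n - 1)
    sxy = sum((f[0] - mx) * (f[1] - my) for f in feats) / (n - 1)
    det = sxx * syy - sxy ** 2
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv

def mahalanobis(query, mean, inv):
    """sqrt((x - mu)^T Sigma^{-1} (x - mu)) for a 2-D feature."""
    dx, dy = query[0] - mean[0], query[1] - mean[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Hypothetical train features observed at one patch position.
gallery = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.2), (0.8, 1.8)]
mean, inv_cov = fit_gaussian_2d(gallery)

normal, anomalous = (1.05, 2.0), (3.0, 0.0)
print(knn_score(normal, gallery), knn_score(anomalous, gallery))
print(mahalanobis(normal, mean, inv_cov), mahalanobis(anomalous, mean, inv_cov))

# Stored floats per position in this sketch: N*D for the gallery, D + D*D for the Gaussian.
n, d = len(gallery), 2
print(n * d, d + d * d)
```

In this toy setting the k-NN storage grows with the number of train samples, while the Gaussian storage grows with the square of the feature dimension, which mirrors the memory concerns noted for the gallery G and for the inverse covariance matrices.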


All models use the same encoder h(λ), but diverge in the post-processing.


STC is a dataset with 256×256 images; in the MVTec dataset, the image size is class-dependent.

Why does the inference speed differ so much?

The patch anomaly score is computed via g_k() directly; see DifferNet in the References.

Change the normalizing flow used in DifferNet, Real-NVP [9], to a conditional normalizing flow [2].
1. The patch position is the conditional input.
Extend DifferNet to patch-level AD.
1. A more computationally and memory-efficient architecture?
2. The conditional extension does not increase model size since C_k << D_k.
  1. Dimension of the position embedding: C_k = 128.
  2. Dimension of the image features: D_k.
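
A rough sketch of how a position-conditioned flow layer can score a patch feature is given below. It is an assumption-laden toy and not the paper's decoder: the single affine coupling layer, the linear conditioner, the sinusoidal `pos_encoding`, the tiny sizes `D = 8` and `C = 4`, and the use of negative log-likelihood as the score are all illustrative choices.

```python
import math
import random

def pos_encoding(row, col, dim):
    """Sinusoidal encoding of a 2-D patch position (assumed form; dim divisible by 4)."""
    quarter = dim // 4
    freqs = [1.0 / (10000 ** (i / quarter)) for i in range(quarter)]
    enc = []
    for f in freqs:
        enc += [math.sin(row * f), math.cos(row * f)]
    for f in freqs:
        enc += [math.sin(col * f), math.cos(col * f)]
    return enc

def linear(w, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

class ConditionalCoupling:
    """Affine coupling: the second half of z is scaled and shifted using (first half, condition)."""
    def __init__(self, feat_dim, cond_dim, rng):
        self.half = feat_dim // 2
        in_dim, out_dim = self.half + cond_dim, feat_dim - self.half
        init = lambda r, c: [[rng.gauss(0, 0.1) for _ in range(c)] for _ in range(r)]
        self.w_s, self.b_s = init(out_dim, in_dim), [0.0] * out_dim
        self.w_t, self.b_t = init(out_dim, in_dim), [0.0] * out_dim

    def _scale_shift(self, z1, cond):
        h = z1 + cond  # list concatenation: the condition only widens the conditioner input
        s = [math.tanh(v) for v in linear(self.w_s, self.b_s, h)]
        t = linear(self.w_t, self.b_t, h)
        return s, t

    def forward(self, z, cond):
        z1, z2 = z[:self.half], z[self.half:]
        s, t = self._scale_shift(z1, cond)
        y2 = [a * math.exp(si) + ti for a, si, ti in zip(z2, s, t)]
        return z1 + y2, sum(s)  # output and log|det Jacobian|

    def inverse(self, y, cond):
        y1, y2 = y[:self.half], y[self.half:]
        s, t = self._scale_shift(y1, cond)
        return y1 + [(a - ti) * math.exp(-si) for a, si, ti in zip(y2, s, t)]

def log_likelihood(layer, z, cond):
    """Change of variables: log p(z | c) = log N(u; 0, I) + log|det J|, with u = f(z; c)."""
    u, log_det = layer.forward(z, cond)
    log_pu = -0.5 * sum(v * v for v in u) - 0.5 * len(u) * math.log(2 * math.pi)
    return log_pu + log_det

rng = random.Random(0)
D, C = 8, 4                      # toy sizes; the notes give C_k = 128 with C_k << D_k
layer = ConditionalCoupling(D, C, rng)
z = [rng.gauss(0, 1) for _ in range(D)]   # stand-in for one encoder feature vector
cond = pos_encoding(3, 5, C)              # condition on patch position (row 3, col 5)
u, _ = layer.forward(z, cond)
z_back = layer.inverse(u, cond)
score = -log_likelihood(layer, z, cond)   # assumed score: higher = less likely
print(score)
```

The layer stays exactly invertible for any condition vector, and the condition adds only `C` extra input columns to each conditioner weight matrix, which is the intuition behind the model-size remark when C_k is much smaller than D_k.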