After training on normal data, an AE is expected to produce a higher reconstruction error for abnormal inputs than for normal ones ⇒ anomaly detection.
The assumption does not always hold in practice.

⇒ Memory-augmented AE

training only with normal samples

Memory sizes N for MNIST and CIFAR-10 are set as 100 and 500, respectively.
Video dataset UCSD-Ped2: N=500 is not enough; N=1000 to 3000 ⇒ same accuracy.

w is defined by the similarity between the encoding z and each memory item m_i.
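
A minimal sketch of how such similarity-based weights could be computed and used to read from the memory. The notes only say "similarity", so the choice of cosine similarity followed by a softmax is an assumption here, and the function names (`cosine_similarity`, `addressing_weights`, `read_memory`) and the toy values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def addressing_weights(z, memory):
    """Softmax over the similarities between z and every memory item (assumed form)."""
    sims = [cosine_similarity(z, m) for m in memory]
    max_sim = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - max_sim) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(w, memory):
    """Weighted sum of the memory items: z_hat = sum_i w_i * m_i."""
    dim = len(memory[0])
    return [sum(w_i * m[k] for w_i, m in zip(w, memory)) for k in range(dim)]

# Toy example with N = 3 memory items of dimension 2.
memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z = [0.9, 0.1]
w = addressing_weights(z, memory)
z_hat = read_memory(w, memory)
```

In this toy case the weights are non-negative, sum to 1, and are largest for the memory item most similar to z. Every item still gets a non-zero weight, which is the dense-w situation the next lines are concerned with.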
A dense w can form a complex combination of the memory items ⇒ even an anomaly may be reconstructed well, so
a hard shrinkage operation with threshold λ is applied to promote the sparsity of w.

The shrinkage threshold λ is set to a value in the interval [1/N, 3/N].
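
The hard shrinkage step can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure: weights that do not exceed λ are set to zero, and renormalizing the surviving weights so they sum to 1 is an assumption added here. `hard_shrink` and the toy weights are hypothetical.

```python
def hard_shrink(w, lam):
    """Zero every weight that does not exceed lam, then renormalize (assumed) to sum to 1."""
    kept = [w_i if w_i > lam else 0.0 for w_i in w]
    total = sum(kept)
    if total == 0.0:
        return kept  # every weight was shrunk away; nothing to renormalize
    return [k / total for k in kept]

N = 4                          # toy memory size
w = [0.55, 0.30, 0.10, 0.05]   # dense addressing weights (sum to 1)
lam = 1.0 / N                  # lower end of the interval [1/N, 3/N]
w_sparse = hard_shrink(w, lam)
```

With λ = 1/N = 0.25, only the first two weights survive, so the reconstruction is built from two memory items instead of four.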