
Diffusion models are a class of generative models whose core idea is: corrupt data with a fixed forward noising process, then learn a reverse denoising process that maps noise back to data.
This construction gives diffusion models high sample quality and good mode coverage in image generation.
The forward process is fixed to a Markov chain that gradually adds Gaussian noise to the data according to a variance schedule $\beta_1, \ldots, \beta_T$:
$$ q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) \coloneqq \prod_{t=1}^{T} q(\mathbf{x}_t \mid \mathbf{x}_{t-1}),\quad q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) \coloneqq \mathcal{N}\bigl(\mathbf{x}_t;\, \sqrt{1 - \beta_t}\, \mathbf{x}_{t-1},\, \beta_t \mathbf{I}\bigr), $$
where $q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)$ is the joint distribution of the entire noising trajectory generated from $\mathbf{x}_0$, i.e. q(trajectory).
$q(x_T \mid x_0)$ is the marginal distribution, obtained by integrating out all intermediate variables:
$$ q(x_T \mid x_0) = \int q(x_{1:T} \mid x_0)dx_{1:T-1}, $$
example:
$$ q(x_2 \mid x_0) = \int q(x_{1:2} \mid x_0)dx_{1}, $$
…
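The two-step example above can be checked numerically: composing the per-step Gaussians $q(x_1 \mid x_0)$ and $q(x_2 \mid x_1)$ yields a Gaussian with mean $\sqrt{(1-\beta_1)(1-\beta_2)}\,x_0$ and variance $1-(1-\beta_1)(1-\beta_2)$. A minimal Monte Carlo sketch (the $\beta$ values are illustrative, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 0.1, 0.2           # illustrative variance schedule
x0 = 3.0                          # a scalar "data point"
n = 1_000_000

# Two sequential noising steps: x1 ~ q(x1|x0), then x2 ~ q(x2|x1)
x1 = np.sqrt(1 - beta1) * x0 + np.sqrt(beta1) * rng.standard_normal(n)
x2 = np.sqrt(1 - beta2) * x1 + np.sqrt(beta2) * rng.standard_normal(n)

# Closed-form marginal q(x2|x0): mean sqrt(a1*a2)*x0, variance 1 - a1*a2
a1a2 = (1 - beta1) * (1 - beta2)
print(x2.mean(), np.sqrt(a1a2) * x0)   # empirical vs. closed-form mean
print(x2.var(), 1 - a1a2)              # empirical vs. closed-form variance
```

The empirical moments of the two-step samples match the closed-form marginal, which is exactly what the integral above expresses.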
$q(\mathbf{x}_t \mid \mathbf{x}_0)$ is used during training to generate $\mathbf{x}_t$ directly from $\mathbf{x}_0$, i.e. by sampling from the following distribution:
$$ q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\bigl(\mathbf{x}_t;\, \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0,\, (1 - \bar{\alpha}_t)\mathbf{I}\bigr), $$
where
$$ \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \quad \alpha_s = 1 - \beta_s $$
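The closed form above means training never has to run the chain step by step: with the reparameterization $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, any noise level is one draw away. A sketch (the linear $\beta$ schedule is an assumption, used here only for concreteness):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed linear variance schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x0, (1 - abar_t) I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))        # toy batch standing in for images
xt, eps = q_sample(x0, t=500, rng=rng)  # jump straight to noise level t=500
```

Returning `eps` alongside `xt` is the usual design choice, since the DDPM training loss compares the network's noise prediction against exactly this $\boldsymbol{\epsilon}$.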
The reverse process is a learnable Markov chain:
$$ p_\theta(\mathbf{x}_{0:T}) \coloneqq p(\mathbf{x}_T) \prod_{t=1}^{T} p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t),\quad p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) \coloneqq \mathcal{N}\bigl(\mathbf{x}_{t-1};\, \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\, \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\bigr), $$
where the variance is set to untrained, time-dependent constants, $\boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t) = \sigma_t^2 \mathbf{I}$; the $\sigma_t^2$ values are also called the noise schedule. So
$$ p_\theta(x_{t-1} \mid x_t)=\mathcal{N}\left(\mu_\theta(x_t,t),\sigma_t^2 I\right), $$
$p_\theta(x_{t-1} \mid x_t)$ and $p_\theta(x_{t-2} \mid x_{t-1})$ use the same network $\mu_\theta(\cdot,\cdot)$, with the timestep $t$ supplied as an input.
To synthesize new data instances $\mathbf{x}_0$, we sample $\mathbf{x}_T \sim p(\mathbf{x}_T)$ and run the reverse chain, sampling $\mathbf{x}_{t-1} \sim p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)$ for $t = T, \ldots, 1$.
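The synthesis procedure can be sketched as an ancestral sampling loop. Here `mu_theta` is a hypothetical stand-in for a trained mean network (a real model would predict the mean from $(\mathbf{x}_t, t)$), and $\sigma_t^2 = \beta_t$ is one common untrained choice of noise schedule:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear variance schedule
alphas = 1.0 - betas

def mu_theta(xt, t):
    # Hypothetical placeholder for the learned mean network mu_theta(x_t, t);
    # a trained model would compute this from (x_t, t).
    return np.sqrt(alphas[t]) * xt

def sample(shape, rng):
    x = rng.standard_normal(shape)           # x_T ~ p(x_T) = N(0, I)
    for t in reversed(range(T)):
        z = rng.standard_normal(shape) if t > 0 else 0.0  # no noise at t=0
        sigma_t = np.sqrt(betas[t])          # untrained choice sigma_t^2 = beta_t
        x = mu_theta(x, t) + sigma_t * z     # x_{t-1} ~ N(mu_theta, sigma_t^2 I)
    return x

x0 = sample((4, 8), np.random.default_rng(0))
```

Note that the single function `mu_theta(x, t)` is called at every step with a different `t`, which is precisely the parameter sharing described above.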
Why can the variance be left unlearned?
In diffusion, the variance of the true posterior $q(x_{t-1} \mid x_t, x_0)$ is a constant determined entirely by the $\beta$ schedule, independent of the data.
Early methods therefore chose to fix it. Later work (e.g. Improved DDPM) instead allows a learned variance:
$$ p_\theta(x_{t-1} \mid x_t)=\mathcal{N}\left(\mu_\theta(x_t,t),\,\sigma_\theta^2(x_t,t)\, I\right) $$
| Case | Does $\Sigma$ depend on learned $\theta$? |
|---|---|
| Original DDPM | ❌ No |
| Improved DDPM | ✅ Can |
| DDIM (deterministic) | effectively no stochasticity at all |
✨ Denoising Diffusion Probabilistic Models (DDPM) learn a probability density function **$p_{\theta}(x_{t-1}|x_{t})$**, which is used to denoise step by step back to $x_0$.
