https://mhamilton.net/featup.html


  1. Transform t: perturb the input image with small pads, scales, and horizontal flips and apply the model to each transformed image to extract a collection of low-resolution feature maps.

    1. These small image jitters allow us to observe tiny differences in the output features and provide sub-feature information to train the upsampler.
  2. Fhr: a latent high-resolution feature map obtained from Joint Bilateral Upsampling: Fhr = σ↑(f(x), x).

    1. x is the input image, and f is the model whose features are upsampled.
    2. How is Fhr generated? (See CHOOSING AN UPSAMPLER.)
  3. Multi-view loss:


    1. s = N(f(t(x))) is a spatially varying adaptive uncertainty (Hamilton et al., 2020) parameterized by a small linear network N.
    2. This extra flexibility allows the network to learn when certain outlier features fundamentally cannot be upsampled.
  4. TV loss


    1. The TV term has a large effect, while the log(s) term has a small one: see Figure 9 (qualitative ablation study across both DINO and ResNet50 backbones).
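
The two loss terms above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: it assumes the multi-view loss has the usual uncertainty-weighted form (squared error scaled by 1/(2s²) plus log(s)), assumes a squared-difference TV term, and uses average pooling as a stand-in for the learned downsampler. The function names `avg_pool`, `multiview_loss`, and `tv_loss` are hypothetical.

```python
import numpy as np

def avg_pool(feats, k):
    """Stand-in downsampler (assumption): k x k average pooling of an (H, W, C) map."""
    h, w, c = feats.shape
    return feats.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def multiview_loss(lr_feats, hr_feats, s, k):
    """Uncertainty-weighted reconstruction error for a single view.

    lr_feats: (h, w, C) low-res features f(t(x)) from the model
    hr_feats: (h*k, w*k, C) transformed high-res features t(Fhr)
    s:        (h, w) positive per-pixel uncertainty
    """
    residual = lr_feats - avg_pool(hr_feats, k)
    sq_err = (residual ** 2).sum(axis=-1)           # squared L2 norm per pixel
    # A large s down-weights the error at that pixel but is penalized by log(s).
    return float(np.mean(sq_err / (2.0 * s ** 2) + np.log(s)))

def tv_loss(hr_feats):
    """Total variation (assumed squared form): penalizes neighbouring-pixel differences."""
    dy = hr_feats[1:, :, :] - hr_feats[:-1, :, :]
    dx = hr_feats[:, 1:, :] - hr_feats[:, :-1, :]
    return float((dy ** 2).mean() + (dx ** 2).mean())
```

In training, the first term would be averaged over all sampled transforms t, and the TV term would be added with a small weight.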

Method

CHOOSING A DOWNSAMPLER


  1. two options: a fast and simple learned blur kernel, and a more flexible attention-based downsampler

  2. The blur-based downsampler is efficient, but it cannot capture dynamic receptive fields, object salience, or other nonlinear effects.

  3. The attention-based downsampler spatially adapts the downsampling kernel.

    1. It uses a 1x1 convolution, Conv(Fhr[…]), to predict a saliency map from Fhr.

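
As a rough illustration of the attention-based idea, the sketch below pools each k x k window with weights given by a softmax over a predicted saliency map. It is a simplification under stated assumptions: windows are non-overlapping, the 1x1 convolution is written as a per-pixel dot product with hypothetical parameters `w` and `b`, and the learned spatial kernel of the actual method is left out.

```python
import numpy as np

def attention_downsample(hr_feats, w, b, k):
    """Saliency-weighted pooling over non-overlapping k x k windows.

    hr_feats: (H, W, C) high-res features
    w, b:     (C,) weights and scalar bias of a 1x1 convolution (hypothetical)
    returns:  (H // k, W // k, C) low-res features
    """
    h, wd, c = hr_feats.shape
    saliency = hr_feats @ w + b                      # 1x1 conv == per-pixel dot product
    # Gather the k*k pixels of each window along one axis.
    win_f = hr_feats.reshape(h // k, k, wd // k, k, c).transpose(0, 2, 1, 3, 4)
    win_f = win_f.reshape(h // k, wd // k, k * k, c)
    win_s = saliency.reshape(h // k, k, wd // k, k).transpose(0, 2, 1, 3)
    win_s = win_s.reshape(h // k, wd // k, k * k)
    # Numerically stable softmax inside each window.
    win_s = win_s - win_s.max(axis=-1, keepdims=True)
    attn = np.exp(win_s)
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn[..., None] * win_f).sum(axis=2)
```

With `w` set to zero the saliency is constant, so the weights are uniform and the result reduces to plain average pooling. Non-zero weights let salient pixels dominate their window.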

CHOOSING AN UPSAMPLER

  1. two variants: “JBU” (Kopf et al., 2007) or “Implicit”; see Table 1.

  2. Joint Bilateral Upsamplers (JBU)

    1. This feedforward upsampler is a parameterized, MLP-based generalization of the Joint Bilateral Upsampling (JBU) filter (Kopf et al., 2007).

      1. Kopf et al., 2007 is just a traditional filter.

      2. ⇒ GT F̂hr, which may also be used in “Implicit”


    2. each JBU is a two-layer GeLU (Hendrycks & Gimpel, 2016) MLP with 30-dimensional hidden and output vectors

  3. Implicit


    1. h(z, ω̂): the component-wise discrete Fourier transform of an input signal z, with a vector of frequencies ω̂.
    2. “:” denotes concatenation.
    3. x is not explained here; it may be the RGB or Fourier color features at position (ei, ej); see Fig. 9 of Sec. 6.4.
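
To make the input encoding for the implicit variant concrete, here is a minimal NumPy sketch. It assumes the common sin/cos positional-encoding form for the Fourier features, pixel coordinates in [-1, 1], and that x is the RGB value at each pixel (one of the two guesses in the note above). The names `fourier_features` and `implicit_inputs` and the frequency values are hypothetical.

```python
import numpy as np

def fourier_features(z, freqs):
    """Encode every component of z with sin and cos at each frequency.

    z:     (N, D) input signal, one row per pixel
    freqs: (K,) frequencies (hypothetical values)
    returns (N, 2 * D * K)
    """
    angles = z[:, :, None] * freqs[None, None, :]                       # (N, D, K)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)   # (N, D, 2K)
    return feats.reshape(z.shape[0], -1)

def implicit_inputs(h, w, rgb, freqs):
    """Build the encoded input ei : ej : x for every pixel of an (h, w, 3) image."""
    ei, ej = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    # ":" in the notes is concatenation: coordinates followed by colour.
    z = np.concatenate([ei.reshape(-1, 1), ej.reshape(-1, 1), rgb.reshape(-1, 3)], axis=1)
    return fourier_features(z, freqs)
```

Each row of the result would then be fed to a small MLP that predicts the high-resolution feature at that pixel.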

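
For reference on the JBU variant, the classical, non-learned filter of Kopf et al. (2007) can be sketched as follows. This is a slow, loop-based illustration under assumptions: Gaussian spatial and range kernels, a guidance image scaled to [0, 1], and hypothetical parameter names and defaults. The learned version described in these notes replaces the fixed kernels with parameterized ones.

```python
import numpy as np

def joint_bilateral_upsample(lr, guide, radius=1, sigma_spatial=1.0, sigma_range=0.1):
    """Classic joint bilateral upsampling.

    lr:    (h, w, C) low-res signal (e.g. features)
    guide: (H, W, G) high-res guidance image with values in [0, 1]
    returns (H, W, C) upsampled signal
    """
    h, w, c = lr.shape
    H, W, _ = guide.shape
    sy, sx = H / h, W / w
    out = np.zeros((H, W, c))
    for i in range(H):
        for j in range(W):
            ci, cj = i / sy, j / sx                  # pixel position in low-res coordinates
            acc, norm = 0.0, 0.0
            for qi in range(int(ci) - radius, int(ci) + radius + 1):
                for qj in range(int(cj) - radius, int(cj) + radius + 1):
                    if not (0 <= qi < h and 0 <= qj < w):
                        continue
                    # Guidance pixel that corresponds to low-res sample (qi, qj).
                    gi, gj = min(int(qi * sy), H - 1), min(int(qj * sx), W - 1)
                    d2 = (ci - qi) ** 2 + (cj - qj) ** 2
                    r2 = np.sum((guide[i, j] - guide[gi, gj]) ** 2)
                    # Spatial kernel times range kernel on the guidance image.
                    wgt = np.exp(-d2 / (2 * sigma_spatial ** 2)) * np.exp(-r2 / (2 * sigma_range ** 2))
                    acc = acc + wgt * lr[qi, qj]
                    norm += wgt
            out[i, j] = acc / norm
    return out
```

Because the range kernel compares guidance pixels, low-res samples that lie across an image edge receive small weights, which is what keeps edges sharp in the upsampled output.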