https://mhamilton.net/featup.html

Transform t: perturb the input image with small pads, scales, and horizontal flips, then apply the model f to each transformed image to extract a collection of low-resolution feature maps.
F_{hr}: a latent high-resolution feature map from Joint Bilateral Upsampling: F_{hr} = σ↑(f(x), x).
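
A minimal sketch of how such jittered views might be generated, assuming reflect padding and omitting the scale perturbation for brevity. The names `jitter` and `sample_views` and the `max_pad` default are illustrative choices, not taken from the paper.

```python
import numpy as np

def jitter(image, pad, flip):
    """Reflect-pad an (H, W, C) image by `pad` pixels per side, then optionally flip it horizontally."""
    out = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    if flip:
        out = out[:, ::-1, :]
    return out

def sample_views(image, n_views, max_pad=4, seed=0):
    """Draw random (pad, flip) parameters and return them with each transformed view."""
    rng = np.random.default_rng(seed)
    views = []
    for _ in range(n_views):
        pad = int(rng.integers(0, max_pad + 1))   # assumed range: 0..max_pad pixels
        flip = bool(rng.integers(0, 2))
        views.append((pad, flip, jitter(image, pad, flip)))
    return views
```

Each view would then be passed through the model to get one low-resolution feature map per transform; keeping `(pad, flip)` alongside the view lets the same transform be replayed on F_{hr} later.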
Multi-view loss:
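The exact form of the loss is not recorded in these notes. As a rough sketch of the idea, one could transform F_{hr} with each t, downsample it, and compare the result against the low-resolution features the model produced for the same view. Average pooling stands in for the learned downsampler here, and the plain mean squared error is an assumption.

```python
import numpy as np

def avg_pool(features, factor):
    """Downsample (H, W, C) features by averaging non-overlapping factor x factor blocks."""
    h, w, c = features.shape
    return features.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def multiview_loss(f_hr, lowres_views, transforms, factor):
    """Mean squared error between downsampled t(F_hr) and the observed low-res features, averaged over views."""
    losses = []
    for t, f_lr in zip(transforms, lowres_views):
        pred = avg_pool(t(f_hr), factor)          # placeholder for the learned downsampler
        losses.append(np.mean((pred - f_lr) ** 2))
    return float(np.mean(losses))
```

H and W of each transformed map must be divisible by `factor` for this simple pooling to work.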

TV loss
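
A generic total-variation regularizer, sketched under the assumption that it is applied to per-pixel feature magnitudes with squared neighbor differences. The notes do not record the exact variant, so treat both choices as placeholders.

```python
import numpy as np

def total_variation(features):
    """Smoothness penalty on an (H, W, C) feature map, computed on per-pixel feature norms."""
    mag = np.linalg.norm(features, axis=-1)   # (H, W) magnitudes
    dh = np.diff(mag, axis=0)                 # differences between vertical neighbors
    dw = np.diff(mag, axis=1)                 # differences between horizontal neighbors
    return float(np.mean(dh ** 2) + np.mean(dw ** 2))
```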


Downsampler: two options, a fast and simple learned blur kernel and a more flexible attention-based downsampler.
The blur-based downsampler is efficient, but it cannot capture dynamic receptive fields, object salience, or other nonlinear effects.
The attention-based downsampler spatially adapts the downsampling kernel.
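
To make the limitation of the blur option concrete, here is a sketch of a blur downsampler: one kernel, shared by every location and channel, applied with a stride. The normalization to sum to one and the sharing across channels are assumptions; in training the kernel entries would be the learned parameters.

```python
import numpy as np

def blur_downsample(features, kernel, stride):
    """Apply one normalized k x k kernel at every strided location of an (H, W, C) feature map."""
    k = kernel.shape[0]
    kernel = kernel / kernel.sum()            # assumes positive entries
    h, w, c = features.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.empty((out_h, out_w, c))
    for i in range(out_h):
        for j in range(out_w):
            patch = features[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Weighted sum over the spatial window, independently per channel.
            out[i, j] = np.tensordot(kernel, patch, axes=([0, 1], [0, 1]))
    return out
```

Because `kernel` does not depend on `(i, j)` or on the content of `patch`, the receptive field is fixed; an attention-based variant would instead compute the weights per location from the features themselves.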

Upsampler: two variants, “JBU” (Kopf et al., 2007) or Implicit; see Tab. 1.
Joint Bilateral Upsamplers (JBU)
This feedforward upsampler is a parameterized, MLP-based generalization of the Joint Bilateral Upsampling (JBU) filter (Kopf et al., 2007).
Kopf et al., 2007 by itself is just a traditional, non-learned filter.
⇒ GT Fhr^, which may also be used in “Implicit”
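
For reference, the traditional filter can be sketched as follows: each high-resolution output pixel is a normalized average of nearby low-resolution values, weighted by a spatial Gaussian and by a range Gaussian on the high-resolution guidance image. The parameter names and defaults are illustrative, and the guidance is assumed to lie in [0, 1].

```python
import numpy as np

def joint_bilateral_upsample(lowres, guide, factor, radius=1,
                             sigma_spatial=1.0, sigma_range=0.1):
    """lowres: (h, w, C); guide: (h*factor, w*factor, G). Returns (h*factor, w*factor, C)."""
    h, w, c = lowres.shape
    H, W = h * factor, w * factor
    out = np.zeros((H, W, c))
    for y in range(H):
        for x in range(W):
            cy, cx = y / factor, x / factor            # position in low-res coordinates
            acc, norm = np.zeros(c), 0.0
            for qy in range(max(0, int(cy) - radius), min(h, int(cy) + radius + 1)):
                for qx in range(max(0, int(cx) - radius), min(w, int(cx) + radius + 1)):
                    # Spatial weight: distance measured on the low-res grid.
                    spatial = np.exp(-((qy - cy) ** 2 + (qx - cx) ** 2) / (2 * sigma_spatial ** 2))
                    # Range weight: guidance difference measured at full resolution.
                    diff = guide[y, x] - guide[qy * factor, qx * factor]
                    rng_w = np.exp(-np.dot(diff, diff) / (2 * sigma_range ** 2))
                    acc += spatial * rng_w * lowres[qy, qx]
                    norm += spatial * rng_w
            out[y, x] = acc / norm
    return out
```

The learned variant described in these notes replaces the fixed Gaussian range term with an MLP-based one; the loop structure above is only meant to show what is being generalized.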

Each JBU is a two-layer GeLU (Hendrycks & Gimpel, 2016) MLP with 30-dimensional hidden and output vectors.
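
A sketch of an MLP with that shape, assuming a 3-channel guidance pixel as input, a single GELU between the two linear layers, and a simple random initialization. None of these three details is recorded in the notes, and how the output enters the filter is left out.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

class TwoLayerMLP:
    """Linear -> GELU -> Linear with 30-dimensional hidden and output vectors."""

    def __init__(self, in_dim=3, hidden_dim=30, out_dim=30, seed=0):
        rng = np.random.default_rng(seed)
        # Scaled normal initialization (an assumption, chosen only for illustration).
        self.w1 = rng.normal(0.0, in_dim ** -0.5, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, hidden_dim ** -0.5, (hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        """Map (..., in_dim) inputs to (..., out_dim) embeddings."""
        return gelu(x @ self.w1 + self.b1) @ self.w2 + self.b2
```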
Implicit

