https://zju3dv.github.io/efficientloftr/


  1. Aggregated self- and cross-attention: token size is reduced before each attention layer for efficiency.
  2. Fusion network: fuses the transformed coarse features with backbone features.
  3. Two-stage refinement: a subsequent refinement stage recovers sub-pixel correspondences.
  4. Mutual-nearest-neighbor (MNN) matching: selects coarse correspondences.
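The mutual-nearest-neighbor criterion in step 4 can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a precomputed similarity matrix between the coarse tokens of the two images and keeps only pairs that are each other's best match.

```python
import numpy as np

def mutual_nearest_neighbor(sim):
    """Keep pairs (i, j) that are each other's best match.
    sim: similarity matrix, rows = tokens of image A, cols = tokens of image B."""
    best_for_a = sim.argmax(axis=1)          # best B index for each A token
    best_for_b = sim.argmax(axis=0)          # best A index for each B token
    i = np.arange(sim.shape[0])
    mutual = best_for_b[best_for_a] == i     # A -> B -> back to the same A
    return np.stack([i[mutual], best_for_a[mutual]], axis=1)

sim = np.array([[0.9, 0.1, 0.2],
                [0.2, 0.8, 0.3],
                [0.4, 0.2, 0.1]])
print(mutual_nearest_neighbor(sim))  # [[0 0] [1 1]]: token 2 has no mutual match
```

In practice a confidence threshold on the matching scores is usually applied alongside the MNN check to discard low-quality pairs.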

Method

Local feature extraction

Efficient Local Feature Transformation

An aggregated attention mechanism with adaptive token selection is used for efficiency.

Why an Aggregated Attention Module?

  1. Vanilla attention: applying it directly to dense local features is impractical because of the large token count (quadratic cost in the number of tokens).
  2. Linear attention: reduces the cost to linear, but at the price of reduced representational power.
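The trade-off between the two can be made concrete with a small sketch. This is an illustration of the generic kernelized linear-attention trick (elu(x)+1 feature map), not EfficientLoFTR's module: reassociating (φ(Q)φ(K)ᵀ)V as φ(Q)(φ(K)ᵀV) avoids ever forming the N×N score matrix.

```python
import numpy as np

def vanilla_attention(Q, K, V):
    # O(N^2): materializes the full N x N score matrix before softmax.
    A = np.exp(Q @ K.T / np.sqrt(Q.shape[1]))
    return (A / A.sum(axis=1, keepdims=True)) @ V

def linear_attention(Q, K, V, eps=1e-6):
    # O(N): the feature map phi replaces the softmax kernel, so the
    # product can be reassociated; cheaper, but less expressive.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)                          # never forms N x N
    den = Qp @ Kp.sum(axis=0, keepdims=True).T + eps
    return num / den

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(vanilla_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

The two functions return different values (the kernel only approximates softmax attention), which is exactly the representational gap point 2 refers to.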


Depth-wise convolution is used to aggregate neighboring tokens before attention.

Coarse-level Matching Module