https://zju3dv.github.io/efficientloftr/


- aggregated self- and cross-attention: reduce the number of tokens before each attention layer for efficiency
- fusion network: fuses the transformed coarse features with backbone features
- two-stage refinement follows to obtain sub-pixel correspondences
- mutual nearest neighbor (MNN) matching
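
As a reminder of what mutual nearest neighbor matching does, here is a minimal NumPy sketch. It assumes a dense score matrix where entry (i, j) is the similarity between token i of image A and token j of image B. The function name and the toy scores are illustrative, not taken from the paper's code.

```python
import numpy as np

def mutual_nearest_neighbors(scores: np.ndarray) -> list[tuple[int, int]]:
    """Return (i, j) pairs where j is i's best match and i is j's best match."""
    best_j_for_i = scores.argmax(axis=1)  # best column for every row
    best_i_for_j = scores.argmax(axis=0)  # best row for every column
    rows = np.arange(scores.shape[0])
    # Keep row i only if its best column also points back to row i.
    mutual = best_i_for_j[best_j_for_i] == rows
    return [(int(i), int(best_j_for_i[i])) for i in rows[mutual]]

# Toy score matrix (hypothetical values): rows = tokens in A, columns = tokens in B.
scores = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.3, 0.7],
])
matches = mutual_nearest_neighbors(scores)
print(matches)  # [(0, 0), (2, 2)]
```

Row 1 prefers column 0, but column 0 prefers row 0, so row 1 is left unmatched.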
Method
Local feature extraction
- RepVGG [11] is adopted as the feature backbone
Efficient Local Feature Transformation
Uses an aggregated attention mechanism with adaptive token selection for efficiency.
Why an Aggregated Attention Module?
- Vanilla attention: impractical to apply directly to dense local features because of the large number of tokens.
- Linear attention: cheaper, but with reduced representational power.
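
To make the trade-off concrete, here is a rough NumPy sketch of attention on aggregated tokens. It is a simplification under stated assumptions, not the paper's implementation: queries are aggregated with mean pooling (a stand-in for a learned strided depth-wise convolution), keys and values with max pooling, learned projections and positional encoding are omitted, and the result is fused back with a plain residual add. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool2d(feat, s, mode):
    """Aggregate an (H, W, C) map over non-overlapping s x s windows."""
    H, W, C = feat.shape
    blocks = feat.reshape(H // s, s, W // s, s, C)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

def aggregated_attention(x, src, s=4):
    """Attend from x to src on s x s-aggregated tokens, then upsample back."""
    H, W, C = x.shape
    q = pool2d(x, s, "mean").reshape(-1, C)    # assumption: stand-in for depth-wise conv
    kv = pool2d(src, s, "max").reshape(-1, C)  # keys and values share one aggregation
    attn = softmax(q @ kv.T / np.sqrt(C))      # vanilla attention on the reduced tokens
    msg = (attn @ kv).reshape(H // s, W // s, C)
    msg = msg.repeat(s, axis=0).repeat(s, axis=1)  # nearest-neighbour upsampling
    return x + msg                             # assumption: simple residual fusion

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))
src = rng.standard_normal((16, 16, 8))
out = aggregated_attention(x, src, s=4)  # pass src=x for self-attention
print(out.shape)  # (16, 16, 8)
```

With s = 4, the 256 tokens per image shrink to 16, so the attention matrix has 16 x 16 entries instead of 256 x 256, a reduction by a factor of s^4 = 256.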


depth-wise convolution
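
The note above only names the operation. As a reminder of what a depth-wise convolution computes, here is a minimal NumPy sketch. It assumes the stride equals the kernel size and no padding, which is the configuration one would use to aggregate tokens; the function name and kernel values are illustrative.

```python
import numpy as np

def depthwise_conv_strided(feat, kernels, s):
    """Depth-wise convolution with an s x s kernel and stride s.

    feat:    (H, W, C) feature map, H and W divisible by s
    kernels: (s, s, C), one spatial kernel per channel (no mixing across channels)
    """
    H, W, C = feat.shape
    blocks = feat.reshape(H // s, s, W // s, s, C)
    # Weighted sum inside every s x s window, independently for each channel.
    return np.einsum("hawbc,abc->hwc", blocks, kernels)

feat = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
kernels = np.full((2, 2, 2), 0.25)  # uniform weights: reduces to mean pooling
out = depthwise_conv_strided(feat, kernels, 2)
print(out.shape)  # (2, 2, 2)
```

With learned kernels, each channel gets its own spatial weighting, while the cost stays far below that of a full convolution because channels are never mixed.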
…
Coarse-level Matching Module