Aggregated self- and cross-attention: reduces the number of tokens before each attention operation for efficiency
Fusion network: fuses the transformed coarse features with the backbone features
Two-stage refinement: applied afterwards to obtain sub-pixel correspondences
Mutual-Nearest Neighbor
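
As a reminder of what mutual-nearest-neighbor matching means, here is a minimal NumPy sketch. It is not the method's actual implementation: the function name `mutual_nearest_neighbors`, the dot-product similarity, and the assumption that descriptors are L2-normalized are all illustrative choices.

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] pick each other.

    Hypothetical helper: assumes rows are L2-normalized descriptors,
    so the dot product acts as a similarity score.
    """
    sim = desc_a @ desc_b.T            # similarity matrix, shape (Na, Nb)
    nn_ab = sim.argmax(axis=1)         # best match in b for every a
    nn_ba = sim.argmax(axis=0)         # best match in a for every b
    idx_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a     # keep a[i] only if its match points back
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)

# Toy example: the third descriptor in a loses to the second one.
desc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
desc_b = np.array([[0.0, 1.0], [1.0, 0.0]])
matches = mutual_nearest_neighbors(desc_a, desc_b)
print(matches.tolist())  # [[0, 1], [1, 0]]
```

Only pairs that agree in both directions survive, which filters out one-sided matches such as `desc_a[2]` above.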

Method

Local feature extraction

An aggregated attention mechanism with adaptive token selection is used for efficiency.

Why an Aggregated Attention Module?

VanillaAttention: applying it directly to dense local features is impractical because of the large number of tokens.
LinearAttention: cheaper, but with reduced representational power.
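
To make the cost argument concrete, the sketch below counts query-key pairs for full attention on a dense feature map, with and without reducing the tokens first. The map size (60 x 80) and the reduction stride (4) are assumed values chosen for illustration, not numbers taken from these notes, and `attention_pairs` is a hypothetical helper.

```python
def attention_pairs(height, width, stride=1):
    """Token count and number of query-key pairs for full attention.

    `stride` is an assumed spatial reduction factor applied before attention.
    """
    tokens = (height // stride) * (width // stride)
    return tokens, tokens * tokens

full_tokens, full_pairs = attention_pairs(60, 80)            # dense features
red_tokens, red_pairs = attention_pairs(60, 80, stride=4)    # aggregated tokens

print(full_tokens, full_pairs)   # 4800 23040000
print(red_tokens, red_pairs)     # 300 90000
print(full_pairs // red_pairs)   # 256
```

Because the cost of vanilla attention grows quadratically with the token count, reducing each spatial side by a factor of 4 cuts the number of pairs by 4^4 = 256 in this example.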

depth-wise convolution
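
A strided depth-wise convolution is one way to shrink the token grid before attention, since each channel is filtered independently and the stride lowers the spatial resolution. The following NumPy sketch only illustrates that idea. The kernel size, stride, lack of padding, and the name `depthwise_conv2d` are assumptions, not details confirmed by these notes.

```python
import numpy as np

def depthwise_conv2d(x, kernels, stride):
    """Strided depth-wise convolution without padding (illustrative only).

    x:       feature map of shape (C, H, W)
    kernels: one k x k filter per channel, shape (C, k, k)
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((c, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Each channel uses only its own filter: no mixing across channels.
            out[:, i, j] = (patch * kernels).sum(axis=(1, 2))
    return out

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
kernels = np.full((2, 2, 2), 0.25)   # averaging filters, assumed for the demo
reduced = depthwise_conv2d(x, kernels, stride=2)
print(x.shape[1] * x.shape[2], "->", reduced.shape[1] * reduced.shape[2])  # 16 -> 4
```

With a 2 x 2 kernel and stride 2, the 16 spatial tokens become 4, and attention would then run on the smaller set. In a trained model the filters would be learned rather than fixed averages.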

…