https://github.com/naver/mast3r

  1. Look at the challenging results at the link above; they reflect the paper's goal: seek to output highly accurate and dense matches that are robust to viewpoint and illumination changes
  2. far better than DUSt3R (Geometric 3D Vision Made Easy, CVPR 2024, from the same lab)
    1. DUSt3R is a disruptive foundation model for "prior-free 3D reconstruction":

      1. It unifies all 3D vision tasks (binocular stereo, multi-view reconstruction, pose estimation) into a single **point-map regression** task.

      image.png

      1. Biggest contribution: inference is completely free of camera intrinsics and extrinsics. Feed it two entirely unrelated images and it directly outputs their 3D point clouds in one shared coordinate frame.
    2. MASt3R stands on DUSt3R's shoulders: an evolved version built specifically for high-accuracy, high-efficiency 2D image matching.

      1. a 30% absolute improvement over the second-best published method, LoFTR+KBR

Method

DUSt3R

image.png

  1. a pointmap $X \in \mathbb{R}^{W \times H \times 3}$: one 3D point for every pixel
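To make the pointmap representation concrete, here is a sketch that builds one by back-projecting a depth map with known intrinsics `K`. This is only to illustrate what the $W \times H \times 3$ tensor contains; DUSt3R itself regresses the pointmap directly from images, without knowing `K` or any depth.

```python
import numpy as np

def depth_to_pointmap(depth, K):
    """Back-project a depth map into a pointmap X of shape (H, W, 3).

    Illustrative only: DUSt3R outputs X directly and never sees K.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel grid
    pix = np.stack([u, v, np.ones_like(depth)], axis=-1)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                  # camera rays per pixel
    return rays * depth[..., None]                   # scale rays by depth

# toy example: constant 2 m depth with a simple pinhole K (made-up numbers)
K = np.array([[100.0, 0, 32], [0, 100.0, 24], [0, 0, 1]])
X = depth_to_pointmap(np.full((48, 64), 2.0), K)
print(X.shape)  # (48, 64, 3)
```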

MASt3R

image.png

  1. each input pixel ⇒ a 3D point, a confidence value and a local feature
  2. DUSt3R was never explicitly trained for matching, i.e. it only has a confidence-aware regression loss (Eq. 4)
    1. MASt3R is trained with GT 3D point correspondences, which also constrain local-feature matching via the pixels those GT points project to (Eqs. 7, 8)
  3. Fast NN (Sec. 3.3, fast reciprocal matching)
    1. mutual nearest neighbors: O(W^2H^2)

      $$ \mathcal{M} = \{(i,j) \mid j = \mathrm{NN}_2(D_i^1) \ \text{and} \ i = \mathrm{NN}_1(D_j^2)\}, \tag{13} $$

      $$ \text{with } \mathrm{NN}_A(D_j^B) = \arg\min_i \left\| D_i^A - D_j^B \right\|. \tag{14} $$

      1. NN via k-d trees is typically very inefficient in high-dimensional spaces
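A minimal brute-force implementation of Eqs. 13 and 14, using toy random descriptors (descriptor size and data are made up for illustration). Every pairwise distance is computed, which is what makes the cost $O(W^2H^2)$ when each pixel carries a descriptor:

```python
import numpy as np

def mutual_nn(D1, D2):
    """Brute-force reciprocal matching (Eqs. 13-14).

    D1: (N1, d) descriptors of image 1; D2: (N2, d) descriptors of image 2.
    Costs N1 * N2 distance evaluations, i.e. O(W^2 H^2) for dense pixels.
    """
    dist = np.linalg.norm(D1[:, None, :] - D2[None, :, :], axis=-1)  # (N1, N2)
    nn2 = dist.argmin(axis=1)  # NN_2(D^1_i): best j in image 2 for each i
    nn1 = dist.argmin(axis=0)  # NN_1(D^2_j): best i in image 1 for each j
    # keep (i, j) only when both directed searches agree
    return [(i, int(j)) for i, j in enumerate(nn2) if nn1[j] == i]

# toy check: image 2 is a permuted, slightly noised copy of image 1
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 8))
perm = np.array([2, 0, 4, 1, 3])
matches = mutual_nn(D, D[perm] + 0.01 * rng.normal(size=(5, 8)))
print(sorted(matches))  # [(0, 1), (1, 3), (2, 0), (3, 4), (4, 2)]
```

Each recovered pair `(i, j)` satisfies `perm[j] == i`, i.e. the reciprocal check finds exactly the planted correspondences.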
    2. fast matching by subsampling ⇒ $O(kWH)$

      1. The trick: exploit that the image is a regular grid and that MNNs are wanted for all pixels ⇒ a heuristic: start from k subsampled seed pixels, bounce NN lookups back and forth between the two images, and collect the points whose round trip converges back to themselves (these are mutual NNs).

Discussion

  1. Comparison with RoMa
    1. RoMa is not evaluated head-to-head in the paper. Table 2: on the test set of the Map-free dataset, MASt3R beats RoMa, but what does that show? Are we judging them on the background?!
      1. Map-free Visual Relocalization: Metric Pose Relative to a Single Image, ECCV 2022. Outdoor scenes with relatively large foregrounds.
  2. MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors, cvpr25
    1. two-view 3D reconstruction priors, pioneered by DUSt3R [49] and its successor MASt3R [20]
