https://github.com/naver/mast3r

  1. Look at the challenging results at the link above; they reflect the paper's goal: seek to output highly accurate and dense matches that are robust to viewpoint and illumination changes
  2. far better than DUSt3R (Geometric 3D Vision Made Easy, CVPR 2024, from the same lab)
    1. DUSt3R is a disruptive foundation model for "prior-free 3D reconstruction":

      1. It unifies all 3D vision tasks (binocular stereo, multi-view reconstruction, pose estimation) into a single **point-map regression** task.

      image.png

      1. Biggest contribution: inference is completely free of camera intrinsics and extrinsics. Feed it two entirely unrelated images and it directly outputs their 3D point clouds in one shared coordinate frame.
    2. MASt3R stands on DUSt3R's shoulders: an evolved version built specifically for high-accuracy, high-efficiency 2D image matching.

      1. a 30% absolute improvement over the second-best published method, LoFTR+KBR

Method

DUSt3R

image.png

  1. a pointmap $X \in \mathbb{R}^{W \times H \times 3}$: one 3D point for every pixel
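To make the pointmap representation concrete, here is a sketch that builds one by back-projecting a depth map with known intrinsics `K`. This is only to illustrate what the $W \times H \times 3$ tensor contains; DUSt3R itself regresses the pointmap directly from images, without knowing `K` or any depth.

```python
import numpy as np

def depth_to_pointmap(depth, K):
    """Back-project a depth map into a pointmap X of shape (H, W, 3).

    Illustrative only: DUSt3R outputs X directly and never sees K.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel grid
    pix = np.stack([u, v, np.ones_like(depth)], axis=-1)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                  # camera rays per pixel
    return rays * depth[..., None]                   # scale rays by depth

# toy example: constant 2 m depth with a simple pinhole K (made-up numbers)
K = np.array([[100.0, 0, 32], [0, 100.0, 24], [0, 0, 1]])
X = depth_to_pointmap(np.full((48, 64), 2.0), K)
print(X.shape)  # (48, 64, 3)
```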

MASt3R

image.png

  1. each input pixel ⇒ a 3D point, a confidence value and a local feature
  2. DUSt3R was never explicitly trained for matching, i.e. it only has a confidence-aware regression loss (Eq. 4)
    1. MASt3R is trained with GT 3D point correspondences, which also constrain local-feature matching via the pixels those GT points project to (Eqs. 7, 8)
  3. Fast NN (Sec. 3.3, fast reciprocal matching)
    1. mutual nearest neighbors: O(W^2H^2)

      $$ \mathcal{M} = \{(i,j) \mid j = \mathrm{NN}_2(D_i^1) \ \text{and} \ i = \mathrm{NN}_1(D_j^2)\}, \tag{13} $$

      $$ \text{with } \mathrm{NN}_A(D_j^B) = \arg\min_i \left\| D_i^A - D_j^B \right\|. \tag{14} $$

      1. NN via k-d trees is typically very inefficient in high-dimensional spaces
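A minimal brute-force implementation of Eqs. 13 and 14, using toy random descriptors (descriptor size and data are made up for illustration). Every pairwise distance is computed, which is what makes the cost $O(W^2H^2)$ when each pixel carries a descriptor:

```python
import numpy as np

def mutual_nn(D1, D2):
    """Brute-force reciprocal matching (Eqs. 13-14).

    D1: (N1, d) descriptors of image 1; D2: (N2, d) descriptors of image 2.
    Costs N1 * N2 distance evaluations, i.e. O(W^2 H^2) for dense pixels.
    """
    dist = np.linalg.norm(D1[:, None, :] - D2[None, :, :], axis=-1)  # (N1, N2)
    nn2 = dist.argmin(axis=1)  # NN_2(D^1_i): best j in image 2 for each i
    nn1 = dist.argmin(axis=0)  # NN_1(D^2_j): best i in image 1 for each j
    # keep (i, j) only when both directed searches agree
    return [(i, int(j)) for i, j in enumerate(nn2) if nn1[j] == i]

# toy check: image 2 is a permuted, slightly noised copy of image 1
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 8))
perm = np.array([2, 0, 4, 1, 3])
matches = mutual_nn(D, D[perm] + 0.01 * rng.normal(size=(5, 8)))
print(sorted(matches))  # [(0, 1), (1, 3), (2, 0), (3, 4), (4, 2)]
```

Each recovered pair `(i, j)` satisfies `perm[j] == i`, i.e. the reciprocal check finds exactly the planted correspondences.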
    2. fast matching by subsampling ⇒ $O(kWH)$

      1. The trick: exploit that the image is a regular grid and that MNNs are wanted for all pixels ⇒ a heuristic: start from k subsampled seed pixels, bounce NN lookups back and forth between the two images, and collect the points whose round trip converges back to themselves (these are mutual NNs).

Discussion

  1. Comparison with RoMa
    1. RoMa is not evaluated head-to-head in the paper. Table 2: on the test set of the Map-free dataset, MASt3R beats RoMa, but what does that show? Are we judging them on the background?!
      1. Map-free Visual Relocalization: Metric Pose Relative to a Single Image, ECCV 2022. Outdoor scenes with relatively large foregrounds.
  2. MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors, cvpr25
    1. two-view 3D reconstruction priors, pioneered by DUSt3R [49] and its successor MASt3R [20]
