Notes on iComMa

Contributions of the paper (see fig 1):

  1. matching loss and comparing loss: minimize $L$, where $L = L_{Ma} + L_{Com}$;
    1. $L_{Ma}$: matching loss between image feature points, designed for large pose variation (when there is a significant mismatch between the rendered and query images), i.e., it provides a robust initialization.
      1. LoFTR(I, Q) ⇒ matched point pairs $\{m_i, q_i\}$, $L_{Ma}=\sum_i\|m_i-q_i\|^2$
      2. very important for init, see tab 5.
      3. LoFTR, cvpr21
  2. pixel-level comparing loss: $L_{Com}=\mathrm{MSE}(I, Q)$, for small pose variation
    1. very important for final accurate estimation, see tab 6.
    2. $L_{Com}$ could be improved: it should be made robust to photometric (chromatic) changes, e.g., by comparing image gradients instead of raw pixel values.
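The two losses above can be sketched as follows. This is a minimal illustrative numpy version, not the paper's code: in iComMa the matches come from LoFTR and $L_{Com}$ is backpropagated through a differentiable Gaussian Splatting renderer; all function and variable names here are my own.

```python
import numpy as np

def matching_loss(m, q):
    """L_Ma: sum of squared distances between matched keypoint pairs (m_i, q_i).

    m, q: (N, 2) arrays of pixel coordinates from the rendered and query images
    (in the paper these pairs are produced by LoFTR)."""
    return float(np.sum((m - q) ** 2))

def comparing_loss(I, Q):
    """L_Com: pixel-level MSE between the rendered image I and the query image Q.

    (A gradient-based variant, as suggested in the notes, would compare
    np.gradient(I) with np.gradient(Q) to be robust to chromatic changes.)"""
    return float(np.mean((I - Q) ** 2))

def total_loss(m, q, I, Q):
    """Combined objective L = L_Ma + L_Com that the pose optimization minimizes."""
    return matching_loss(m, q) + comparing_loss(I, Q)
```

In the actual method both terms are differentiable w.r.t. the camera pose, so the pose is refined by gradient descent on `total_loss`; $L_{Ma}$ dominates early (large pose error), $L_{Com}$ dominates near convergence.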

Experiments

  1. fig 5: comparison with iNeRF with rotation range, translate range.
  2. tab 1, 2: runs in 1/2 to 1/5 of iNeRF's time.
  3. tab 3, 4: better than matching-based methods: LightGlue [25], MatchFormer [37], and LoFTR [33]
    1. their input is two images;
    2. input of this paper: query image + Gaussian representation;
    3. metrics: fraction of angular errors within 1°, 5°, and 10°, plus the fraction exceeding 20° (counted as outliers).
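The angular-error metric above can be computed with the standard geodesic distance between rotation matrices, $\theta = \arccos\!\big((\mathrm{tr}(R_{est}^\top R_{gt})-1)/2\big)$. A small sketch (my own helper names, not from the paper):

```python
import numpy as np

def angular_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    cos = np.clip(cos, -1.0, 1.0)  # guard against numerical drift outside [-1, 1]
    return float(np.degrees(np.arccos(cos)))

def bucket_errors(errors, thresholds=(1.0, 5.0, 10.0), outlier=20.0):
    """Fraction of errors within each threshold, plus outlier rate (> 20 deg)."""
    errors = np.asarray(errors, dtype=float)
    stats = {t: float(np.mean(errors <= t)) for t in thresholds}
    stats["outlier"] = float(np.mean(errors > outlier))
    return stats
```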

References

  1. iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching, 2023

  2. [2] Nerfels: renderable neural codes for improved camera pose estimation. cvpr22

    1. Complex and uninteresting.
    2. As opposed to iNeRF which overfit a model to the entire scene, we adopt a feature-driven approach for representing scene-agnostic local 3D patches with renderable codes.
    3. steps:
      1. Extract and match keypoints between two images, IL and IR.
      2. ⇒ Train Nerfels codes with poses on a large number of scenes, together with the corresponding decoder (renderer),
      3. then use the Nerfels jointly for matching.
  3. [20] NeRF-Pose: A first-reconstruct-then-regress approach for weakly-supervised 6D object pose estimation, iccv23

    1. Complex and uninteresting.
    2. Pose estimation of 3D objects in monocular images.
    3. Previous work required a 3D model of the object; this paper uses a NeRF representation of the object instead.
  4. [41] iNeRF, IROS21

  5. LoFTR, cvpr21

  6. https://towardsdatascience.com/what-are-intrinsic-and-extrinsic-camera-parameters-in-computer-vision-7071b72fb8ec