Notes on iComMa

Contributions of the paper (see fig 1):

  1. matching loss and comparing loss: minimize $L$, where $L = L_{Ma} + L_{Com}$;
    1. $L_{Ma}$: matching loss between image feature points, designed for large pose variation (when there is a significant mismatch between the rendered and query images), i.e., it provides a robust initialization.
      1. LoFTR(I, Q) ⇒ matched point pairs $\{m_i, q_i\}$, $L_{Ma}=\sum_i\|m_i-q_i\|^2$
      2. very important for init, see tab 5.
      3. LoFTR, cvpr21
  2. pixel-level comparing loss: $L_{Com}=\mathrm{MSE}(I, Q)$, for small pose variation
    1. very important for final accurate estimation, see tab 6.
    2. $L_{Com}$ could be improved: it should be made robust to photometric (chromatic) changes, e.g., by comparing image gradients instead of raw pixel values.
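The two losses above can be sketched as follows. This is a minimal illustrative numpy version, not the paper's code: in iComMa the matches come from LoFTR and $L_{Com}$ is backpropagated through a differentiable Gaussian Splatting renderer; all function and variable names here are my own.

```python
import numpy as np

def matching_loss(m, q):
    """L_Ma: sum of squared distances between matched keypoint pairs (m_i, q_i).

    m, q: (N, 2) arrays of pixel coordinates from the rendered and query images
    (in the paper these pairs are produced by LoFTR)."""
    return float(np.sum((m - q) ** 2))

def comparing_loss(I, Q):
    """L_Com: pixel-level MSE between the rendered image I and the query image Q.

    (A gradient-based variant, as suggested in the notes, would compare
    np.gradient(I) with np.gradient(Q) to be robust to chromatic changes.)"""
    return float(np.mean((I - Q) ** 2))

def total_loss(m, q, I, Q):
    """Combined objective L = L_Ma + L_Com that the pose optimization minimizes."""
    return matching_loss(m, q) + comparing_loss(I, Q)
```

In the actual method both terms are differentiable w.r.t. the camera pose, so the pose is refined by gradient descent on `total_loss`; $L_{Ma}$ dominates early (large pose error), $L_{Com}$ dominates near convergence.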

Experiments

  1. fig 5: comparison with iNeRF with rotation range, translate range.
  2. tab 1, 2: runs in 1/2 to 1/5 of iNeRF's time.
  3. tab 3, 4: better than matching-based methods: LightGlue [25], MatchFormer [37], and LoFTR [33]
    1. their input is two images;
    2. input of this paper: query image + Gaussian representation;
    3. metrics: fraction of angular errors within 1°, 5°, and 10°, plus the fraction exceeding 20° (counted as outliers).
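The angular-error metric above can be computed with the standard geodesic distance between rotation matrices, $\theta = \arccos\!\big((\mathrm{tr}(R_{est}^\top R_{gt})-1)/2\big)$. A small sketch (my own helper names, not from the paper):

```python
import numpy as np

def angular_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    cos = np.clip(cos, -1.0, 1.0)  # guard against numerical drift outside [-1, 1]
    return float(np.degrees(np.arccos(cos)))

def bucket_errors(errors, thresholds=(1.0, 5.0, 10.0), outlier=20.0):
    """Fraction of errors within each threshold, plus outlier rate (> 20 deg)."""
    errors = np.asarray(errors, dtype=float)
    stats = {t: float(np.mean(errors <= t)) for t in thresholds}
    stats["outlier"] = float(np.mean(errors > outlier))
    return stats
```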

References

  1. iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching, 2023

  2. [2] Nerfels: renderable neural codes for improved camera pose estimation. cvpr22

    1. Complex and uninteresting.
    2. As opposed to iNeRF which overfit a model to the entire scene, we adopt a feature-driven approach for representing scene-agnostic local 3D patches with renderable codes.
    3. steps:
      1. Extract and match keypoints between two images, IL and IR.
      2. ⇒ Train Nerfels codes with poses on a large number of scenes, together with the corresponding decoder (renderer),
      3. then use the Nerfels jointly for matching.
  3. [20] NeRF-Pose: A first-reconstruct-then-regress approach for weakly-supervised 6D object pose estimation, iccv23

    1. Complex and uninteresting.
    2. Pose estimation of 3D objects in monocular images.
    3. Previous work required a 3D model of the object; this paper uses a NeRF representation of the object instead.
  4. [41] iNeRF, IROS21

  5. LoFTR, cvpr21

  6. https://towardsdatascience.com/what-are-intrinsic-and-extrinsic-camera-parameters-in-computer-vision-7071b72fb8ec