code: https://github.com/aritra0593/Reinforced-Feature-Points
1 Intro
1.1 Why
- Feature detector networks are usually trained by optimizing low-level matching scores, often over pre-defined sets of image patches that should or should not match.
- Unfortunately, higher accuracy on these low-level matching scores does not necessarily translate to better performance in high-level vision tasks.
- ⇒ a new training methodology that trains the feature detector on a “higher-level” task.
- here, the task is relative pose estimation between a pair of images.
- ⇒ better performance on high-level tasks such as pose estimation.
- in practice, this is a fine-tuning of a pretrained SuperPoint [14], which itself was supervised by low-level matching scores.
- it seems the network cannot be trained from random initialization.
2 Method
2.1 Relative pose estimation
- find the essential matrix, E, which maximises the inlier count among all correspondences.
- (q_l, q_r) is an inlier if the distance from q_r to the epipolar line defined by Eq_l is below a threshold
- epipolar constraint: q_r^T E q_l = 0
- background reading: Ch. 9, Essential & fundamental matrices
- a robust estimator like RANSAC [17] with a 5-point solver [33] ⇒ Essential matrix
- Essential matrix decomposition
- decomposing E = R[T]_x yields the transformation T^ of the paper, i.e. rotation R and translation T (translation only up to scale).
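The pipeline above uses RANSAC [17] with the 5-point solver [33], which are library routines. As a self-contained illustration of how correspondences determine E, the sketch below swaps in the simpler linear (8-point-style) solver on noise-free synthetic correspondences in normalized camera coordinates; the scene, pose, and solver choice are stand-ins, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

def skew(t):
    # cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# synthetic two-view setup: camera 1 at the origin, camera 2 at (R_gt, t_gt)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(40, 3))
a = 0.1  # small rotation about the y-axis
R_gt = np.array([[np.cos(a), 0.0, np.sin(a)],
                 [0.0, 1.0, 0.0],
                 [-np.sin(a), 0.0, np.cos(a)]])
t_gt = np.array([1.0, 0.2, 0.0])

h1 = X / X[:, 2:3]                # homogeneous normalized coords in camera 1
X2 = X @ R_gt.T + t_gt
h2 = X2 / X2[:, 2:3]              # homogeneous normalized coords in camera 2

def linear_essential(h1, h2):
    # each correspondence gives one linear equation h2^T E h1 = 0 in vec(E)
    A = np.column_stack([h2[:, i:i + 1] * h1[:, j:j + 1]
                         for i in range(3) for j in range(3)])
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)   # null vector of A
    U, _, Vt = np.linalg.svd(E)
    # enforce the essential-matrix structure: two equal singular values, one zero
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt

E = linear_essential(h1, h2)
# epipolar residuals |q_r^T E q_l| should be ~0 for noise-free inliers
residuals = np.abs(np.einsum('ni,ij,nj->n', h2, E, h1))
```

Real data needs the robust RANSAC loop because a single outlier corrupts the linear solve; that is why the inlier count, not the algebraic residual, drives the estimator in the paper's pipeline.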

- from 1) and 2) ⇒ a set of matches M = {m_ij} between I and I′, defined by the independently sampled key points X and X′.
- a match m_ij = (x_i, x′_j) pairs two key points x_i ∈ X and x′_j ∈ X′.
- We treat the vision task as a (potentially non-differentiable) black box ⇒ T^.
- Supervised by GT camera transformation T*.
- The black box provides an error signal l(M, X, X’)=fun(T^, T*), used to reinforce the key point and matching probabilities.
- but the gradient of l(M, X, X’) is not needed.
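The error signal compares the estimated pose T^ against the ground truth T*. One plausible form of fun(T^, T*) — an assumption here, not necessarily the paper's exact definition — is the rotation angle between R^ and R* plus the angle between the translation directions (translation scale is unobservable from E):

```python
import numpy as np

def pose_error(R_hat, t_hat, R_star, t_star):
    # rotation angle of R_hat^T R_star, via the trace formula cos = (tr - 1)/2
    cos_r = (np.trace(R_hat.T @ R_star) - 1.0) / 2.0
    rot_err = np.degrees(np.arccos(np.clip(cos_r, -1.0, 1.0)))
    # angle between unit translation directions (scale is unrecoverable)
    cos_t = np.dot(t_hat / np.linalg.norm(t_hat),
                   t_star / np.linalg.norm(t_star))
    trans_err = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    return rot_err + trans_err

# identical poses give zero error
I, t = np.eye(3), np.array([1.0, 0.0, 0.0])
print(pose_error(I, t, I, t))  # → 0.0
```

Only this scalar value reaches the training loop; its gradient with respect to T^ is never computed, which is what lets RANSAC and the minimal solver stay inside the black box.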
2.2 Reinforcement learning
2.2.1 Why
- we cannot directly propagate gradients of our estimated transformation Tˆ back
to update the network weights, as in standard supervised learning.
- Components of our vision pipeline, like the robust estimator (e.g. RANSAC [17]) or the minimal solver (e.g. the 5-point solver [33]) might also be non-differentiable.
- To optimize the neural network parameters for our task, we apply principles from reinforcement learning [48].
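The core principle is the REINFORCE / score-function estimator: sample from the network's output distribution, evaluate the black-box loss, and descend along loss × ∇log-probability. A toy numpy sketch under strong assumptions — a 5-way categorical choice stands in for sampling key points and matches, and the loss table stands in for the whole pose pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(5)   # stand-in for the network's key point scores
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def black_box_loss(idx):
    # stand-in for RANSAC + 5-point + pose error: option 2 is the "good" one
    return 0.0 if idx == 2 else 1.0

for _ in range(300):
    p = softmax(logits)
    idx = rng.choice(5, p=p)              # sample, as the method samples key points
    loss = black_box_loss(idx)            # evaluated, never differentiated
    # gradient of loss * log p[idx] w.r.t. logits: loss * (one_hot - p)
    grad = loss * (np.eye(5)[idx] - p)
    logits -= lr * grad                   # descend on the surrogate objective
```

High-loss samples have their log-probability pushed down, so probability mass migrates toward choices that make the downstream pose estimate succeed, without ever backpropagating through the estimator itself.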
2.2.2 How