1. Code is available at https://fraunhoferhhi.github.io/RIPE/
    1. RL is well suited to this task
    2. Innovative Weakly-Supervised Training Framework
      1. training set: 100% Positive/Negative pairs without pixel-wise correspondences
      2. no depth, no pose: some methods depend on known pose or depth information
      3. no artificial augmentation: some methods create image pairs with known transformations, photometric and/or homographic

image.png

Keypoint detector & description

  1. Divide the heatmap H evenly into patches/cells ci; each ci holds m×m logit values
    1. logit: the raw, unnormalized score
  2. each ci ⇒ just one potential keypoint position si with logit li and final probability pi
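The cell-wise selection above can be sketched in numpy. This is a minimal illustration under my own assumptions (softmax over each cell's logits, then sampling one position per cell); the function name and details are not the paper's actual implementation:

```python
import numpy as np

def cells_to_keypoints(H, m, rng=None):
    """Split a heatmap H (h x w of raw logits) into non-overlapping
    m x m cells; softmax the m*m logits in each cell and sample one
    candidate keypoint position s_i per cell, with probability p_i."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = H.shape
    keypoints, probs = [], []
    for y0 in range(0, h - m + 1, m):
        for x0 in range(0, w - m + 1, m):
            cell = H[y0:y0 + m, x0:x0 + m].ravel()  # the m*m logits l_i
            p = np.exp(cell - cell.max())
            p /= p.sum()                            # softmax -> probabilities
            idx = rng.choice(m * m, p=p)            # one position per cell
            keypoints.append((y0 + idx // m, x0 + idx % m))
            probs.append(p[idx])
    return keypoints, np.array(probs)
```

On a 4×4 heatmap with m = 2 this yields exactly four candidates, one per cell.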

image.png

Keypoint description

D_hyper has a higher dimensionality than the VGG-19 features the authors use, so a 1×1 convolution is applied to obtain D.
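A 1×1 convolution is just a per-pixel linear map over channels, so the projection from D_hyper down to descriptor dimension D reduces to one matrix multiply. A minimal numpy sketch, assuming a hypothetical learned weight W rather than the paper's actual parameters:

```python
import numpy as np

def project_1x1(F_hyper, W):
    """Apply a 1x1 convolution as a channel-wise linear projection.
    F_hyper: hypercolumn features of shape (C_hyper, h, w).
    W:       weight of shape (D, C_hyper), here a stand-in for a
             learned parameter. Returns descriptors of shape (D, h, w)."""
    C, h, w = F_hyper.shape
    return (W @ F_hyper.reshape(C, h * w)).reshape(W.shape[0], h, w)
```

Each output pixel depends only on the channel vector at the same spatial location, which is exactly why a 1×1 kernel suffices for dimensionality reduction.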

Reinforcement of matchable keypoints

Common reinforcement learning

  1. policy is defined as a probability distribution over actions A, conditioned on the current state S and parameterized by θ

    $$ \pi_\theta(\mathcal{S}) = \mathbb{P}[\mathcal{A} \mid \mathcal{S}, \theta]. $$

    1. This constitutes a probability distribution, from which an action A is sampled.
    2. Based on the sampled action, the agent receives a reward signal that indicates a good or bad action.
  2. The learning objective is then to maximize the expected cumulative reward R(τ) over a trajectory τ (a sequence of state, action, reward tuples):

    $$ \max_{\theta} J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}} \left[ R(\tau) \right] $$

    Agent? trajectory? reward R? ⇒ defined later

  3. REINFORCE [45] provides an approximation of the gradient

    $$ \nabla_\theta J(\theta) \approx \hat{g} = \sum_{t=0}^T \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, R(\tau). $$
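The estimator ĝ can be checked on a toy softmax policy. This bandit-style setup (one-step trajectories, so R(τ) is just the per-episode reward) is my simplification, not the paper's setting; it only illustrates the ∇ log π · R form:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_grad(theta, actions, rewards):
    """REINFORCE estimate g_hat = sum_t grad_theta log pi(a_t) * R_t
    for a softmax policy over logits theta. For softmax,
    grad_theta log pi(a) = onehot(a) - pi."""
    pi = softmax(theta)
    g = np.zeros_like(theta)
    for a, R in zip(actions, rewards):
        onehot = np.zeros_like(theta)
        onehot[a] = 1.0
        g += (onehot - pi) * R
    return g
```

With uniform logits, a rewarded action's logit is pushed up and the others down, which matches the intuition behind the update.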

Transfer to the CV problem

policy

  1. An encoder-decoder network acts as a trainable policy: the input image I represents the state, and keypoint localization corresponds to an action

    $$ \pi_\theta(\mathbf{s}) = d_\theta(e_\theta(\mathbf{I})) = \mathbf{p}= \big[ \mathbb{P}_1[a_1 \mid I, \theta],\, \mathbb{P}_2[a_2 \mid I, \theta],\, \dots,\, \mathbb{P}_c[a_c \mid I, \theta] \big], $$

    where p is a list of distributions, one for each cell c of the heatmap H.

image.png

  1. The network/agent generates probability distributions over potential keypoint locations.
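Putting the pieces together: the policy is a set of independent per-cell distributions, one position is sampled per cell, and the REINFORCE term ∇ log π(a) is weighted by the reward. A numpy sketch under my own assumptions (independent cells, a single scalar reward), not the authors' training code:

```python
import numpy as np

def policy_update_direction(cell_logits, reward, rng=None):
    """Per-cell keypoint policy: for each cell c, softmax its logits
    into P_c[a_c | I, theta], sample one position a_c, and accumulate
    grad_theta log pi(a_c) * R as in REINFORCE.
    cell_logits: (num_cells, m*m) raw scores from the decoder."""
    rng = np.random.default_rng(0) if rng is None else rng
    grads = np.zeros_like(cell_logits)
    for c, logits in enumerate(cell_logits):
        e = np.exp(logits - logits.max())
        pi = e / e.sum()                       # P_c[a_c | I, theta]
        a = rng.choice(len(pi), p=pi)          # sampled keypoint position
        onehot = np.zeros_like(pi)
        onehot[a] = 1.0
        grads[c] = (onehot - pi) * reward      # grad log pi(a_c) * R
    return grads
```

Because (onehot − π) sums to zero, each cell's gradient row sums to zero: the sampled position gains probability mass at the expense of the others when the reward is positive.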