utilizes the Vision Transformer (ViT) to improve localization performance
realistic dataset creation framework that only collects clean runway images
1. Video is captured from a UAS at a height of 30 feet.
2. Just so-so.
The training and testing data for this method are collected at a local airport using unmanned aircraft systems (UAS).
1. 81,185 images for efficient training. How many runways? How many conditions? Not stated.
  1. The 3840×2160 images (0.1 inch/pixel) were resized to a size that can be split into an 8 by 4 grid of 448×448 patches.
  2. FOD-A also uses 400×400 patches, but are its objects larger?
2. The testing data yields 447 testing patches.
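
The patch arithmetic in the notes above can be checked with a short sketch. It assumes the grid is 8 patches across and 4 patches down (the notes do not state the orientation), and `resized_size` and `patch_boxes` are hypothetical helper names, not anything from the paper.

```python
PATCH = 448          # patch side length, from the notes
COLS, ROWS = 8, 4    # assumed orientation: 8 across, 4 down


def resized_size(patch=PATCH, cols=COLS, rows=ROWS):
    """Return the (width, height) that splits exactly into the grid."""
    return cols * patch, rows * patch


def patch_boxes(patch=PATCH, cols=COLS, rows=ROWS):
    """Return (left, top, right, bottom) crop boxes in row-major order."""
    return [
        (c * patch, r * patch, (c + 1) * patch, (r + 1) * patch)
        for r in range(rows)
        for c in range(cols)
    ]


width, height = resized_size()
boxes = patch_boxes()
print(width, height, len(boxes))  # 3584 1792 32
```

Under this assumption each 3840×2160 frame is scaled to 3584×1792, so the aspect ratio shifts slightly from 16:9 to 2:1, and one frame yields 32 patches.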

FOD location

Reconstruction-based, i.e., it is just anomaly detection (AD).
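
A minimal sketch of what reconstruction-based localization usually means, assuming an autoencoder trained only on clean runway patches: regions it reconstructs poorly are flagged as candidate FOD. The function name, the squared-error scoring, and the threshold value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def localize_by_reconstruction(patch, reconstruction, threshold=0.1):
    """Return (error_map, mask, bbox) for one patch.

    patch, reconstruction: float arrays of shape (H, W, C) in [0, 1].
    bbox is (left, top, right, bottom), or None if nothing is flagged.
    """
    # Per-pixel squared error, averaged over the colour channels.
    error_map = np.mean((patch - reconstruction) ** 2, axis=-1)
    mask = error_map > threshold
    if not mask.any():
        return error_map, mask, None
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    bbox = (int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1)
    return error_map, mask, bbox


# Toy example: a flat grey "runway" with a synthetic bright object.
clean = np.full((448, 448, 3), 0.5)
observed = clean.copy()
observed[100:120, 200:230] = 1.0
# A perfect reconstruction of the clean background stands in for the model.
_, mask, bbox = localize_by_reconstruction(observed, clean)
print(bbox)  # (200, 100, 230, 120)
```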

insert ViT layer [6] into autoencoder network.
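
To make the idea concrete, here is a rough sketch of a self-attention step applied to an encoder feature map before decoding. The notes do not say where in the autoencoder the ViT layer sits or how it is configured, so the bottleneck placement, the single head, and the random weights are all assumptions. A full ViT layer [6] also includes layer normalization, multiple heads, and an MLP, which are omitted here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_block(feature_map, w_q, w_k, w_v):
    """Single-head self-attention with a residual connection.

    feature_map: (h, w, d) encoder output; each spatial cell is one token.
    """
    h, w, d = feature_map.shape
    tokens = feature_map.reshape(h * w, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(d))   # (h*w, h*w) attention weights
    out = tokens + attn @ v                # residual connection
    return out.reshape(h, w, d), attn


rng = np.random.default_rng(0)
d = 16
features = rng.normal(size=(14, 14, d))    # stand-in for encoder features
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
out, attn = self_attention_block(features, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (14, 14, 16) (196, 196)
```

The output keeps the input shape, so a block like this can be placed between an encoder and a decoder without changing either.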

FOD Classification

RELATED WORK

Prior work [13] collects all clean runway images of an airport and stores them in an image database. At detection time, it samples a new runway image, queries the database for the corresponding image using GPS coordinates, aligns the two images, and subtracts them to check for differences.

This approach may not be robust to subtle changes in the airport environment.
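
The database-differencing baseline described above can be sketched as follows. This is a simplified illustration, not the implementation from [13]: the alignment step is skipped (images are assumed to be pre-aligned), the GPS lookup is a plain nearest-neighbour search, and the function names and threshold are hypothetical.

```python
import numpy as np


def nearest_reference(database, gps):
    """Pick the clean image whose GPS tag is closest to `gps`.

    database: list of ((lat, lon), image) pairs of clean runway images.
    Uses squared Euclidean distance in degrees, adequate only for nearby points.
    """
    return min(
        database,
        key=lambda entry: (entry[0][0] - gps[0]) ** 2 + (entry[0][1] - gps[1]) ** 2,
    )[1]


def difference_mask(new_image, reference, threshold=0.2):
    """Flag pixels whose absolute difference exceeds the threshold.

    Assumes the two images are already aligned.
    """
    return np.abs(new_image - reference) > threshold


# Toy database with two clean reference images at nearby coordinates.
database = [
    ((40.0000, -83.0), np.zeros((8, 8))),
    ((40.0010, -83.0), np.ones((8, 8))),
]
new_image = np.zeros((8, 8))
new_image[2:4, 5:7] = 1.0                      # synthetic debris
reference = nearest_reference(database, (40.0001, -83.0))
mask = difference_mask(new_image, reference)
print(int(mask.sum()))  # 4
```

With a fixed threshold like this, a uniform brightness shift of 0.25 between the stored and new image would flag every pixel, which illustrates the robustness concern.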

Experiments

Table III

References

Foreign Object Debris Detection for Airport Pavement Images Based on Self-supervised Localization and Vision Transformer, CSCI 2022
[6] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021
[13] Feasibility Assessment of sUAS-Based Automated FOD Detection System, ICCR 2018