Week 6, Monday.

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021
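The title refers to how ViT tokenizes an image: it cuts the image into fixed 16x16 patches, flattens each patch, and linearly projects it to a token. The sketch below shows only that patch-to-token step in NumPy. It is an illustrative assumption, not the paper's code. The function names `patchify` and `embed_patches` are hypothetical, the projection weights are placeholders rather than learned parameters, and the learnable [class] token and position embeddings that ViT adds afterwards are omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Returns shape (N, patch_size * patch_size * C) with
    N = (H // patch_size) * (W // patch_size), patches ordered row-major.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c): group pixels by grid cell
    grid = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector ("word")
    return grid.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical linear patch embedding: (N, P*P*C) @ (P*P*C, D) + (D,)."""
    return patches @ weight + bias


if __name__ == "__main__":
    img = np.random.rand(224, 224, 3).astype(np.float32)
    tokens = patchify(img)              # 14 x 14 grid of 16x16x3 patches
    print(tokens.shape)                 # (196, 768)
    # Placeholder weights; in ViT this projection is learned.
    w = np.random.randn(768, 64).astype(np.float32) * 0.02
    b = np.zeros(64, dtype=np.float32)
    print(embed_patches(tokens, w, b).shape)  # (196, 64)
```

For a 224x224 RGB input this yields a sequence of 14 x 14 = 196 "words", each a 16 x 16 x 3 = 768-value vector, which is what the Transformer encoder then consumes.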

Do Vision Transformers See Like Convolutional Neural Networks?, NeurIPS 2021
Figs. 3 and 4 of DINOv3: https://arxiv.org/pdf/2508.10104
https://paperswithcode.com/sota/feature-upsampling-on-imagenet?p=featup-a-model-agnostic-framework-for
Watch the FeatUp video, ICLR 2024