Highlights
• We design an anchor-based ViT, which generates attention using anchors and tokens.
• To differentiably learn the pivotal regions, the anchors are represented by neurons.
• Inspired by the Markov process, the global attention can be computed via anchors.
• By rearranging the multiplication order, the attention requires only linear complexity.
• AnchorFormer improves classification accuracy by 9.0% and reduces FLOPs by 46.7%.
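The last two highlights rest on a standard trick: chaining token-to-anchor and anchor-to-token attention (a two-step Markov transition) and multiplying right-to-left so the cost is linear in the token count. A minimal sketch of that idea follows; the function name, shapes, and the use of plain softmax kernels are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def anchor_attention(Q, K, V, anchors):
    """Approximate global attention through m anchors (m << N).

    Full self-attention costs O(N^2 * d). Routing through anchors gives
    an (N, m) token->anchor map and an (m, N) anchor->token map; their
    product approximates the (N, N) attention matrix. Evaluating
    P_ta @ (P_at @ V) instead of (P_ta @ P_at) @ V avoids ever forming
    the N x N matrix, so the cost is O(N * m * d).
    """
    P_ta = softmax(Q @ anchors.T)   # (N, m): token -> anchor transition
    P_at = softmax(anchors @ K.T)   # (m, N): anchor -> token transition
    return P_ta @ (P_at @ V)        # (N, d), computed in linear time

# Illustrative shapes: 128 tokens, 8 anchors, dimension 16
rng = np.random.default_rng(0)
Q = rng.standard_normal((128, 16))
K = rng.standard_normal((128, 16))
V = rng.standard_normal((128, 16))
anchors = rng.standard_normal((8, 16))
out = anchor_attention(Q, K, V, anchors)  # shape (128, 16)
```

By associativity, the rearranged product equals the quadratic-cost order exactly; only the evaluation order, and hence the complexity, changes.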
DOI: 10.1016/j.patrec.2025.07.016