Abstract: Highlights
• To address the question “how to divide patches?”, five key techniques of the patch division mechanism are summarized: from single-size division to multi-size division, from fixed-number division to adaptive-number division, from non-overlapping division to overlapping division, from semantic segmentation division to semantic aggregation division, and from original-image division to feature-map division.
• To address the question “how to select tokens?”, three key techniques of the token selection mechanism are summarized: score-based token selection, merge-based token selection, and token selection based on convolution and pooling.
• To address the question “how to add position encoding?”, five key techniques of the position encoding mechanism are summarized: absolute position encoding, relative position encoding, conditional position encoding, locally-enhanced position encoding, and zero-padding position encoding.
• To address the question “how to calculate attention?”, 18 attention mechanisms are summarized in chronological order.
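To make the distinction between non-overlapping and overlapping patch division concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the method of any specific model surveyed here: the function names `patch_grid` and `divide_patches` are hypothetical, the image is assumed to be a 2-D list of pixel values, and no padding is applied. A stride equal to the patch size gives non-overlapping division; a smaller stride gives overlapping division.

```python
def patch_grid(height, width, patch, stride):
    """Return (rows, cols) of patches for a sliding window without padding.

    Hypothetical helper: along each axis the count is
    floor((size - patch) / stride) + 1.
    """
    if patch > height or patch > width:
        raise ValueError("patch must not exceed the image size")
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows, cols


def divide_patches(image, patch, stride):
    """Split a 2-D list `image` into square patches in row-major order.

    stride == patch -> non-overlapping division
    stride <  patch -> overlapping division
    """
    height, width = len(image), len(image[0])
    rows, cols = patch_grid(height, width, patch, stride)
    patches = []
    for r in range(rows):
        for c in range(cols):
            top, left = r * stride, c * stride
            # Slice `patch` rows, then `patch` columns from each row.
            patches.append(
                [row[left:left + patch] for row in image[top:top + patch]]
            )
    return patches


# A 4x4 toy image whose pixel values are 0..15 in row-major order.
toy = [[4 * i + j for j in range(4)] for i in range(4)]
non_overlapping = divide_patches(toy, patch=2, stride=2)
overlapping = divide_patches(toy, patch=2, stride=1)
```

On the toy image, the non-overlapping setting yields 4 patches and the overlapping setting yields 9. Under the same assumptions, a 224×224 input with 16×16 patches gives a 14×14 grid at stride 16 and a 27×27 grid at stride 8, which shows how overlapping division increases the token count.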