Abstract: Highlights
• Spatial constraint contributes to concentrated attention and performance gain in MSA.
• Concentrated attention helps ViTs optimize in data-scarce cases.
• Hierarchical constraint yields progressive attention across different layers.
• Layer-wise reasoning facilitates the understanding of the inner workings of ViTs.