Abstract: Highlights
• A multi-label classification method for cell clusters in thyroid FNAB-WSIs is proposed.
• Easy-category mask processing is adopted to balance the difficulty of the multiple labels.
• A CvT encoder improved with weight downsampling is used to extract spatial features of cell clusters.
• A multi-layer Transformer decoder operating on spatial features and labels is designed.