Abstract: Highlights
• A novel relationship-guided vision-language Transformer is proposed for FAR.
• An image-text cross-attention enhances the interaction between text and image tokens, and a token selection mechanism reduces interference from the image background (a minimal sketch follows this list).
• An image-text alignment loss is designed for further modality alignment.
• Experiments verify the superiority of our method, especially with limited labeled data.
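The second highlight combines cross-attention from text tokens to image tokens with a token-selection step. The sketch below illustrates one plausible form of that combination under a PyTorch setting; the class name `ImageTextCrossAttention`, the dimensions, the keep ratio, and the attention-score selection criterion are illustrative assumptions, not the paper's actual design.

```python
# A minimal sketch (not the authors' implementation) of image-text
# cross-attention followed by a simple top-k image-token selection.
import torch
import torch.nn as nn


class ImageTextCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4, keep_ratio: float = 0.5):
        super().__init__()
        # Text tokens act as queries; image tokens act as keys and values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.keep_ratio = keep_ratio  # assumed fraction of image tokens to keep

    def forward(self, text_tokens: torch.Tensor, image_tokens: torch.Tensor):
        # text_tokens: (B, T, D), image_tokens: (B, N, D)
        fused, attn_w = self.cross_attn(text_tokens, image_tokens, image_tokens)

        # Token selection (assumed criterion): keep the image tokens that
        # receive the most attention from the text side, dropping tokens
        # that are likely background.
        scores = attn_w.mean(dim=1)                        # (B, N) relevance per image token
        k = max(1, int(self.keep_ratio * image_tokens.size(1)))
        idx = scores.topk(k, dim=-1).indices               # (B, k) retained token indices
        idx = idx.unsqueeze(-1).expand(-1, -1, image_tokens.size(-1))
        selected = image_tokens.gather(1, idx)             # (B, k, D)
        return fused, selected


if __name__ == "__main__":
    # Random features standing in for ViT / text-encoder outputs.
    B, T, N, D = 2, 8, 49, 256
    module = ImageTextCrossAttention(dim=D)
    fused, selected = module(torch.randn(B, T, D), torch.randn(B, N, D))
    print(fused.shape, selected.shape)  # (2, 8, 256) and (2, 24, 256)
```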