Learning relationship-guided vision-language transformer for facial attribute recognition

Published: 01 Jan 2026, Last Modified: 19 Sept 2025. Pattern Recognit. 2026. License: CC BY-SA 4.0
Abstract: Highlights
• A novel relationship-guided vision-language Transformer is proposed for facial attribute recognition (FAR).
• An image-text cross-attention module strengthens the interaction between text and image tokens.
• A token selection mechanism can reduce interference from the image background.
• An image-text alignment loss is designed to further align the two modalities.
• Experiments demonstrate the superiority of the proposed method, especially when labeled data are limited.
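The highlights name three mechanisms but give no details, so the following is only a generic sketch under stated assumptions, not the authors' implementation. It assumes single-head cross-attention in which text tokens act as queries over image patch tokens. The token selection step is a hypothetical stand-in that keeps the top-k image tokens by total received attention. The alignment loss is a CLIP-style symmetric InfoNCE loss used in place of the paper's unspecified loss. All function names, dimensions, and the temperature value are illustrative assumptions.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def softmax(x, axis=-1):
    return np.exp(log_softmax(x, axis=axis))

def cross_attention(text_tokens, image_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: text tokens (queries) attend to image tokens.

    text_tokens: (T, d); image_tokens: (N, d); w_q, w_k, w_v: (d, d).
    Returns updated text features (T, d) and attention weights (T, N).
    """
    q = text_tokens @ w_q
    k = image_tokens @ w_k
    v = image_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # each row sums to 1
    return attn @ v, attn

def select_tokens(image_tokens, attn, keep):
    """Hypothetical token selection: keep the `keep` image tokens that
    receive the most attention from the text tokens (assumption: low-attention
    tokens are treated as background and dropped)."""
    scores = attn.sum(axis=0)                   # (N,) total attention per image token
    idx = np.sort(np.argsort(scores)[-keep:])   # top-k indices, original order kept
    return image_tokens[idx], idx

def alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Assumed symmetric InfoNCE loss over a batch of paired embeddings (B, d).

    Matching image/text pairs sit on the diagonal of the similarity matrix.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # logits[i, j] = sim(image i, text j)
    diag = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

# Usage with random data (shapes are illustrative assumptions).
rng = np.random.default_rng(0)
d = 16
text = rng.normal(size=(5, d))    # e.g. 5 attribute text tokens
image = rng.normal(size=(49, d))  # e.g. 7x7 grid of patch tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
fused, attn = cross_attention(text, image, w_q, w_k, w_v)
kept, idx = select_tokens(image, attn, keep=20)
loss = alignment_loss(rng.normal(size=(4, d)), rng.normal(size=(4, d)))
print(fused.shape, kept.shape)  # (5, 16) (20, 16)
```

Summing attention over text queries is one simple way to score image tokens; a learned relevance score or per-attribute selection would be equally plausible alternatives, and the abstract does not say which the paper uses.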