UnICLAM: Contrastive representation learning with adversarial masking for unified and interpretable Medical Vision Question Answering
Abstract: Highlights
• We propose UnICLAM, a unified Medical-VQA framework with joint alignment and learning.
• We introduce adversarial masking for data augmentation and improved cross-modal alignment.
• Experimental results show our model outperforms others in prediction and interpretability.
• We explore Medical-VQA’s role in heart failure diagnosis and its few-shot adaptation.