BiomedAP: A Vision-Informed Dual-Anchor Framework with Gated Cross-Modal Fusion for Robust Medical Vision-Language Adaptation

Published: 26 Apr 2026 · Last Modified: 26 Apr 2026 · Med-Reasoner 2026 Poster · CC BY 4.0
Keywords: Vision-Language Models, Prompt Learning, Parameter-Efficient Fine-Tuning, Few-shot Learning
TL;DR: BiomedAP makes medical VLM adaptation more robust to noisy and heterogeneous clinical prompts through intermediate cross-modal fusion and dual-anchor semantic regularization.
Abstract: Biomedical Vision-Language Models (VLMs) have shown remarkable promise in few-shot medical diagnosis but face a critical bottleneck: fragility to prompt variations. Existing adaptation frameworks typically optimize visual and textual prompts as independent streams, relying on ideal "Golden Prompts". In clinical reality, where descriptions are often noisy and heterogeneous, this modality isolation leads to unstable cross-modal alignment. To address this, we propose BiomedAP, a vision-informed dual-anchor framework with gated cross-modal fusion. BiomedAP enforces synergistic alignment through two mechanisms: (1) Gated Cross-Modal Fusion, which enables layer-wise interaction between modalities and acts as a dynamic noise regulator to suppress irrelevant textual cues; and (2) a Dual-Anchor Constraint that regularizes learnable prompts toward stable semantic centroids derived from both expert templates (High Anchors) and few-shot visual prototypes (Low Anchors). Extensive experiments across 11 benchmarks demonstrate that BiomedAP consistently surpasses baselines, achieving competitive few-shot accuracy and markedly enhanced robustness under prompt perturbations. Our code is available at: https://github.com/tongdiedie/BiomedAP.
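The two mechanisms in the abstract can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration: the gating form (a sigmoid over concatenated features), the function names, the shapes, and the squared-distance anchor loss are not taken from the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(text_prompt, visual_feat, w_gate):
    """Hypothetical gated cross-modal fusion step.

    A gate computed from the concatenated modalities decides how much
    visual signal is injected into the textual prompt at a given layer,
    so noisy textual cues can be down-weighted by visual evidence.
    """
    gate = sigmoid(np.concatenate([text_prompt, visual_feat]) @ w_gate)
    return text_prompt + gate * visual_feat

def dual_anchor_loss(prompt, high_anchor, low_anchor, lam=0.5):
    """Hypothetical dual-anchor regularizer.

    Pulls the learnable prompt toward two semantic centroids: a High
    Anchor (expert-template embedding) and a Low Anchor (few-shot
    visual prototype), with lam trading off the two terms.
    """
    return (np.linalg.norm(prompt - high_anchor) ** 2
            + lam * np.linalg.norm(prompt - low_anchor) ** 2)

# Toy usage with 4-dimensional embeddings.
text = np.ones(4)
visual = 2.0 * np.ones(4)
w_gate = np.zeros((8, 4))          # zero weights -> gate = 0.5 everywhere
fused = gated_fusion(text, visual, w_gate)
reg = dual_anchor_loss(np.zeros(4), np.ones(4), np.ones(4), lam=0.5)
```

With zero gate weights the sigmoid yields 0.5, so the fused prompt is `text + 0.5 * visual`; in training, the anchor loss would be added to the task loss to keep prompts near stable semantic centroids.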
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 29