Highlights
• Propose a novel cross-modal model (C2M-DoT) to better generate medical reports.
• Propose a multi-view contrastive learning strategy to exploit multi-view information.
• Propose a domain transfer network that maintains strong performance with single-view inputs.
• Propose a cross-modal consistency (CMC) loss to better learn visual semantics.
• Extensive experiments demonstrate the effectiveness of C2M-DoT over existing baselines.
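The multi-view contrastive strategy above can be illustrated with a minimal InfoNCE-style sketch: embeddings of paired views of the same study are pulled together while embeddings from different studies are pushed apart. This is a hypothetical illustration in NumPy, not the authors' exact C2M-DoT formulation; the function name, temperature value, and toy data are assumptions.

```python
# Hypothetical sketch of a multi-view contrastive objective (InfoNCE-style);
# not the authors' exact C2M-DoT loss.
import numpy as np

def multiview_contrastive_loss(z1, z2, temperature=0.1):
    """Pull embeddings of two views of the same study together and push
    apart embeddings from different studies.

    z1, z2: (N, D) view embeddings; row i of z1 and row i of z2 are
    assumed to come from the same study (the positive pair).
    """
    # Cosine similarities via L2 normalization.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
# Perfectly matched views yield a much smaller loss than unrelated views.
loss_same = multiview_contrastive_loss(z, z)
loss_rand = multiview_contrastive_loss(z, rng.normal(size=(8, 32)))
```

In a real multi-view report-generation setting, `z1` and `z2` would be encoder features of, e.g., frontal and lateral radiographs of the same patient; the loss encourages the encoder to produce view-invariant study-level semantics.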
External IDs: dblp:journals/inffus/WangXWLL26