Multimodal Integration Based on Weak Alignment for Rectal Tumor Grading

Published: 2025 · Last Modified: 22 Jan 2026 · ICIC (28) 2025 · CC BY-SA 4.0
Abstract: Accurate rectal cancer grading requires complementary information from CT images and pathological text. However, applying vision-language models to this task faces two challenges: limited disease-category diversity and frequent image-text mismatches make it difficult to construct large-scale, one-to-one pairs for contrastive learning, and the complex anatomy in CT scans, with many irrelevant tissues, interferes with tumor identification, a problem that general-purpose models handle poorly and that calls for a specialized, rectum-focused approach. This paper proposes a rectal tumor grading method based on weak alignment of image and pathological text features. We leverage pathological text to guide image features toward tumor category prototypes, avoiding strict one-to-one pairing. Large language models refine the medical text, while segmentation models generate foreground channels that isolate the rectal region and improve feature extraction. A cross-modal attention mechanism further enhances alignment, yielding more accurate tumor grading; experiments show a 2.3% accuracy improvement over existing methods.
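To make the weak-alignment idea concrete, the following is a minimal PyTorch sketch based only on the abstract, not the authors' released code: the feature dimensions (img_dim, txt_dim, embed_dim), the number of tumor grades, the prototype-level contrastive loss, and the module/function names are all illustrative assumptions. It shows image features attending to text-derived grade prototypes via cross-modal attention and being pulled toward the prototype of their grade, rather than being paired one-to-one with individual reports.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeakAlignmentGrader(nn.Module):
    """Illustrative sketch (not the paper's implementation) of weak image-text
    alignment: pooled CT features attend to per-grade text prototypes and are
    classified into tumor grades."""

    def __init__(self, img_dim=512, txt_dim=512, embed_dim=256, num_grades=3):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Cross-modal attention: image queries, text-prototype keys/values.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads=4,
                                                batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_grades)

    def forward(self, img_feat, grade_prototypes):
        # img_feat: (B, img_dim) pooled CT features (foreground-masked upstream)
        # grade_prototypes: (G, txt_dim) one LLM-refined text embedding per grade
        q = self.img_proj(img_feat).unsqueeze(1)            # (B, 1, E)
        kv = self.txt_proj(grade_prototypes).unsqueeze(0)   # (1, G, E)
        kv = kv.expand(q.size(0), -1, -1)                   # (B, G, E)
        fused, _ = self.cross_attn(q, kv, kv)               # (B, 1, E)
        fused = fused.squeeze(1) + q.squeeze(1)             # residual fusion
        return fused, self.classifier(fused)


def weak_alignment_loss(fused, grade_prototypes, labels, txt_proj, tau=0.07):
    """Prototype-level contrast: pull each image embedding toward the text
    prototype of its grade, avoiding strict one-to-one image-report pairing."""
    z = F.normalize(fused, dim=-1)                          # (B, E)
    p = F.normalize(txt_proj(grade_prototypes), dim=-1)     # (G, E)
    logits = z @ p.t() / tau                                # (B, G)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    model = WeakAlignmentGrader()
    imgs = torch.randn(8, 512)          # dummy pooled CT features
    protos = torch.randn(3, 512)        # dummy text prototypes, one per grade
    labels = torch.randint(0, 3, (8,))  # dummy tumor grades
    fused, logits = model(imgs, protos)
    loss = F.cross_entropy(logits, labels) + weak_alignment_loss(
        fused, protos, labels, model.txt_proj)
    print(loss.item())
```

In this sketch the segmentation-derived foreground channel is assumed to have been applied before pooling; the classification and alignment losses are simply summed, whereas any weighting between them would be a design choice not specified in the abstract.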