Geometry-Aware Depth-Guided Explainable Multimodal Polyp Size Estimation: A Fusion Model Beyond RGB

Krispian Lawrence; Usha Goparaju; Luís C. Lamb

03 Dec 2025 (modified: 15 Dec 2025), MIDL 2026 Conference Submission, CC BY 4.0

Keywords: polyp size estimation, depth fusion, geometry-awareness, multi-modal learning, endoscopy

TL;DR: We introduce a polyp size estimation framework that fuses RGB texture, segmentation-derived geometry, and monocular depth via a PISE cross-attention block, resolving size–distance ambiguity and significantly improving accuracy.
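The size–distance ambiguity mentioned above can be illustrated with the standard pinhole-camera relation, in which physical extent equals pixel extent times depth divided by focal length. The sketch below is not taken from the submission: the function name, the focal length, and the depth values are hypothetical and chosen only to show why a 2D footprint alone cannot determine size.

```python
def physical_size_mm(pixel_extent: float, depth_mm: float, focal_px: float) -> float:
    """Pinhole-camera estimate of physical extent (hypothetical helper).

    size = pixel_extent * depth / focal_length, with the focal length
    expressed in pixels and the depth in millimetres.
    """
    return pixel_extent * depth_mm / focal_px


# Two hypothetical polyps with an identical 100-pixel footprint,
# seen at different distances by a camera with an assumed 500 px focal length.
near = physical_size_mm(pixel_extent=100.0, depth_mm=20.0, focal_px=500.0)
far = physical_size_mm(pixel_extent=100.0, depth_mm=40.0, focal_px=500.0)

print(near, far)  # the same footprint maps to 4.0 mm and 8.0 mm
```

With these illustrative numbers, the same image footprint falls on either side of a 5 mm cutoff depending on depth alone, which is the motivation for adding a depth cue.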

Abstract: Accurately estimating the physical size of colorectal polyps from monocular endoscopy is difficult due to scale ambiguity, viewpoint distortions, and strong inter-patient variability. We introduce MPSE, a geometry-aware, depth-guided multimodal framework that jointly leverages RGB appearance, monocular depth cues, and interpretable geometry descriptors to produce reliable and clinically calibrated size estimates. Central to MPSE is a geometry-as-query fusion block that selectively attends to depth and RGB features, and a Scale Consistency Block (SCB) that models agreement between 2D footprint–derived and 3D depth–derived cues, reducing size bias under severe distribution imbalance. The model is trained with a primary regression objective supported by an auxiliary threshold-based classification loss that stabilizes predictions near clinically important cutoffs. On our clinical dataset, MPSE achieves a mean absolute error of 0.93 mm and a polyp-level F1 score of 0.87 at the clinically critical 5 mm threshold, demonstrating accurate and clinically reliable size estimation in endoscopy.
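The abstract does not give the exact form of the training objective, so the following is only a minimal sketch of how a regression loss could be paired with a threshold-based auxiliary classification term. The L1 regression term, the binary cross-entropy on a logit derived from the predicted size, and the names `aux_weight` and `temperature_mm` are all assumptions made for illustration; only the 5 mm threshold comes from the text.

```python
import math


def combined_loss(pred_mm: float, true_mm: float,
                  threshold_mm: float = 5.0,
                  aux_weight: float = 0.5,
                  temperature_mm: float = 1.0) -> float:
    """Hypothetical regression + threshold-classification loss for one polyp.

    Assumed form (not confirmed by the source):
        L = |pred - true| + aux_weight * BCE(logit, label)
    where logit = (pred - threshold) / temperature and
    label = 1 if the true size is at or above the threshold, else 0.
    """
    # Primary regression term: absolute error in millimetres.
    regression = abs(pred_mm - true_mm)

    # Auxiliary term: treat the signed distance to the cutoff as a logit.
    logit = (pred_mm - threshold_mm) / temperature_mm
    label = 1.0 if true_mm >= threshold_mm else 0.0

    # Numerically stable binary cross-entropy with logits.
    bce = max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

    return regression + aux_weight * bce


# Same 0.5 mm absolute error, but one prediction sits on the cutoff.
on_cutoff = combined_loss(pred_mm=5.0, true_mm=5.5)
clear_of_cutoff = combined_loss(pred_mm=6.0, true_mm=5.5)
print(on_cutoff > clear_of_cutoff)
```

Under these assumptions, two predictions with equal absolute error are penalized differently: the one that lands on the clinical cutoff incurs the larger loss, which is one way an auxiliary term could stabilize predictions near the threshold.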

Primary Subject Area: Application: Endoscopy

Secondary Subject Area: Safe and Trustworthy Learning-assisted Solutions for Medical Imaging

Submission Number: 298
