Geometry-Aware Depth-Guided Explainable Multimodal Polyp Size Estimation: A Fusion Model Beyond RGB

Published: 14 Feb 2026, Last Modified: 16 Apr 2026, Venue: MIDL 2026 Poster, License: CC BY 4.0
Keywords: polyp size estimation, depth fusion, geometry-awareness, multi-modal learning, endoscopy
TL;DR: We introduce a polyp size estimation framework that fuses RGB texture, segmentation-derived geometry, and monocular depth via a PISE cross-attention block, resolving size–distance ambiguity and significantly improving accuracy.
Abstract: Accurately estimating the physical size of colorectal polyps from monocular endoscopy is difficult due to scale ambiguity, viewpoint distortions, and strong inter-patient variability. We introduce MPSE, a geometry-aware, depth-guided multimodal framework that jointly leverages RGB appearance, monocular depth cues, and interpretable geometry descriptors to produce reliable and clinically calibrated size estimates. Central to MPSE is a geometry-as-query fusion block that selectively attends to depth and RGB features, and a Scale Consistency Block (SCB) that models agreement between 2D footprint–derived and 3D depth–derived cues, reducing size bias under severe distribution imbalance. The model is trained with a primary regression objective supported by an auxiliary threshold-based classification loss that stabilizes predictions near clinically important cutoffs. On our clinical dataset, MPSE achieves a mean absolute error of 0.93 mm and a polyp-level F1 score of 0.87 at the clinically critical 5 mm threshold, demonstrating accurate and clinically reliable size estimation in endoscopy.
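The two mechanisms named in the abstract, geometry-as-query fusion and a regression objective with an auxiliary threshold-based classification term, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the architecture or loss details, so the following are all assumptions made for illustration: single-head scaled dot-product attention, tokens reused as both keys and values, an L1 regression term, a sigmoid over the predicted size as the classification probability, and every function name, parameter name, and default value (`aux_weight`, `sharpness`). Only the 5 mm threshold comes from the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def geometry_as_query_attention(geom_query, tokens):
    """Single-head scaled dot-product attention with one geometry query.

    geom_query: list of d floats, a hypothetical embedding of geometry descriptors.
    tokens: list of d-dimensional lists standing in for depth and RGB features.
            They serve as both keys and values here (an assumption).
    Returns the fused d-dimensional feature and the attention weights.
    """
    d = len(geom_query)
    scores = [
        sum(q * k for q, k in zip(geom_query, tok)) / math.sqrt(d)
        for tok in tokens
    ]
    weights = softmax(scores)
    fused = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return fused, weights


def size_loss(pred_mm, target_mm, threshold_mm=5.0, aux_weight=0.5, sharpness=1.0):
    """L1 regression loss plus an auxiliary binary cross-entropy at a size cutoff.

    The classification probability is derived from the predicted size with a
    sigmoid centred on the threshold (an assumed design, not from the paper).
    """
    regression = abs(pred_mm - target_mm)
    p = 1.0 / (1.0 + math.exp(-sharpness * (pred_mm - threshold_mm)))
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to keep the logarithms finite
    label = 1.0 if target_mm >= threshold_mm else 0.0
    bce = -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
    return regression + aux_weight * bce


if __name__ == "__main__":
    fused, weights = geometry_as_query_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print("attention weights:", weights)
    print("fused feature:", fused)
    print("loss, same side of cutoff:", size_loss(6.0, 6.0))
    print("loss, wrong side of cutoff:", size_loss(4.0, 6.0))
```

In this sketch the auxiliary term adds extra penalty when a prediction falls on the wrong side of the cutoff, which is one plausible reading of how such a loss could stabilize predictions near the 5 mm threshold.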
Primary Subject Area: Application: Endoscopy
Secondary Subject Area: Safe and Trustworthy Learning-assisted Solutions for Medical Imaging
Submission Number: 298