Keywords: polyp size estimation, depth fusion, geometry-awareness, multi-modal learning, endoscopy
TL;DR: We introduce a polyp size estimation framework that fuses RGB texture, segmentation-derived geometry, and monocular depth via a PISE cross-attention block, resolving size–distance ambiguity and significantly improving accuracy.
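The cross-attention fusion named in the TL;DR can be illustrated with a minimal single-head sketch. Following the abstract's "geometry-as-query" description, it assumes that geometry descriptor tokens act as queries and that RGB and depth tokens are pooled together as keys and values. The function name, token counts, dimensions, and random projection matrices are hypothetical. This is a generic scaled dot-product attention, not the authors' block.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def geometry_query_fusion(geo, rgb, depth, w_q, w_k, w_v):
    """Single-head cross-attention: geometry tokens attend over RGB and depth tokens.

    geo:   (n_q, d_in) geometry descriptor tokens used as queries (hypothetical shapes)
    rgb:   (n_r, d_in) RGB feature tokens
    depth: (n_d, d_in) depth feature tokens
    w_q, w_k, w_v: (d_in, d) projection matrices (hypothetical)
    Returns fused features (n_q, d) and attention weights (n_q, n_r + n_d).
    """
    kv = np.concatenate([rgb, depth], axis=0)   # pool both modalities as keys/values
    q, k, v = geo @ w_q, kv @ w_k, kv @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])     # scaled dot-product scores
    attn = softmax(scores, axis=-1)             # each query's weights sum to 1
    return attn @ v, attn

# Usage with random toy features: 2 geometry queries, 5 RGB tokens, 3 depth tokens.
rng = np.random.default_rng(0)
d_in, d = 8, 4
geo = rng.normal(size=(2, d_in))
rgb = rng.normal(size=(5, d_in))
depth = rng.normal(size=(3, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d)) for _ in range(3))
fused, attn = geometry_query_fusion(geo, rgb, depth, w_q, w_k, w_v)
depth_share = attn[:, rgb.shape[0]:].sum(axis=1)  # attention mass each query puts on depth
print(fused.shape, attn.shape)  # (2, 4) (2, 8)
```

Pooling both modalities into one key/value set lets the softmax trade RGB against depth directly, so `depth_share` gives a rough, per-query view of how much a geometry token relies on depth rather than appearance.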
Abstract: Accurately estimating the physical size of colorectal polyps from monocular endoscopy is difficult due to scale ambiguity, viewpoint distortions, and strong inter-patient variability. We introduce MPSE, a geometry-aware, depth-guided multimodal framework that jointly leverages RGB appearance, monocular depth cues, and interpretable geometry descriptors to produce reliable and clinically calibrated size estimates. Central to MPSE is a geometry-as-query fusion block that selectively attends to depth and RGB features, and a Scale Consistency Block (SCB) that models agreement between 2D footprint-derived and 3D depth-derived cues, reducing size bias under severe distribution imbalance. The model is trained with a primary regression objective supported by an auxiliary threshold-based classification loss that stabilizes predictions near clinically important cutoffs. On our clinical dataset, MPSE achieves a mean absolute error of 0.93 mm and a polyp-level F1 score of 0.87 at the clinically critical 5 mm threshold, demonstrating accurate and clinically reliable size estimation in endoscopy.
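The abstract's training objective and its two reported metrics both revolve around the 5 mm cutoff. The sketch below illustrates them under stated assumptions. The regression term is taken to be L1, and the auxiliary term is a binary cross-entropy on a hypothetical "size >= 5 mm" logit head, combined with a hypothetical weight `lam`. A polyp is treated as positive when its size is >= 5 mm, and there is one prediction per polyp. The abstract does not specify how frame-level predictions are aggregated to the polyp level. All function names are illustrative; none of this is the authors' code.

```python
import numpy as np

CUTOFF_MM = 5.0  # clinically critical threshold named in the abstract

def hybrid_loss(pred_mm, logit_large, true_mm, lam=0.5):
    """L1 regression loss plus an auxiliary threshold classification loss (sketch).

    pred_mm:     predicted sizes in mm
    logit_large: logits for 'size >= CUTOFF_MM' from a hypothetical auxiliary head
    true_mm:     reference sizes in mm
    lam:         hypothetical weight on the auxiliary term
    """
    pred_mm, logit_large, true_mm = map(np.asarray, (pred_mm, logit_large, true_mm))
    reg = np.mean(np.abs(pred_mm - true_mm))        # primary regression term
    y = (true_mm >= CUTOFF_MM).astype(float)        # binary label at the cutoff
    # Numerically stable binary cross-entropy with logits.
    bce = np.mean(np.maximum(logit_large, 0) - logit_large * y
                  + np.log1p(np.exp(-np.abs(logit_large))))
    return float(reg + lam * bce)

def mae_mm(pred_mm, true_mm):
    # Mean absolute error in millimetres.
    return float(np.mean(np.abs(np.asarray(pred_mm) - np.asarray(true_mm))))

def f1_at_cutoff(pred_mm, true_mm, cutoff=CUTOFF_MM):
    # Binarize predicted and reference sizes at the cutoff, then compute F1.
    p = np.asarray(pred_mm) >= cutoff
    t = np.asarray(true_mm) >= cutoff
    tp = np.sum(p & t)
    fp = np.sum(p & ~t)
    fn = np.sum(~p & t)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Toy example with four polyps (illustrative numbers, not results from the paper).
true = [3.0, 4.5, 6.0, 10.0]
pred = [3.5, 5.2, 5.8, 9.0]
logits = [-2.0, -1.0, 1.5, 3.0]
print(round(mae_mm(pred, true), 2))   # 0.6
print(f1_at_cutoff(pred, true))       # 0.8
print(hybrid_loss(pred, logits, true))
```

The toy case shows why a cutoff-aware term can matter. The 4.5 mm polyp predicted at 5.2 mm costs only 0.7 mm of absolute error, yet it flips the binary decision and becomes the single false positive that lowers F1.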
Primary Subject Area: Application: Endoscopy
Secondary Subject Area: Safe and Trustworthy Learning-assisted Solutions for Medical Imaging
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Midl Latex Submission Checklist: Ensure no LaTeX errors during compilation., Replace NNN with your OpenReview submission ID., Includes \documentclass{midl}, \jmlryear{2026}, \jmlrworkshop, \jmlrvolume, \editors, and correct \bibliography command., Did not override options of the hyperref package., Did not use the times package., Use the correct spelling and format, avoid Unicode characters, and use LaTeX equivalents instead., Any math in the title and abstract must be enclosed within $...$., Did not override the bibliography style defined in midl.cls and did not use \begin{thebibliography} directly to insert references., Avoid using \scalebox; use \resizebox when needed., Included all necessary figures and removed *unused* files in the zip archive., Removed special formatting, visual annotations, and highlights used during rebuttal., All special characters in the paper and .bib file use LaTeX commands (e.g., \'e for é)., No separate supplementary PDF uploads., Acknowledgements, references, and appendix must start after the main content.
Latex Code: zip
Copyright Form: pdf
Submission Number: 298