Keywords: Endoscopic image, 3D vision, semantic segmentation, depth estimation, size measurement, benchmark dataset
Abstract: Accurate polyp sizing during endoscopy is crucial for cancer risk assessment but is hindered by subjective methods and inadequate datasets lacking integrated 2D appearance, 3D structure, and real-world size information. We introduce PolypSense3D, the first multi-source benchmark dataset specifically targeting depth-aware polyp size measurement. It uniquely integrates over 43,000 frames from virtual simulations, physical phantoms, and clinical sequences, providing synchronized RGB, dense/sparse depth, segmentation masks, camera parameters, and millimeter-scale size labels derived via a novel forceps-assisted in-vivo annotation technique. To establish its value, we benchmark state-of-the-art segmentation and depth estimation models. Results quantify significant domain gaps between simulated/phantom and clinical data and reveal substantial error propagation from perception stages to final size estimation, with the best fully automated pipelines achieving an average Mean Absolute Error (MAE) of 0.95 mm on the clinical data subset. Publicly released under CC BY-SA 4.0 with code and evaluation protocols, PolypSense3D offers a standardized platform to accelerate research in robust, clinically relevant quantitative endoscopic vision. The benchmark dataset and code are available at: https://github.com/HNUicda/PolypSense3D and https://doi.org/10.7910/DVN/K13H89.
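The measurement pipeline described in the abstract (segmentation mask + depth map + camera intrinsics → metric polyp size, scored by MAE) can be illustrated with a minimal sketch. This is not the released benchmark code: the function names, the pinhole back-projection, the millimeter depth convention, and the use of the maximum 3D extent as the "size" are all assumptions made here for illustration.

```python
import numpy as np

def estimate_polyp_size_mm(mask, depth_mm, fx, fy, cx, cy):
    """Back-project segmented pixels with a pinhole camera model and
    return the largest 3D extent (a proxy for polyp diameter) in mm.
    Hypothetical helper; assumes depth is metric and aligned with RGB."""
    vs, us = np.nonzero(mask)              # pixel coordinates inside the mask
    z = depth_mm[vs, us]                   # per-pixel depth in millimeters
    valid = z > 0                          # drop missing/invalid depth readings
    us, vs, z = us[valid], vs[valid], z[valid]
    x = (us - cx) * z / fx                 # pinhole back-projection to camera frame
    y = (vs - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)      # N x 3 point cloud of the polyp surface
    if len(pts) > 2000:                    # subsample so the O(N^2) step stays cheap
        idx = np.random.default_rng(0).choice(len(pts), 2000, replace=False)
        pts = pts[idx]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return float(d.max())

def mae_mm(pred_sizes, gt_sizes):
    """Mean Absolute Error between predicted and ground-truth sizes in mm."""
    pred, gt = np.asarray(pred_sizes), np.asarray(gt_sizes)
    return float(np.mean(np.abs(pred - gt)))
```

Under this reading, the reported 0.95 mm clinical MAE would correspond to `mae_mm` applied to predicted sizes versus the forceps-derived ground-truth labels; consult the released code and evaluation protocols for the authors' exact size definition and metric.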
Dataset URL: https://doi.org/10.7910/DVN/K13H89
Code URL: https://github.com/HNUicda/PolypSense3D
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 1189