Learning Camera Pose from Optical Colonoscopy Frames Through Deep Convolutional Neural Network (CNN)

Mohammad Ali Armin, Nick Barnes, Jose M. Alvarez, Hongdong Li, Florian Grimpen, Olivier Salvado

Published: 2017, Last Modified: 09 Nov 2025CARE/CLIP@MICCAI 2017EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Optical colonoscopy is performed by insertion of a long flexible colonoscope into the colon. Estimating the position of the colonoscope tip with respect to the colon surface is important as it would help localization of cancerous polyps for subsequent surgery and facilitate navigation. Knowing camera pose is also essential for 3D automatic scene reconstruction, which could support clinicians inspecting the whole colon surface thereby reducing missed polyps. This paper presents a method to estimate the pose of the colonoscope camera with six degrees of freedom (DoF) using deep convolutional neural network (CNN). Because obtaining a ground truth to train the CNN for camera pose from actual colonoscopy videos is extremely challenging, we trained the CNN using realistic synthetic videos generated with a colonoscopy simulator, which could generate the exact camera pose parameters. We validated the trained CNN on unseen simulated video datasets and on actual colonoscopy videos from 10 patients. Our results showed that the colonoscopy camera pose could be estimated with higher accuracy and speed than feature based computer vision methods such as the classical structure from motion (SfM) pipeline. This paper demonstrates that transfer learning from surgical simulation to actual endoscopic based surgery is a possible approach for deep learning technologies.