Content-based retrieval of video segments from minimally invasive surgery videos using deep convolutional video descriptors and iterative query refinement

Deepak Roy Chittajallu, Arslan Basharat, Paul Tunison, Samantha Horvath, Katerina O. Wells, Steven G. Leeds, James W. Fleshman, Ganesh Sankaranarayanan, Andinet Enquobahrie

Published: 01 Jan 2019. Last Modified: 13 Aug 2024. Venue: Medical Imaging: Image-Guided Procedures 2019. License: CC BY-SA 4.0

Abstract: Despite strong evidence of the clinical and economic benefits of minimally invasive surgery (MIS) for many common surgical procedures, MIS is grossly underutilized in many US hospitals, potentially because of its steep learning curve. Intraoperative videos captured by a camera inserted into the body during MIS procedures are emerging as an invaluable resource for MIS education, skill assessment, and quality assurance. However, these videos often last several hours, and there is a pressing need for automated tools that help surgeons quickly find key semantic segments of interest within them. In this paper, we present a novel integrated approach for content-based retrieval of video segments that are semantically similar to a query video within a large collection of MIS videos. We use state-of-the-art deep 3D convolutional neural network (CNN) models pre-trained on large public video classification datasets to extract spatiotemporal features from MIS video segments, and we employ an iterative query refinement (IQR) strategy in which a support vector machine (SVM) classifier, trained online on relevance feedback from the user, refines the search results iteratively. We show that our method outperforms the state of the art on the SurgicalActions160 dataset, which contains 160 video clips of typical surgical actions in gynecologic MIS procedures.
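The abstract describes the retrieval loop only at a high level. The following is a minimal sketch of SVM-based iterative query refinement in general, not the authors' implementation, and it rests on several assumptions that do not come from the paper: each video segment is assumed to be already encoded as a fixed-length descriptor (for example, pooled activations of a pre-trained 3D CNN), the initial ranking uses cosine similarity, the classifier is a plain linear SVM fit by subgradient descent in NumPy, and the user's relevance feedback is simulated from ground-truth labels. The function names (`l2_normalize`, `train_linear_svm`, `iqr_search`), parameters (`rounds`, `k`, `lam`), and the synthetic data are all hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def train_linear_svm(X, y, lam=1e-2, epochs=200, lr=0.1):
    """Linear SVM (hinge loss + L2 penalty) fit by full-batch subgradient descent.

    Assumption: a linear kernel and this simple optimizer stand in for whatever
    SVM configuration the paper uses. Labels y must be in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # only margin violators contribute to the hinge subgradient
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def iqr_search(query, database, relevant, rounds=3, k=10):
    """Rank database segments for a query, refining with (simulated) relevance feedback.

    query:    (d,) descriptor of the query clip
    database: (n, d) descriptors of candidate segments (assumed precomputed, e.g. by a 3D CNN)
    relevant: (n,) boolean ground truth, used ONLY to simulate the user's feedback
    Returns indices into `database`, best match first.
    """
    db = l2_normalize(database)
    q = l2_normalize(query[None, :])[0]
    scores = db @ q  # round 0: cosine similarity to the query descriptor
    labels = {}  # segment index -> +1 (relevant) / -1 (not relevant)
    for _ in range(rounds):
        ranking = np.argsort(-scores)
        for i in ranking[:k]:
            labels[int(i)] = 1 if relevant[i] else -1  # simulated user judgment
        labeled = sorted(labels)
        # Assumption: the query itself is added as an extra positive training example.
        X = np.vstack([q[None, :], db[labeled]])
        y = np.array([1.0] + [float(labels[i]) for i in labeled])
        if not np.any(y < 0):
            break  # the SVM needs at least one negative; keep the current ranking
        w, b = train_linear_svm(X, y)
        scores = db @ w + b  # re-rank every segment by its SVM decision value
    return np.argsort(-scores)


# Usage on synthetic descriptors (hypothetical data, not SurgicalActions160):
rng = np.random.default_rng(0)
d = 8
e0, e1 = np.eye(d)[0], np.eye(d)[1]
database = np.vstack([
    e0 + 0.05 * rng.standard_normal((5, d)),   # segments 0-4: same action as the query
    e1 + 0.05 * rng.standard_normal((45, d)),  # segments 5-49: other actions
])
relevant = np.arange(50) < 5
query = e0 + 0.05 * rng.standard_normal(d)
ranking = iqr_search(query, database, relevant)
print(sorted(ranking[:5].tolist()))  # [0, 1, 2, 3, 4]
```

Treating the query as a labeled positive lets a classifier be trained as soon as the user marks at least one result as irrelevant; each later round adds more labels, so the decision boundary, and therefore the ranking, can keep adapting. The paper's actual feature extractor, SVM kernel, and feedback protocol may differ from this sketch.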