Advancing surgical VQA with scene graph knowledge

Kun Yuan, Manasi Kattel, Joël L. Lavanchy, Nassir Navab, Vinkle Srivastav, Nicolas Padoy

Published: 2024, Last Modified: 05 Apr 2026Int. J. Comput. Assist. Radiol. Surg. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with natural language capabilities is emerging as a necessity. Our work aims to advance visual question answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in the current surgical VQA systems: removing question–condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design.