Keywords: Laparoscopic Surgery, Vision-Language Models, Spatial Awareness
TL;DR: This paper introduces SpatialContext, a simple way to inject anatomical scene geometry into vision-language models for laparoscopic multi-organ recognition.
Registration Requirement: Yes
Abstract: Accurate anatomical landmark identification is important for safe laparoscopic navigation, yet limited view and strong tissue similarity make multi-label organ classification difficult. Existing vision-language models mainly rely on appearance and overlook the spatial structure of surgical scenes (Zhang et al., 2025). We propose SpatialContext, a multimodal framework that injects scene geometry into classification through natural language prompts derived from segmentation masks, together with a context-conditional training strategy centered on the primary surgical target. Results on DSAD (Carstens et al., 2023) and Endoscapes (Mascagni et al., 2025) show improved recognition of scene-defining and off-target anatomy, suggesting that explicit spatial semantics can improve surgical scene understanding.
Visa & Travel: No
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 109