Self-Supervised Cyclic Diffeomorphic Mapping for Soft Tissue Deformation Recovery in Robotic Surgery Scenes
Abstract: The ability to recover tissue deformation from surgical video is fundamental for many downstream applications in robotic surgery. Despite noticeable advancements, this task remains under-explored due to the complex dynamics of soft tissues manipulated by surgical instruments. Achieving dense and accurate tissue tracking is further complicated by ambiguous pixel correspondence in regions with homogeneous texture. In this paper, we introduce a novel self-supervised framework to recover tissue deformations from stereo surgical videos. Our approach integrates semantics, cross-frame motion flow, and long-range temporal dependencies to accurately represent tissue dynamics for deformation recovery. Moreover, we incorporate diffeomorphic mapping to regularize the warping field so that it is more physically realistic. To comprehensively evaluate our method, we collected stereo surgical video clips containing three types of tissue manipulation (i.e., pushing, dissection, and retraction) from two surgical procedures (i.e., hemicolectomy and mesorectal excision). Our method demonstrates promising results in capturing 3D tissue deformation and generalizes well across different actions and procedures. It also outperforms current state-of-the-art approaches based on non-rigid registration and optical flow estimation. To the best of our knowledge, this is the first work on self-supervised learning for dense tissue deformation modeling from stereo surgical videos.
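The abstract states that diffeomorphic mapping regularizes the warping field but does not say how that mapping is constructed. One widely used construction in learning-based registration, assumed here and not confirmed as the authors' method, is to predict a stationary velocity field and integrate it by scaling and squaring: divide the velocity by 2^steps, then compose the resulting small displacement with itself `steps` times. The minimal NumPy sketch below works on a single 2D field in pixel units with (row, col) ordering and clamps samples at the image border; the names `bilinear_sample`, `integrate_velocity`, and `jacobian_determinant` are hypothetical.

```python
import numpy as np


def bilinear_sample(field, coords):
    """Bilinearly sample a (H, W, 2) field at float (row, col) coords of shape (H, W, 2).

    Coordinates outside the image are clamped to the border (an assumption of this sketch).
    """
    H, W, _ = field.shape
    r = np.clip(coords[..., 0], 0, H - 1)
    c = np.clip(coords[..., 1], 0, W - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, H - 1)
    c1 = np.minimum(c0 + 1, W - 1)
    wr = (r - r0)[..., None]  # fractional offsets, broadcast over the 2 channels
    wc = (c - c0)[..., None]
    top = field[r0, c0] * (1 - wc) + field[r0, c1] * wc
    bottom = field[r1, c0] * (1 - wc) + field[r1, c1] * wc
    return top * (1 - wr) + bottom * wr


def integrate_velocity(velocity, steps=7):
    """Scaling and squaring: approximate exp(v) as (id + v / 2**steps) composed 2**steps times.

    Returns the displacement u of the map phi(x) = x + u(x).
    """
    H, W, _ = velocity.shape
    grid = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1).astype(float)
    disp = velocity / (2 ** steps)  # small first step, close to the identity map
    for _ in range(steps):
        # Compose phi with itself: u(x) <- u(x) + u(x + u(x)).
        disp = disp + bilinear_sample(disp, grid + disp)
    return disp


def jacobian_determinant(disp):
    """Finite-difference Jacobian determinant of phi(x) = x + disp(x); > 0 means no folding."""
    dr_dr, dr_dc = np.gradient(disp[..., 0])
    dc_dr, dc_dc = np.gradient(disp[..., 1])
    return (1 + dr_dr) * (1 + dc_dc) - dr_dc * dc_dr
```

Because the map is the exponential of a velocity field, integrating the negated field gives an approximate inverse, and the Jacobian determinant stays positive. These are the invertibility and no-folding properties that motivate a diffeomorphic regularizer for tissue warping. In a training pipeline, the same steps would normally be written with a differentiable warping operator rather than NumPy.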