Track: long paper (up to 4 pages)
Keywords: Compressed representations, Disentanglement, Feature decomposition, Surgical scenes, vMF kernels
TL;DR: The study aimed to achieve object-level disentanglement in surgical scenes but obtained only a partial separation based on pixel intensities. Challenges arose because the objects of interest occupy a small fraction of the image area, highlighting the limitations of the approach in achieving true semantic disentanglement.
Abstract: Image generation through disentangled object representations is an important research area with significant potential. Disentanglement separates the representations of objects and their attributes, enabling greater control over the generated output. However, existing approaches are limited to disentangling the objects' attributes and generating images with selected combinations of those attributes. This study explores learning object-level disentanglement of semantically rich latent representations using von Mises-Fisher (vMF) distributions. The proposed approach aims to disentangle compressed representations into object and background classes. It is tested on surgical scenes, disentangling tools from background information, using the Cholec80 dataset. Achieving tool-background disentanglement would provide an opportunity to generate rare and custom surgical scenes. However, the proposed method instead learns to disentangle representations based on pixel intensities. This study uncovers the challenges and shortfalls in achieving object-level disentanglement of compressed representations using vMF distributions. The code for this study is available at https://github.com/it-is-lokesh/vMF-disentanglement-challenges.
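For readers unfamiliar with vMF kernels, the sketch below illustrates the general idea of assigning per-location feature vectors to vMF kernels and grouping kernels into object and background classes. A vMF density on the unit sphere has the form p(z | mu, kappa) proportional to exp(kappa * mu^T z); with equal priors and one concentration kappa shared by all kernels, the normalising constant cancels, so the kernel posterior is a softmax over kappa-scaled cosine similarities. This is a minimal sketch under those assumptions, not the authors' implementation (see the linked repository for that). The function names, the value kappa = 30, and the hand-specified split of kernels into "object" and "background" groups are all hypothetical.

```python
import numpy as np


def vmf_logits(features, mus, kappa):
    """Unnormalised vMF log-likelihoods kappa * mu_k^T z for every feature/kernel pair.

    features: (N, D) feature vectors, e.g. one per spatial location of a compressed map.
    mus: (K, D) kernel mean directions.
    kappa: scalar concentration shared by all kernels (an assumption of this sketch).
    Returns an (N, K) array.
    """
    # Project features and kernel means onto the unit sphere, where vMF is defined.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    m = mus / np.linalg.norm(mus, axis=1, keepdims=True)
    return kappa * z @ m.T


def assign_kernels(features, mus, kappa=30.0):
    """Posterior over kernels (equal priors, shared kappa) and the hard assignment."""
    logits = vmf_logits(features, mus, kappa)
    # Numerically stable softmax over kernels; the vMF normaliser C_D(kappa) cancels here.
    logits = logits - logits.max(axis=1, keepdims=True)
    post = np.exp(logits)
    post /= post.sum(axis=1, keepdims=True)
    return post, post.argmax(axis=1)


def object_mask(labels, object_kernels):
    """True where a location is assigned to a kernel in the (hypothetical) object group."""
    return np.isin(labels, object_kernels)


if __name__ == "__main__":
    # Toy 2-D example: kernel 0 stands in for "tool", kernel 1 for "background".
    mus = np.array([[1.0, 0.0], [0.0, 1.0]])
    feats = np.array([[0.9, 0.1], [0.2, 0.8], [1.0, -0.1]])
    post, labels = assign_kernels(feats, mus)
    print(labels)  # labels -> [0 1 0]
    print(object_mask(labels, [0]))
```

Nothing in this assignment step forces the kernel means to capture semantics: if the learned mean directions align with brightness-related feature directions, the resulting object/background split would follow pixel intensity, which is consistent with the failure mode reported above.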
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 39