Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations

Tal Barami; Nimrod Berman; Ilan Naiman; Amos Haviv Hason; Rotem Ezra; Omri Azencot

Published: 18 Sept 2025, Last Modified: 16 Dec 2025

Venue: NeurIPS 2025 Datasets and Benchmarks Track poster

License: CC BY 4.0

Keywords: Representation learning, disentanglement, sequential disentanglement, benchmarking

TL;DR: We introduce the first benchmark for multi-factor sequential disentanglement, propose a novel method, and leverage Vision-Language Models to automate annotation and evaluation, enabling scalable, label-free workflows.

Abstract: Learning disentangled representations in sequential data is a key goal in deep learning, with broad applications in vision, audio, and time series. While real-world data involves multiple interacting semantic factors over time, prior work has mostly focused on simpler two-factor static and dynamic settings, primarily because such settings make data collection easier, thereby overlooking the inherently multi-factor nature of real-world data. We introduce the first standardized benchmark for evaluating multi-factor sequential disentanglement across six diverse datasets spanning video, audio, and time series. Our benchmark includes modular tools for dataset integration, model development, and evaluation metrics tailored to multi-factor analysis. We additionally propose a post-hoc Latent Exploration Stage to automatically align latent dimensions with semantic factors, and introduce a Koopman-inspired model that achieves state-of-the-art results. Moreover, we show that Vision-Language Models can automate dataset annotation and serve as zero-shot disentanglement evaluators, removing the need for manual labels and human intervention. Together, these contributions provide a robust and scalable foundation for advancing multi-factor sequential disentanglement. Our code is available on GitHub, and the datasets and trained models are available on Hugging Face.

Croissant File: zip

Dataset URL: https://huggingface.co/collections/AmosHason/msd-benchmark-68ced7cd3742906799e9ebc4

Code URL: https://github.com/azencot-group/MSD-Benchmark

Supplementary Material: zip

Primary Area: Other

Submission Number: 935
