Towards Causal Relationship in Indefinite Data: New Datasets and Baseline Model

Published: 19 Sept 2025, Last Modified: 19 Sept 2025, Accepted by DMLR, License: CC BY-SA 4.0
Abstract: The cross-fertilization of deep learning and causal discovery has given rise to broader forms of causal data, including multi-structured data such as the Netsim dataset and complex variables such as those in the RECCON dataset. However, we observe an absence of research that concurrently addresses data with both multiple structures and complex variables, which we name "indefinite data." Our previous survey introduced the concept of this data paradigm, yet exploring indefinite data still faces two substantial gaps: the dataset gap and the model gap. To address the dataset gap, we release two high-quality datasets, Causalogue and Causaction, containing text dialogue samples and video action samples, respectively, each with causal annotations. The model gap arises from the coexistence of multi-structure data and complex variables, which breaks the assumptions of all current methods and renders them infeasible on indefinite data. To this end, we propose a probabilistic framework as a baseline. It overcomes the challenges posed by indefinite data and paves the way for an extension to latent confounders. Comprehensive experiments report baseline results on causal structures, causal representations, and confounding disentanglement. Our code and datasets are available at https://github.com/Zodiark-ch/master-of-paper-Towards-Causal-Relationship-in-Indefinite-Data-Baseline-Model-and-New-Datasets.
Keywords: Causal Dataset, Causal Representation, Causal Structures, Baseline Model
Previous DMLR Submission Url: https://openreview.net/forum?id=dzJUbfMlUF
Code: https://github.com/Zodiark-ch/master-of-paper-Towards-Causal-Relationship-in-Indefinite-Data-Baseline-Model-and-New-Datasets
Assigned Action Editor: ~Mykola_Pechenizkiy1
Submission Number: 90