Identification of Latent Confounders via Investigating the Tensor Ranks of the Nonlinear Observations

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We study the problem of learning the causal structure of discrete latent variables.
Abstract: We study the problem of learning discrete latent variable causal structures from mixed-type observational data. Traditional methods, such as those based on the tensor rank condition, are designed to identify discrete latent structure models and provide robust identification bounds for discrete causal models. However, when the observed variables, specifically the children of latent variables, are collected at various levels as continuous data, the tensor rank condition no longer applies, which limits further causal structure learning for latent variables. In this paper, we consider a more general case where observed variables can be either continuous or discrete, and we further allow for scenarios where multiple latent parents cause the same set of observed variables. We show that, under the completeness condition, it is possible to discretize the data in a way that satisfies the full-rank assumption required by the tensor rank condition. This enables the identifiability of discrete latent structure models from mixed-type observational data. Moreover, we introduce the two-sufficient measurement condition, a more general structural assumption under which the tensor rank condition holds and the underlying latent causal structure is identifiable by a proposed two-stage identification algorithm. Extensive experiments on both simulated and real-world data validate the effectiveness of our method.
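The discretization idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm. It assumes a single discrete latent variable with two continuous children whose conditional distributions are unit-variance Gaussians, and it uses the two-variable case, where the joint table is a matrix and its tensor rank is the ordinary matrix rank. All function names, bin edges, and parameter values are hypothetical. The sketch computes the exact joint probability table of the discretized children and checks that its rank matches the number of latent states when the bins preserve the dependence on the latent variable.

```python
import math


def normal_cdf(x, mean, std=1.0):
    """CDF of N(mean, std^2); math.erf handles +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def bin_probs(mean, edges):
    """Probability that X ~ N(mean, 1) falls in each bin defined by the edges."""
    cuts = [-math.inf] + list(edges) + [math.inf]
    return [normal_cdf(hi, mean) - normal_cdf(lo, mean)
            for lo, hi in zip(cuts, cuts[1:])]


def joint_table(prior, means_x1, means_x2, edges):
    """P(bin(X1)=i, bin(X2)=j) = sum_l P(L=l) P(i | l) P(j | l).

    Assumes X1 and X2 are conditionally independent given the latent L.
    """
    a = [bin_probs(m, edges) for m in means_x1]  # a[l][i]
    b = [bin_probs(m, edges) for m in means_x2]  # b[l][j]
    k = len(edges) + 1
    return [[sum(p * a[l][i] * b[l][j] for l, p in enumerate(prior))
             for j in range(k)] for i in range(k)]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting and a tolerance."""
    rows = [row[:] for row in matrix]
    n_cols = len(rows[0])
    rank = 0
    for col in range(n_cols):
        pivot = max(range(rank, len(rows)),
                    key=lambda r: abs(rows[r][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


# Hypothetical parameters: a latent variable with two states, three bins per child.
prior = [0.4, 0.6]
edges = [-0.5, 0.5]
informative = joint_table(prior, [-1.5, 1.5], [-1.0, 2.0], edges)
uninformative = joint_table(prior, [0.0, 0.0], [-1.0, 2.0], edges)

print(numerical_rank(informative))    # children depend on the latent state
print(numerical_rank(uninformative))  # X1 carries no information about L
```

In this toy setting the rank of the discretized table reveals the number of latent states only when the conditional bin distributions remain linearly independent, which is the role the full-rank assumption plays; with finite samples the table would be estimated and the rank would have to be tested statistically rather than computed exactly.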
Lay Summary: Understanding what causes certain outcomes is crucial, especially when some influencing factors are hidden or "latent." Imagine trying to figure out why some people get a particular disease, but you can only see their symptoms (observed data), not all the underlying biological processes (latent variables). Traditional methods for uncovering these hidden causes often rely on a mathematical tool called "tensor rank," but these methods only work when all the information we observe is in a simple, categorical form (like "yes" or "no"). Our research tackles a more complex and realistic scenario: what if the observed information is a mix of categories and continuous measurements (like blood pressure readings)? Or what if multiple hidden factors influence the same set of observable symptoms? We discovered a way to transform the mixed data into a form that allows the tensor rank tool to work, even with continuous measurements. This means we can now identify these hidden causal structures in a much wider range of real-world situations. We also developed a new condition and a two-step algorithm to achieve this. Our experiments show that our method is effective in identifying these hidden causes from complex, mixed-type data.
Primary Area: Probabilistic Methods->Structure Learning
Keywords: Causal discovery, discrete latent variables
Submission Number: 14001