Extracting Rare Dependence Patterns via Adaptive Sample Reweighting

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0
Abstract: Discovering dependence patterns between variables from observational data is a fundamental problem in data analysis. However, existing testing methods often fail to detect subtle yet critical patterns that occur only within small regions of the data distribution, which we term rare dependence. These rare dependencies obscure the true underlying dependence structure among the variables, particularly in causal discovery tasks. To address this issue, we propose a novel testing method that combines kernel-based (conditional) independence testing with adaptive sample importance reweighting. By learning and assigning higher importance weights to data points that exhibit significant dependence, our method amplifies these patterns so that they can be detected. Theoretically, we analyze the asymptotic distributions of the test statistics and establish a uniform bound for the learning scheme. Furthermore, we integrate our tests into the PC algorithm, a constraint-based approach to causal discovery, equipping it to uncover causal relationships even in the presence of rare dependence. Empirical evaluation on synthetic and real-world datasets demonstrates the efficacy of our method.
Lay Summary: Sometimes, important patterns in data are easy to miss, especially if they only show up in a small number of cases. For example, a rare dependence between genes might only appear under specific conditions, but could be crucial for understanding a disease. Most existing statistical tools for analyzing data overlook these subtle patterns because they’re designed to focus on the “big picture.” In our work, we develop a method that helps statistical methods pay more attention to these rare but meaningful patterns. We do this by training the system to assign more weight to data points that seem to show signs of interesting behavior, essentially telling it, “look here more closely.” This makes it easier to detect subtle relationships between variables that would otherwise be hidden. We also show how our method can help discover cause-and-effect relationships, even when these are obscured by rare dependence. Our tests on both simulated and real-world datasets show that this approach leads to more accurate and insightful discoveries.
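To make the reweighting idea in the abstract concrete, here is a minimal sketch of a sample-weighted HSIC statistic, which is one common kernel dependence measure. It is an illustration under stated assumptions and not the authors' implementation: the function names `rbf_kernel` and `weighted_hsic`, the Gaussian kernel, and the fixed `bandwidth` are all hypothetical choices, and the weights are supplied by the caller rather than learned as in the paper. With uniform weights the statistic reduces to the standard biased HSIC estimate.

```python
import numpy as np


def rbf_kernel(x, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix for samples x of shape (n,) or (n, d)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def weighted_hsic(x, y, weights=None, bandwidth=1.0):
    """Weighted HSIC: squared Hilbert-Schmidt norm of the weighted
    cross-covariance operator between x and y.

    Assumption: `weights` are given, non-negative importance weights;
    the paper learns them adaptively, which is not reproduced here.
    """
    n = len(x)
    if weights is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(weights, dtype=float)
        if w.shape != (n,) or np.any(w < 0) or w.sum() <= 0:
            raise ValueError("weights must be n non-negative values, not all zero")
        w = w / w.sum()  # normalize to a probability vector

    # Weighted centering: subtract the weighted mean embedding from each point.
    centering = np.eye(n) - np.outer(np.ones(n), w)
    k_centered = centering @ rbf_kernel(x, bandwidth) @ centering.T
    l_centered = centering @ rbf_kernel(y, bandwidth) @ centering.T

    # sum_{i,j} w_i w_j Kc_ij Lc_ij
    return float(w @ (k_centered * l_centered) @ w)
```

Passing larger weights for the samples suspected of carrying the dependence makes those samples dominate the statistic, which is the amplification effect the abstract describes. Turning the statistic into a test would additionally require a null distribution (for example via permutation), which this sketch omits.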
Link To Code: https://github.com/leeedwina430/RKCIT
Primary Area: General Machine Learning->Causality
Keywords: Independence tests, Sample reweighting, Causal discovery
Submission Number: 12900