Keywords: DAG learning, graduated optimization, heteroscedastic noise models, loss weight schedule
TL;DR: This work highlights an optimization challenge that arises specifically when learning DAGs under heteroscedastic noise models (HNMs) and proposes a graduated optimization strategy to overcome it.
Abstract: This study focuses on the heteroscedastic noise model (HNM), in which an effect is a function of its cause and a Gaussian noise term whose variance depends on the cause. Integrating HNMs into a continuous optimization framework allows us to learn a causal directed acyclic graph (DAG) under an acyclicity constraint by maximizing a likelihood objective parameterized by both mean and variance. However, DAG learning under HNMs inherits the challenges of gradient-based likelihood optimization: the gradient is scaled by the inverse of the predictive variance, which introduces a new optimization issue in DAG learning under an acyclicity constraint. In early training, because the gradient of the reconstruction loss is divided by the predicted variance, it becomes heavily attenuated; as a result, the DAG parameters are updated primarily by the acyclicity constraint, hindering effective structure learning. To address this, we propose a graduated optimization strategy with weighted loss scheduling. We introduce a scheduling coefficient into the loss, starting with a high weight for stable mean and variance learning, then gradually lowering the coefficient to transition to the standard likelihood objective and enforce acyclicity. This approach ensures that the learned DAG more faithfully reflects the data. Experimental results on synthetic and real-world data show that our method outperforms existing approaches in structure learning accuracy.
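To make the abstract's mechanism concrete, the following is a minimal PyTorch sketch, not the paper's implementation: the comments show why a large predicted variance attenuates the reconstruction gradient, and how a scheduled coefficient on the likelihood term could counteract this early in training. The linear decay schedule, the bounds gamma_max/gamma_min, the penalty weight rho, and the NOTEARS-style acyclicity penalty are all illustrative assumptions; the paper's exact schedule and constraint handling are not specified here.

```python
import torch

def heteroscedastic_nll(x, mu, log_var):
    # Gaussian negative log-likelihood with input-dependent variance.
    # Its gradient w.r.t. mu is (mu - x) / var, so a large predicted
    # variance attenuates the reconstruction gradient, the issue the
    # abstract describes for early training.
    var = log_var.exp()
    return ((x - mu) ** 2 / (2.0 * var) + 0.5 * log_var).mean()

def notears_acyclicity(W):
    # NOTEARS-style penalty h(W) = tr(exp(W * W)) - d, which is zero
    # iff the weighted adjacency matrix W corresponds to a DAG.
    d = W.shape[0]
    return torch.matrix_exp(W * W).trace() - d

def scheduled_loss(x, mu, log_var, W, step, total_steps,
                   gamma_max=10.0, gamma_min=1.0, rho=1.0):
    # Graduated optimization with a weighted loss schedule (assumed
    # linear here): the likelihood weight gamma starts high so mean and
    # variance learning dominate the updates, then decays so the
    # objective transitions to the standard likelihood while the
    # acyclicity penalty takes effect.
    t = min(step / total_steps, 1.0)
    gamma = gamma_max + t * (gamma_min - gamma_max)
    return gamma * heteroscedastic_nll(x, mu, log_var) + rho * notears_acyclicity(W)
```

Under this reading, the schedule simply rebalances which term drives the DAG parameters early on; once gamma reaches gamma_min, the objective coincides with the usual penalized likelihood.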
Primary Area: causal reasoning
Submission Number: 12279