Keywords: instrumental variable regression, NPIV, nonparametric statistics, feature learning, causal inference, operator learning
Abstract: We address the problem of causal effect estimation in the presence of hidden confounders, using nonparametric instrumental variable (IV) regression.
A leading strategy employs \emph{spectral features}, that is, learned features spanning the top eigensubspaces of the operator linking treatments to instruments.
We derive a generalization error bound for a two-stage least squares estimator based on spectral features, and gain insights into the method's performance and failure modes. We show that performance depends on two key factors, leading to a clear taxonomy of outcomes. In the \emph{good} scenario, the approach is optimal: the structural function is well represented by the top eigenfunctions of the conditional operator (strong \emph{spectral alignment}), and the operator's eigenvalues decay slowly, indicating a strong instrument. Performance degrades in the \emph{bad} scenario: spectral alignment remains strong, but rapid eigenvalue decay (indicating a weaker instrument) demands significantly more samples for effective feature learning. Finally, in the \emph{ugly} scenario, weak spectral alignment causes the method to fail, regardless of how quickly the eigenvalues decay. Our synthetic experiments empirically validate this taxonomy. We further introduce a practical procedure to estimate these spectral properties from data, allowing practitioners to diagnose which regime a given problem falls into. We apply this method to the dSprites dataset, demonstrating its utility.
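To make the construction concrete, the following is a minimal, self-contained sketch of one way two-stage least squares with spectral features can be instantiated on synthetic one-dimensional data. The feature maps (Gaussian bumps at fixed centers), the ridge penalties, the truncation level k, and the data-generating process are all illustrative assumptions; this is not necessarily the estimator or the diagnostic procedure analyzed in the submission.

```python
# Hedged sketch: spectral-feature two-stage least squares on a toy IV problem.
# All modelling choices (feature maps, penalties, k, data) are assumptions for
# illustration only, not the submission's exact method.
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic IV data with a hidden confounder U.
U = rng.normal(size=n)                                   # hidden confounder
Z = rng.normal(size=n)                                   # instrument
X = 0.8 * Z + 0.6 * U + 0.3 * rng.normal(size=n)         # treatment
f = lambda x: np.sin(x) + 0.5 * x                        # structural function (unknown to the estimator)
Y = f(X) + 0.8 * U + 0.1 * rng.normal(size=n)            # confounded outcome

def rbf_features(v, centers, width=1.0):
    """Gaussian (RBF) feature map evaluated at fixed centers."""
    return np.exp(-0.5 * (v[:, None] - centers[None, :]) ** 2 / width ** 2)

cX = np.linspace(X.min(), X.max(), 30)
cZ = np.linspace(Z.min(), Z.max(), 30)
PhiX, PhiZ = rbf_features(X, cX), rbf_features(Z, cZ)

# Stage 1: estimate the conditional-expectation operator E[phi(X) | Z] in
# feature coordinates by ridge regression of treatment features on instrument features.
lam1 = 1e-3
A = np.linalg.solve(PhiZ.T @ PhiZ + n * lam1 * np.eye(PhiZ.shape[1]),
                    PhiZ.T @ PhiX)                        # maps phi(Z) -> predicted phi(X)

# Spectral features: top-k right singular vectors of the estimated operator,
# i.e. the directions in treatment-feature space best explained by the instrument.
k = 5
_, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:k].T                                              # (dim phi(X)) x k
print("leading singular values (crude look at spectral decay):", np.round(s[:k], 3))

# Stage 2: regress Y on the instrument-predicted spectral features.
F_hat = PhiZ @ A @ V
lam2 = 1e-3
beta = np.linalg.solve(F_hat.T @ F_hat + n * lam2 * np.eye(k), F_hat.T @ Y)

# Evaluate the estimated structural function on a grid.
xg = np.linspace(-3, 3, 200)
f_hat = rbf_features(xg, cX) @ V @ beta
print("max abs error on grid:", np.abs(f_hat - f(xg)).max())
```

In this toy setup, inspecting how fast the singular values `s` decay and how much of the stage-2 fit is captured by the leading components gives a rough, data-driven feel for the strong/weak-instrument and alignment regimes described in the abstract, though the submission's actual diagnostic may differ.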
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 10036