CFA: Causal Feature Augmentation for High-Dimensional Linear Regression

Sepehr Elahi; Ehsan Mokhtarian; Negar Kiyavash; Patrick Thiran

15 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: multi-task regression, high-dimensional regression, representation learning, low-sample regression, linear regression

Abstract: High-dimensional prediction with limited samples poses a significant challenge due to severe overfitting. While existing approaches tackle this via regularization, clustering, or representation learning, we introduce a novel framework inspired by causal inference that is designed to exploit latent structure linking predictors and responses. Our approach employs a new similarity-based clustering procedure guided by a metric that quantifies shared predictor-response dependencies, which tends to group variables that play similar roles with respect to (possibly latent) mediators or confounders. The resulting causality-inspired features are then incorporated into an augmented regression model, yielding sparser, more robust, and more generalizable predictions without attempting to recover the underlying causal graph. Experiments across synthetic and real-world datasets, including S&P 500 market data, demonstrate that our method achieves higher regression performance and markedly reduces overfitting compared to existing baselines.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 5987
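
The pipeline described in the abstract (cluster predictors by how similarly they relate to the responses, then augment the regression inputs with cluster-level features) can be illustrated with a minimal sketch. This is not the authors' method: the similarity metric used here (cosine similarity between each predictor's vector of Pearson correlations with the responses), the greedy threshold clustering, the cluster-mean features, and all function names are assumptions chosen only to make the idea concrete.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def dependency_profiles(X, Y):
    """For each predictor column of X, return its correlations with every response column of Y.

    Assumed stand-in for the paper's predictor-response dependency metric.
    """
    x_cols = list(zip(*X))
    y_cols = list(zip(*Y))
    return [[pearson(xc, yc) for yc in y_cols] for xc in x_cols]

def cluster_predictors(profiles, threshold=0.9):
    """Greedy clustering: a predictor joins the first cluster whose
    representative (first member) has a similar profile, else starts a new one."""
    clusters = []
    for j, prof in enumerate(profiles):
        for members in clusters:
            if cosine(prof, profiles[members[0]]) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def augment(X, clusters):
    """Append one feature per cluster: the mean of that cluster's predictors.

    Simplification: predictors are averaged on their raw scale.
    """
    return [
        list(row) + [sum(row[j] for j in members) / len(members) for members in clusters]
        for row in X
    ]

# Toy data: predictors 0 and 1 track response 0; predictors 2 and 3 track response 1.
z1 = [1, 2, 3, 4, 5, 6]
z2 = [1, -1, 2, -2, 3, -3]
X = [[a, 2 * a, b, 3 * b] for a, b in zip(z1, z2)]
Y = [[a, b] for a, b in zip(z1, z2)]

clusters = cluster_predictors(dependency_profiles(X, Y))
X_aug = augment(X, clusters)
```

The augmented matrix `X_aug` could then be passed to any regularized linear regressor in place of `X`. Whether the paper's features are cluster means or something else is not stated in the abstract.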
