Keywords: multi-task regression, high-dimensional regression, representation learning, low-sample regression, linear regression
Abstract: High-dimensional prediction with limited samples poses a significant challenge due to severe overfitting. While existing approaches tackle this via regularization, clustering, or representation learning, we introduce a novel framework inspired by causal inference that is designed to exploit latent structure linking predictors and responses. Our approach employs a new similarity-based clustering procedure guided by a metric that quantifies shared predictor-response dependencies, which tends to group variables that play similar roles with respect to (possibly latent) mediators or confounders. The resulting causality-inspired features are then incorporated into an augmented regression model, yielding sparser, more robust models whose predictions generalize better, without attempting to recover the underlying causal graph. Experiments across synthetic and real-world datasets, including S&P 500 market data, demonstrate that our method achieves better predictive performance and markedly reduces overfitting compared to existing baselines.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 5987
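
The pipeline summarized in the abstract (cluster predictors by shared predictor-response dependencies, then regress on features augmented with the clusters) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the submission's actual method: the dependency metric is taken to be plain predictor-response correlation, the clustering is ordinary k-means on normalized correlation profiles, the cluster features are member means, and the augmented model is ridge regression without an intercept. The function names (`dependency_profiles`, `cluster_predictors`, `cluster_features`, `ridge_fit`) are hypothetical.

```python
import numpy as np

def dependency_profiles(X, Y):
    """Correlation of every predictor with every response, shape (p, q).
    Assumed stand-in for the paper's dependency metric."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    Ys = (Y - Y.mean(axis=0)) / (Y.std(axis=0) + 1e-12)
    return Xs.T @ Ys / X.shape[0]

def cluster_predictors(profiles, n_clusters, n_iter=50, seed=0):
    """k-means on unit-normalized profiles using cosine similarity.
    Assumed stand-in for the paper's similarity-based clustering."""
    rng = np.random.default_rng(seed)
    P = profiles / (np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-12)
    centers = P[rng.choice(P.shape[0], size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each predictor to the most similar center.
        labels = np.argmax(P @ centers.T, axis=1)
        for k in range(n_clusters):
            members = P[labels == k]
            if len(members) > 0:
                c = members.mean(axis=0)
                centers[k] = c / (np.linalg.norm(c) + 1e-12)
    return labels

def cluster_features(X, labels, n_clusters):
    """One feature per cluster: the mean of its standardized member predictors."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    feats = np.zeros((X.shape[0], n_clusters))
    for k in range(n_clusters):
        mask = labels == k
        if mask.any():
            feats[:, k] = Xs[:, mask].mean(axis=1)
    return feats

def ridge_fit(Z, Y, alpha=1.0):
    """Closed-form multi-response ridge regression (no intercept term)."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + alpha * np.eye(d), Z.T @ Y)

# Synthetic low-sample, high-dimensional data (illustrative only).
rng = np.random.default_rng(0)
n, p, q, K = 40, 200, 3, 5
X = rng.normal(size=(n, p))
Y = X[:, :10] @ rng.normal(size=(10, q)) + 0.1 * rng.normal(size=(n, q))

profiles = dependency_profiles(X, Y)        # (p, q)
labels = cluster_predictors(profiles, K)    # cluster index per predictor
Z = np.hstack([X, cluster_features(X, labels, K)])  # augmented design matrix
W = ridge_fit(Z, Y, alpha=10.0)             # (p + K, q) coefficient matrix
```

Applying such a model to held-out data would require reusing the training means, standard deviations, and cluster labels rather than recomputing them, which the sketch omits for brevity.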