Sparse and Low-bias Estimation of High Dimensional Vector Autoregressive Models

Trevor Ruiz; Sharmodeep Bhattacharyya; Mahesh Balasubramanian; Kristofer Bouchard

08 Jun 2020 (modified: 05 May 2023), L4DC 2020

Abstract: Vector autoregressive (VAR) models are widely used for causal discovery and forecasting in multivariate time series analysis. In the high-dimensional setting, which is increasingly common in fields such as neuroscience and econometrics, model parameters are commonly inferred by $L_1$-regularized maximum likelihood (RML). A well-known feature of RML inference is that it generally produces a trade-off between sparsity and bias that depends on the choice of the regularization hyperparameter. In the context of multivariate time series analysis, sparse estimates are favorable for causal discovery and low-bias estimates are favorable for forecasting. However, owing to a paucity of research on hyperparameter selection methods, practitioners must rely on ad hoc methods such as cross-validation (or manual tuning). The particular balance that such approaches strike between the two goals, causal discovery and forecasting, is poorly understood. Our paper investigates this behavior and proposes a method ($UoI_{VAR}$) that achieves a better balance between sparsity and bias when the underlying causal influences are in fact sparse. We demonstrate through simulation that RML with a hyperparameter selected by cross-validation tends to overfit, producing relatively dense estimates. We further demonstrate that $UoI_{VAR}$ approximates the correct sparsity pattern much more effectively with only a minor compromise in model fit, particularly for larger data dimensions, and that the estimates produced by $UoI_{VAR}$ exhibit less bias. We conclude that our method achieves improved performance and is especially well suited to applications involving simultaneous causal discovery and forecasting in high-dimensional settings.
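The sparsity-bias trade-off of $L_1$-regularized estimation can be seen in a minimal sketch. It assumes a first-order VAR, $x_t = A x_{t-1} + e_t$, with isotropic Gaussian noise; under that assumption the penalized negative log-likelihood separates into one lasso problem per equation (row of $A$), solved here by cyclic coordinate descent in pure Python. The function names (`simulate_var1`, `lasso_cd`, `fit_var1_lasso`, `nnz`), the toy transition matrix, and the penalty values are hypothetical illustrations. This is plain $L_1$-regularized estimation with a hand-picked penalty, not $UoI_{VAR}$ and not cross-validation.

```python
import random

def simulate_var1(A, T, noise_sd=1.0, seed=0):
    """Simulate x_t = A x_{t-1} + e_t, e_t ~ N(0, noise_sd^2 I), starting from x_0 = 0."""
    rng = random.Random(seed)
    p = len(A)
    x = [0.0] * p
    series = [x]
    for _ in range(T):
        x = [sum(A[i][j] * x[j] for j in range(p)) + rng.gauss(0.0, noise_sd)
             for i in range(p)]
        series.append(x)
    return series

def soft_threshold(z, lam):
    """Proximal operator of lam * |.|: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(Z, y, lam, max_iter=500, tol=1e-8):
    """Minimize (1/2n)||y - Z a||^2 + lam * ||a||_1 by cyclic coordinate descent."""
    n, p = len(Z), len(Z[0])
    a = [0.0] * p
    col_sq = [sum(Z[t][j] ** 2 for t in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Z a (a starts at zero)
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual that excludes a[j].
            rho = sum(Z[t][j] * (resid[t] + Z[t][j] * a[j]) for t in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - a[j]
            if delta != 0.0:
                for t in range(n):
                    resid[t] -= Z[t][j] * delta
                a[j] = new
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    return a

def fit_var1_lasso(series, lam):
    """Estimate the VAR(1) transition matrix one row (equation) at a time."""
    Z = series[:-1]  # lagged predictors x_{t-1}
    Y = series[1:]   # responses x_t
    return [lasso_cd(Z, [row[i] for row in Y], lam) for i in range(len(series[0]))]

def nnz(A):
    """Number of nonzero entries, i.e. the number of inferred directed influences."""
    return sum(1 for row in A for v in row if v != 0.0)

# Hypothetical sparse ground truth: self-dependence plus one influence x_0 -> x_1.
A_true = [[0.5, 0.0, 0.0, 0.0],
          [0.4, 0.5, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.0],
          [0.0, 0.0, 0.0, 0.5]]
series = simulate_var1(A_true, T=400, seed=1)
A_dense = fit_var1_lasso(series, lam=0.0)   # unpenalized least squares
A_sparse = fit_var1_lasso(series, lam=0.2)  # heavier L1 penalty

print("nonzeros, lam=0.0:", nnz(A_dense), " lam=0.2:", nnz(A_sparse))
print("A[0][0] estimates:", round(A_dense[0][0], 3), round(A_sparse[0][0], 3))
```

With no penalty, essentially every entry of the estimate is nonzero, so every pair of series appears causally linked. Raising the penalty zeroes out the spurious entries but also shrinks the retained coefficients (here the true value of $A_{00}$ is 0.5) toward zero; that shrinkage is the bias that hurts forecasting.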
