Abstract: Logistic regression, one of the most popular machine learning methods for binary classification, has long been believed to be unbiased. In this paper, we consider the "hard" classification problem of separating high-dimensional Gaussian vectors, where the data dimension p and the sample size n are both large. Building on recent advances in random matrix theory (RMT) and high-dimensional statistics, we derive the asymptotic distribution of the logistic regression classifier and, consequently, its classification performance. This brings new insights into the internal mechanism of the logistic regression classifier, including a possible bias in the separating hyperplane, as well as into practical issues such as hyper-parameter tuning, thereby opening the door to novel RMT-inspired improvements.
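The regime described above can be simulated directly. The following is a minimal sketch, assuming a symmetric two-class Gaussian model (means ±μ with identity covariance) and a plain gradient-descent fit of logistic regression with an intercept; the dimensions, signal strength, and optimizer are illustrative choices, not the paper's actual model or method.

```python
import numpy as np

# Assumed toy setup: class 1 ~ N(+mu, I_p), class 0 ~ N(-mu, I_p),
# with p and n of comparable size (the high-dimensional regime).
rng = np.random.default_rng(0)
n, p = 400, 200                       # sample size and dimension, p/n = 0.5
mu = np.full(p, 1.5 / np.sqrt(p))     # mean vector with O(1) norm: a "hard" problem
y = rng.integers(0, 2, size=n)        # labels in {0, 1}
X = rng.standard_normal((n, p)) + np.where(y[:, None] == 1, mu, -mu)

# Logistic regression (weights w, intercept b) fit by gradient descent
# on the average logistic loss.
w, b = np.zeros(p), 0.0
lr = 0.1
for _ in range(2000):
    z = np.clip(X @ w + b, -30.0, 30.0)   # clip to avoid overflow in exp
    prob = 1.0 / (1.0 + np.exp(-z))       # predicted P(y = 1 | x)
    w -= lr * (X.T @ (prob - y) / n)
    b -= lr * np.mean(prob - y)

# In-sample error of the fitted separating hyperplane {x : w.x + b = 0}.
err = np.mean((X @ w + b > 0).astype(int) != y)
print(f"training error: {err:.3f}")
```

In this symmetric model the Bayes-optimal hyperplane passes through the origin, so the fitted intercept b and the angle between w and μ give a concrete handle on the kind of bias the paper analyzes.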