What can we Learn by Predicting Accuracy?

Anonymous

02 Jun 2022 (modified: 29 Aug 2022) | OpenReview Anonymous Preprint Blind Submission | Readers: Everyone
Keywords: accuracy, datasets representation, explainability, symbolic regression, genetic programming
TL;DR: We propose to automatically discover a formula that predicts the accuracy of a linear classifier before it is trained. The formula, discovered on more than 260 datasets of embeddings, is highly explainable and consistent with decades of research.
Abstract: This paper seeks to answer the following question: "What can we learn by predicting accuracy?". Classification is one of the most popular tasks in machine learning, and many loss functions have been developed to maximize accuracy, which is a non-differentiable objective. Unlike past work on loss function design, which was guided mainly by intuition and theory before being validated by experimentation, we approach the problem in the opposite direction: we seek to extract knowledge from experimentation. This data-driven approach is similar to the one used in physics to discover general laws from data. We used a symbolic regression method to automatically find a mathematical expression that is highly correlated with a linear classifier's accuracy. The formula, discovered on more than 260 datasets of embeddings, has a Pearson correlation of 0.96 and an R² of 0.93. More interestingly, the formula is highly explainable and confirms insights from several previous papers on loss design. We hope this work will open new perspectives in the search for heuristics that lead to a deeper understanding of machine learning theory.
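As an illustration of the two statistics reported in the abstract, the following minimal sketch computes a Pearson correlation and an R² between a candidate formula's outputs and observed accuracies. The function names and the numeric values are hypothetical and are not taken from the paper; the abstract also does not say which definition of R² was used, so the coefficient of determination is assumed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: one entry per dataset of embeddings.
# observed_accuracy: accuracy of a trained linear classifier (made-up values).
# formula_output: value of a candidate formula on the same dataset (made-up values).
observed_accuracy = [0.52, 0.61, 0.70, 0.78, 0.85, 0.93]
formula_output = [0.50, 0.64, 0.68, 0.80, 0.88, 0.90]

r = pearson_r(formula_output, observed_accuracy)
r2 = r_squared(observed_accuracy, formula_output)
print(f"Pearson r = {r:.3f}, R^2 = {r2:.3f}")
```

A symbolic regression search would score many candidate expressions with a measure of this kind and keep the best-scoring ones; the search procedure itself is not sketched here.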