A mathematical model for estimating anti-learning when a decision tree solves the parity bit problem

TMLR Paper183 Authors

15 Jun 2022 (modified: 28 Feb 2023), Rejected by TMLR
Abstract: On some data, machine learning displays anti-learning: in the most surprising scenario, the more examples are placed in the training set, the lower the accuracy on the test set becomes, until it reaches $0\%$. We present a framework in which this kind of anti-learning can be reproduced and studied theoretically. We derive a formula estimating anti-learning when decision trees (one of the most important tools of machine learning) are applied to the parity bit problem (one of the most famously tricky problems of machine learning). Our estimation formula, derived under certain mathematical assumptions, agrees very well with experimental results produced on random data without these assumptions.
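The extreme case mentioned in the abstract, accuracy of $0\%$ on the test set, can be illustrated with a minimal sketch. It is not the authors' framework or estimation formula. It assumes a simplified tree that splits on bits in a fixed order until each node is pure, trains on every n-bit string except one, and tests on the string left out. The names `build_tree`, `predict`, and `leave_one_out_accuracy` are hypothetical.

```python
from itertools import product


def parity(bits):
    """Parity bit of a tuple of 0/1 values."""
    return sum(bits) % 2


def build_tree(examples, depth=0):
    """Grow a tree on (bits, label) pairs until every leaf is pure.

    Assumption for illustration: the node at depth d always splits on
    bit d, instead of choosing a split by an impurity criterion.
    """
    labels = {label for _, label in examples}
    if len(labels) <= 1:
        # Pure (or empty) node: store the label, or None if empty.
        return labels.pop() if labels else None
    left = [(b, y) for b, y in examples if b[depth] == 0]
    right = [(b, y) for b, y in examples if b[depth] == 1]
    return (depth, build_tree(left, depth + 1), build_tree(right, depth + 1))


def predict(tree, bits):
    """Follow the splits until a leaf label is reached."""
    while isinstance(tree, tuple):
        index, left, right = tree
        tree = right if bits[index] else left
    return tree


def leave_one_out_accuracy(n):
    """Train on all n-bit strings but one; test on the string left out."""
    strings = list(product((0, 1), repeat=n))
    correct = 0
    for held_out in strings:
        train = [(b, parity(b)) for b in strings if b != held_out]
        tree = build_tree(train)
        correct += predict(tree, held_out) == parity(held_out)
    return correct / len(strings)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, leave_one_out_accuracy(n))
```

In this sketch the held-out string always lands in a leaf whose only training example differs from it in exactly one bit and therefore has the opposite parity, so every prediction is wrong. Smaller training sets and other tree-building rules, which the paper's formula addresses, are outside the scope of this sketch.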
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Some more small changes have been made following reviewers' suggestions.
Assigned Action Editor: ~Roi_Livni1
Submission Number: 183