Real Data Distributions Prefer Simplicity and So Do Our Models: Why Machine Learning and Model Selection Are Possible

Micah Goldblum; Marc Anton Finzi; Keefer Rowan; Andrew Gordon Wilson

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone

Keywords: No Free Lunch, PAC-Bayes, Simplicity Bias, Model Selection, Meta-Learning

TL;DR: We demonstrate that neural networks, trained or randomly initialized, prefer the low-complexity data we observe in practice, and we explain how model selection can be automated.

Abstract: No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While all but exponentially few uniformly sampled datasets have high complexity, we argue that neural network models share the same preference for low-complexity data that we observe on real-world problems. Notably, we show that architectures designed for a particular domain, such as computer vision, are compressors for labeling functions on a variety of seemingly unrelated domains. From our experiments, we see that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences and can therefore be used for inference. In principle, the use of expert knowledge and bias for simplicity of human practitioners could be folded into the learning algorithm, automating design and selection of models. We explain how typical areas requiring human intervention such as picking the appropriately sized model when labeled data is sparse or plentiful can be automated into a single learning algorithm. These observations help justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
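
The claim that uniformly sampled datasets are almost always high-complexity, while structured data is not, can be illustrated with a rough sketch. This is not the authors' experiment: it assumes that the length of a zlib-compressed label sequence is an acceptable upper-bound proxy for complexity, and the sequence length, the seed, and the repeating label pattern are hypothetical choices made only for illustration.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data (a crude complexity upper bound)."""
    return 8 * len(zlib.compress(data, 9))


n = 4096  # hypothetical number of labels, one byte per label
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Labels drawn uniformly at random: almost all such sequences are incompressible.
uniform_labels = bytes(rng.getrandbits(8) for _ in range(n))

# Labels produced by a simple rule (blocks of 16 zeros, then 16 ones, repeated).
structured_labels = bytes((i // 16) % 2 for i in range(n))

raw_bits = 8 * n
uniform_bits = compressed_bits(uniform_labels)
structured_bits = compressed_bits(structured_labels)

print(f"raw size:          {raw_bits} bits")
print(f"uniform labels:    {uniform_bits} bits after compression")
print(f"structured labels: {structured_bits} bits after compression")
```

Under these assumptions, the uniformly sampled labels should compress to roughly their raw size, while the rule-generated labels should shrink to a small fraction of it. A general-purpose compressor only upper-bounds complexity, so the sketch shows the direction of the gap rather than its exact size.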

Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (ie none of the above)