- Abstract: Many recently trained neural networks employ tens or hundreds of millions of parameters to achieve good performance. Researchers may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper, we attempt to answer this question by training networks not in their native parameter space, but instead in smaller, randomly oriented subspaces. By slowly increasing the dimension of this subspace, we note when solutions first appear and define this to be the intrinsic dimension of the problem. A few suggestive conclusions result. Many problems have smaller intrinsic dimension than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning, where we conclude, for example, that solving the cart-pole RL problem is in a sense 100 times easier than classifying digits from MNIST. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the results encompass a simple method for constructively obtaining an upper bound on the minimum description length of a solution, leading in some cases to very compressible networks.
- TL;DR: We train in random subspaces of parameter space to measure how many dimensions are really needed to find a solution.
- Keywords: machine learning, neural networks, intrinsic dimension, random subspace, model understanding
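The random-subspace idea described in the abstract can be sketched in a few lines: instead of optimizing the native parameters `w` directly, fix a random offset `w0` and a random projection `P`, and train only a low-dimensional vector `theta` so that `w = w0 + P @ theta`. The following is a minimal NumPy sketch on a toy least-squares problem, not the paper's actual networks or experiments; the variable names (`w0`, `P`, `theta`), the problem setup, and the column scaling of `P` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: least-squares regression with D native parameters.
D, d, n = 50, 5, 200              # native dim, subspace dim, num samples
X = rng.normal(size=(n, D))
w_true = rng.normal(size=D)
y = X @ w_true

# Subspace parameterization: w = w0 + P @ theta.
# P is a fixed random D x d matrix (frozen); only theta (d params) is trained.
w0 = np.zeros(D)
P = rng.normal(size=(D, d)) / np.sqrt(D)   # columns roughly unit-norm (assumption)
theta = np.zeros(d)


def loss(theta):
    w = w0 + P @ theta
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)


initial_loss = loss(theta)
lr = 0.01
for _ in range(500):
    w = w0 + P @ theta
    grad_w = X.T @ (X @ w - y) / n         # gradient in native space, dL/dw
    theta -= lr * (P.T @ grad_w)           # chain rule: dL/dtheta = P^T dL/dw
final_loss = loss(theta)
```

Sweeping `d` upward and noting the smallest `d` at which the loss first reaches a "solved" threshold would, in this sketch, play the role of the intrinsic dimension measurement described in the abstract.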