Evaluating representations by the complexity of learning low-loss predictors

William F Whitney; Min Jae Song; David Brandfonbrener; Jaan Altosaar; Kyunghyun Cho

Published: 01 Apr 2021, Last Modified: 22 Jun 2025, Venue: Neural Compression Workshop @ ICLR 2021, Readers: Everyone

Keywords: representation evaluation, representation learning, minimum description length, mdl

TL;DR: A new measure similar to MDL allows more reliable evaluation of representation quality.

Abstract: We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest. To this end, we introduce two measures: surplus description length (SDL) and $\varepsilon$ sample complexity ($\varepsilon$SC). To compare our methods to prior work, we also present a framework based on plotting the validation loss versus evaluation dataset size (the "loss-data" curve). Existing measures, such as mutual information and minimum description length, correspond to slices and integrals along the data axis of the loss-data curve, while ours correspond to slices and integrals along the loss axis. This analysis shows that prior methods measure properties of an evaluation dataset of a specified size, whereas our methods measure properties of a predictor with a specified loss. We conclude with experiments on real data to compare the behavior of these methods over datasets of varying size.
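To make the loss-data curve framing concrete, the following is a minimal sketch rather than the authors' implementation. The abstract gives no formulas, so the definitions here are assumptions: the curve is taken to be a list `losses` in which `losses[n-1]` is the validation loss of a predictor trained on `n` points; MDL is approximated as the sum of losses up to a dataset size; SDL is approximated as the sum of the loss in excess of a threshold `eps`, truncated at the end of the measured curve; and $\varepsilon$SC is the smallest dataset size whose loss is at most `eps`. All function names are hypothetical.

```python
from typing import Optional, Sequence


def validation_loss_at(losses: Sequence[float], n: int) -> float:
    """Slice along the data axis: loss after training on n points."""
    return losses[n - 1]


def description_length(losses: Sequence[float], n: int) -> float:
    """Integral along the data axis: summed loss over sizes 1..n (MDL-style)."""
    return sum(losses[:n])


def surplus_description_length(losses: Sequence[float], eps: float) -> float:
    """Integral along the loss axis: summed loss in excess of eps.

    Assumption: the sum is truncated at the last measured dataset size.
    """
    return sum(max(loss - eps, 0.0) for loss in losses)


def eps_sample_complexity(losses: Sequence[float], eps: float) -> Optional[int]:
    """Slice along the loss axis: smallest n whose loss is at most eps.

    Returns None if the measured curve never reaches eps.
    """
    for n, loss in enumerate(losses, start=1):
        if loss <= eps:
            return n
    return None


# Toy loss-data curve (made-up numbers, for illustration only).
toy_curve = [1.0, 0.5, 0.25, 0.125]
toy_eps = 0.25

toy_val = validation_loss_at(toy_curve, 2)
toy_mdl = description_length(toy_curve, 2)
toy_sdl = surplus_description_length(toy_curve, toy_eps)
toy_esc = eps_sample_complexity(toy_curve, toy_eps)
```

In this sketch the first two quantities depend on a chosen dataset size, while the last two depend on a chosen loss threshold, which mirrors the distinction the abstract draws between prior measures and the proposed ones.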

