Evaluating representations by the complexity of learning low-loss predictors

Mar 04, 2021 (edited Apr 01, 2021), Neural Compression Workshop @ ICLR 2021, Readers: Everyone
  • Keywords: representation evaluation, representation learning, minimum description length, mdl
  • TL;DR: A new measure similar to MDL allows evaluations of representation quality that are more reliable.
  • Abstract: We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest. To this end, we introduce two measures: surplus description length (SDL) and $\varepsilon$ sample complexity ($\varepsilon$SC). To compare our methods to prior work, we also present a framework based on plotting the validation loss versus evaluation dataset size (the "loss-data" curve). Existing measures, such as mutual information and minimum description length, correspond to slices and integrals along the data axis of the loss-data curve, while ours correspond to slices and integrals along the loss axis. This analysis shows that prior methods measure properties of an evaluation dataset of a specified size, whereas our methods measure properties of a predictor with a specified loss. We conclude with experiments on real data to compare the behavior of these methods over datasets of varying size.
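
The distinction between slicing or integrating along the data axis versus the loss axis can be illustrated with a small sketch. This is not the authors' implementation: the function names, the discrete sums, and the two loss-data curves below are hypothetical, chosen only to show one plausible reading of the abstract. Each curve is assumed to be a list whose entry at index n - 1 is the validation loss of a predictor trained on n points, truncated at the largest dataset size measured.

```python
# Hypothetical sketch: four summaries of a discretized loss-data curve.
# curve[n - 1] is assumed to hold the validation loss after training on n points.

def loss_at_size(curve, n):
    """Slice along the data axis: loss of a predictor trained on n points."""
    return curve[n - 1]

def description_length(curve, n):
    """Integral along the data axis: cumulative loss over sizes 1..n (MDL-style)."""
    return sum(curve[:n])

def eps_sample_complexity(curve, eps):
    """Slice along the loss axis: smallest n whose loss is <= eps, else None."""
    for n, loss in enumerate(curve, start=1):
        if loss <= eps:
            return n
    return None  # eps is never reached on the measured part of the curve

def surplus_description_length(curve, eps):
    """Integral along the loss axis: total loss in excess of eps."""
    return sum(max(loss - eps, 0.0) for loss in curve)

# Made-up curves for two representations (not data from the paper).
rep_a = [2.0, 1.0, 0.5, 0.25, 0.25, 0.25]
rep_b = [1.0, 1.0, 1.0, 0.5, 0.25, 0.25]

eps = 0.5
for name, curve in [("A", rep_a), ("B", rep_b)]:
    print(
        name,
        "loss@2:", loss_at_size(curve, 2),
        "DL@2:", description_length(curve, 2),
        "eps-SC:", eps_sample_complexity(curve, eps),
        "SDL:", surplus_description_length(curve, eps),
    )
```

In this toy setup the data-axis summaries depend on the chosen dataset size (here n = 2), while the loss-axis summaries depend only on the target loss `eps`, which mirrors the contrast the abstract draws. The exact definitions, including indexing conventions and how the curve is estimated, are those given in the paper itself.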