Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks

ICLR 2026 Conference Submission 22500 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Representation Learning, Model Evaluation, Downstream tasks, Probabilistic Methods, Kernel alignment, Self-supervised learning, Linear probes, Benchmarking, Generalization
Abstract: A long-standing research problem in Artificial Intelligence (AI) is to produce systems that can successfully solve any possible task. A key requirement for assessing progress in that direction is a near-infinite suite of tasks for benchmarking AI solutions. In contrast, the evaluation methods currently available to AI researchers in representation learning typically rely on a fixed collection of hand-picked downstream benchmarks. Hence, a large amount of effort is put into designing and searching for large collections of evaluation tasks that can serve as a proxy for that grand goal. We argue that such a rigid evaluation protocol creates a structural bottleneck in AI research. To remedy this, we define a probability distribution over downstream tasks -- Task Priors. Under this view, one can evaluate a model's performance over the set of all possible downstream tasks. Our framework is the first to provide answers to key questions such as (i) what is the average performance of my model over all possible downstream tasks, weighted by the probability of encountering each task? and (ii) what is the variance of my model's performance across all downstream tasks under the defined Task Priors? Beyond establishing a new standard for evaluation, we believe that Task Priors will accelerate the pace of research in representation learning -- where downstream task evaluation is generally the sole signal that researchers have access to.
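To make the two questions above concrete, here is a minimal Monte Carlo sketch of the idea, not the authors' method: it assumes a "task" is a random linear target drawn from an isotropic Gaussian prior on top of frozen embeddings, and that performance is the test R^2 of a closed-form ridge probe. All names (probe_score, evaluate_under_task_prior) and the choice of prior are illustrative assumptions.

```python
# Hypothetical sketch of evaluating a representation under a Task Prior.
# Assumption: tasks are random linear targets on frozen embeddings Z,
# sampled from a Gaussian prior; performance is ridge-probe test R^2.
import numpy as np

rng = np.random.default_rng(0)

def probe_score(Z_train, y_train, Z_test, y_test, lam=1e-2):
    """Fit a closed-form ridge regression probe and return its test R^2."""
    d = Z_train.shape[1]
    w = np.linalg.solve(Z_train.T @ Z_train + lam * np.eye(d), Z_train.T @ y_train)
    pred = Z_test @ w
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def evaluate_under_task_prior(Z_train, Z_test, n_tasks=1000, noise=0.1):
    """Estimate mean and variance of probe performance over sampled tasks."""
    d = Z_train.shape[1]
    scores = []
    for _ in range(n_tasks):
        w_star = rng.normal(size=d)  # one downstream task drawn from the prior
        y_train = Z_train @ w_star + noise * rng.normal(size=len(Z_train))
        y_test = Z_test @ w_star + noise * rng.normal(size=len(Z_test))
        scores.append(probe_score(Z_train, y_train, Z_test, y_test))
    scores = np.asarray(scores)
    return scores.mean(), scores.var()

# Toy usage: random embeddings stand in for a trained encoder's features.
Z_train = rng.normal(size=(512, 64))
Z_test = rng.normal(size=(128, 64))
mean_perf, var_perf = evaluate_under_task_prior(Z_train, Z_test)
print(f"mean performance: {mean_perf:.3f}, variance: {var_perf:.4f}")
```

The paper's actual framework (e.g., via kernel alignment) may admit closed-form answers rather than sampling; this sketch only illustrates the notion of averaging a model's downstream performance over a distribution of tasks and measuring its spread.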
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 22500