Keywords: Capability-oriented evaluation, Bayesian Triangulation, cognitive profile
Abstract: As machine learning models become more general, we need to characterise their capabilities in richer, more interpretable ways that move beyond aggregated statistics on static benchmarks. We describe a method to infer the cognitive profile of a system from diverse experimental data. To do so, we introduce measurement layouts, which model how task-instance features interact with system capabilities to explain performance. System capabilities can then be estimated using Bayesian triangulation: each capability's value is inferred from the system's performance on tasks with different features. Our approach accurately recovers the cognitive profiles of hand-crafted behavioural agents, and also estimates the cognitive profiles of deep reinforcement learning agents and human children in a virtual game environment. These cognitive profiles are richer than aggregated benchmark statistics, summarising multiple distinct capabilities that explain behaviour, and are also more predictive, accurately estimating performance on new, held-out tasks.
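The idea of inferring a capability from performance on tasks with different features can be illustrated with a toy sketch. This is not the paper's measurement layout: it assumes a single hypothetical capability (`ability`), a single task feature (`demand`), a logistic link between them, a uniform prior, and made-up outcomes, and it computes the posterior on a grid rather than with a probabilistic programming library.

```python
import math

def success_prob(ability, demand):
    # Assumed link: success is likely when ability exceeds task demand.
    return 1.0 / (1.0 + math.exp(-(ability - demand)))

def posterior_over_ability(demands, outcomes, grid):
    # Grid approximation of the posterior under a uniform prior on `grid`.
    log_post = []
    for a in grid:
        ll = 0.0
        for d, y in zip(demands, outcomes):
            p = success_prob(a, d)
            ll += math.log(p) if y else math.log(1.0 - p)
        log_post.append(ll)
    m = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def posterior_mean(grid, post):
    return sum(a * p for a, p in zip(grid, post))

# Illustrative data only: task instances of increasing demand, with pass/fail.
grid = [i * 0.1 for i in range(101)]      # candidate ability values 0.0 .. 10.0
demands = [1, 2, 3, 4, 5, 6, 7, 8]
outcomes = [1, 1, 1, 1, 0, 1, 0, 0]

post = posterior_over_ability(demands, outcomes, grid)
ability_estimate = posterior_mean(grid, post)

# Predict success on held-out demands by averaging over the posterior.
def predict(demand):
    return sum(p * success_prob(a, demand) for a, p in zip(grid, post))
```

Because the tasks vary in demand, successes on easy instances and failures on hard ones jointly narrow down where the ability lies; the same posterior then yields predictions for unseen demands via `predict`.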
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11706