Incorporating human plausibility in single- and multi-agent AI systems

Samuel A Barnett

Published: 01 May 2024, Last Modified: 20 Apr 2025, OpenReview Archive Direct Upload, Everyone, CC BY 4.0

Abstract: As AI systems play a progressively larger role in human affairs, it becomes more important that these systems are built with insights from human behavior. In particular, models developed on the principle of human plausibility are more likely to yield results that are accountable and interpretable, in a way that better ensures alignment between the behavior of the system and what its stakeholders want from it. In this dissertation, I present three projects that build on the principle of human plausibility for three distinct applications: (i) Plausible representations: I present the Priority-Adjusted Replay for Successor Representations (PARSR) algorithm, a single-agent reinforcement learning algorithm that brings together the ideas of prioritization-based replay and successor representation learning. Both of these ideas lead to a more biologically plausible algorithm that captures human-like capabilities of transferring and generalizing knowledge from previous tasks to novel, unseen ones. (ii) Plausible inference: I present a pragmatic account of the weak evidence effect, a counterintuitive phenomenon of social cognition that occurs when humans must account for persuasive goals when incorporating evidence from other speakers. This leads to a recursive, Bayesian model that encapsulates how AI systems and their human stakeholders communicate with and understand one another in a way that accounts for the vested interests that each will have. (iii) Plausible evaluation: I introduce a tractable and generalizable measure for cooperative behavior in multi-agent systems that is counterfactually contrastive, contextual, and customizable with respect to different environmental parameters. This measure can be of practical use in disambiguating between cases in which collective welfare is achieved through genuine cooperation and cases in which it is achieved by each agent acting solely in its own self-interest, even though both result in the same outcome.