Abstract: As AI systems play a progressively larger role in human affairs, it becomes more important that these systems
are built with insights from human behavior. In particular, models that are developed on the principle of
human plausibility are more likely to yield results that are accountable and interpretable, in a way
that better ensures alignment between the behavior of the system and what its stakeholders want from
it. In this dissertation, I will present three projects that build on the principle of human plausibility for
three distinct applications:
(i) Plausible representations: I present the Priority-Adjusted Replay for Successor Representations (PARSR)
algorithm, a single-agent reinforcement learning algorithm that brings together the ideas of
prioritization-based replay and successor representation learning. Both of these ideas lead to a more biologically plausible
algorithm that captures human-like capabilities of transferring and generalizing knowledge from previous
tasks to novel, unseen ones.
(ii) Plausible inference: I present a pragmatic account of the weak evidence effect, a counterintuitive
phenomenon of social cognition that occurs when humans must account for persuasive goals when incorporating
evidence from other speakers. This leads to a recursive, Bayesian model that encapsulates how AI systems
and their human stakeholders communicate with and understand one another in a way that accounts for the
vested interests that each will have.
(iii) Plausible evaluation: I introduce a tractable and generalizable measure for cooperative behavior
in multi-agent systems that is counterfactually contrastive, contextual, and customizable with respect to
different environmental parameters. This measure can be of practical use in distinguishing cases
in which collective welfare is achieved through genuine cooperation from cases in which it arises from each
agent acting solely in its own self-interest, even though both lead to the same outcome.