Abstract: Can emerging language models faithfully model the intelligence of decision-making agents? Although modern language models already exhibit some reasoning ability and can, in theory, express any probability distribution over tokens, it remains underexplored how the world knowledge these pre-trained models have memorised can be utilised to comprehend an agent's behaviour in the physical world. This paper empirically examines, for the first time, how well large language models (LLMs) can build a mental model of reinforcement learning (RL) agents, termed agent mental modelling, by reasoning about an agent's behaviour and its effect on states from the agent's interaction history. This research aims to unveil the potential of leveraging LLMs to elucidate RL agent behaviour, addressing a key challenge in explainable RL. To this end, we propose specific evaluation metrics and test them on selected RL task datasets of varying complexity, reporting findings on how well LLMs establish agent mental models. Our results reveal that LLMs are not yet capable of fully realising agent mental modelling through inference alone without further innovations. This work thus provides new insights into the capabilities and limitations of modern LLMs, highlighting that while they show promise in understanding agents given longer history contexts, preexisting beliefs within LLMs about behavioural optimality and state complexity limit their ability to fully comprehend an agent's behaviour and the effects of its actions.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: 1. updated figures to include GPT-4o (SOTA model) results
2. added more context to the introduction and related work sections
3. added clarification to the methodology section
4. expanded explanations/discussion in the results section
5. added takeaway message and limitations
Assigned Action Editor: ~Hanie_Sedghi1
Submission Number: 3111