Track: Full paper
Keywords: Large Language Models, Natural Language Processing, Interpretability and Analysis
TL;DR: We show that transformers can learn underlying transition dynamics when trained to predict data generated by Markov Decision Processes.
Abstract: Language models have displayed a wide array of capabilities, but the reasons behind their performance remain a topic of heated debate and investigation. Do these models simply recite the observed training data, or do they abstract away surface statistics and learn the underlying processes that generated the data? To investigate this question, we explore the capabilities of a GPT model in the context of Markov Decision Processes (MDPs), where the underlying transition dynamics and policies are not directly observed. The model is trained to predict the next state or action without any prior knowledge of the MDPs or the players' policies. Despite this, we present evidence that the model develops emergent representations of the underlying parameters governing the MDPs.
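To make the training setup concrete, here is a minimal illustrative sketch (our own, not the authors' code) of how next-token prediction data could be generated from a random tabular MDP: transition dynamics P and a policy pi are sampled, trajectories are rolled out, and states and actions are interleaved into one shared token vocabulary. All names and sizes below are hypothetical.

```python
# Hypothetical sketch, assuming a random tabular MDP; not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 5, 3, 20

# Random transition dynamics P[s, a] -> distribution over next states,
# and a random stationary policy pi[s] -> distribution over actions.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
pi = rng.dirichlet(np.ones(n_actions), size=n_states)

def sample_trajectory():
    """Roll out one (s0, a0, s1, a1, ...) trajectory from the MDP."""
    s = rng.integers(n_states)
    tokens = [s]
    for _ in range(horizon):
        a = rng.choice(n_actions, p=pi[s])
        s = rng.choice(n_states, p=P[s, a])
        # Offset action tokens so states and actions share one vocabulary.
        tokens += [n_states + a, s]
    return tokens

# Each trajectory becomes one training sequence for next-token prediction.
dataset = [sample_trajectory() for _ in range(1000)]
```

A model trained to predict the next token in such sequences never observes P or pi directly, which is what makes any recovered representation of them emergent rather than memorized.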
Submission Number: 44