Assessing the Interpretability of Programmatic Policies using Large Language Models

Zahra Bashir; Michael Bowling; Levi Lelis

26 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Programmatic Policies, Interpretability, Program Synthesis

TL;DR: We propose the LINT score, a novel metric that uses large language models to assess the interpretability of programmatic policies, and show that it correlates with human judgments of interpretability and human understanding of program behavior.

Abstract:

Programmatic representations of policies for solving sequential decision-making problems often carry the promise of interpretability. However, previous work on programmatic policies has only presented anecdotal evidence of policy interpretability. The lack of systematic evaluations of policy interpretability can be attributed to user studies being time-consuming and costly. In this paper, we introduce the LLM-based INTerpretability (LINT) score, a simple and cost-effective metric that uses large language models (LLMs) to assess the interpretability of programmatic policies. To compute the LINT score of a policy, an LLM generates a natural language description of the policy's behavior. This description is then passed to a second LLM, which attempts to reconstruct the policy from the natural language description. The LINT score measures the behavioral similarity between the original and reconstructed policies. We hypothesized that the LINT score of programmatic policies correlates with their actual interpretability, and evaluated this hypothesis in the domains of MicroRTS and Karel the Robot. Our evaluation relied on a technique from the static obfuscation literature and a user study, where people with various levels of programming proficiency evaluated the interpretability of the programmatic policies. The results of our experiments support our hypothesis. Specifically, the LINT score decreases as the level of obfuscation of the policies increases. The user study showed that LINT can correctly distinguish the "degree of interpretability" of programmatic policies generated by existing algorithms. Our results suggest that LINT can be a helpful tool for advancing research on the interpretability of programmatic policies.
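
The explain-then-reconstruct pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the paper: the function names (`lint_score`, `behavioral_similarity`, `compile_policy`), the toy `threshold N` policy language, the plain functions that stand in for the two LLMs, and in particular the similarity measure, which here is simply the fraction of sampled states on which the two policies choose the same action. The abstract does not specify how behavioral similarity is actually computed.

```python
import re
from typing import Callable, Iterable

# A policy maps a state (here just an int) to an action name.
Policy = Callable[[int], str]


def compile_policy(source: str) -> Policy:
    """Hypothetical toy language: 'threshold N' attacks when state >= N."""
    match = re.fullmatch(r"threshold (\d+)", source.strip())
    if match is None:
        raise ValueError(f"unparseable policy: {source!r}")
    threshold = int(match.group(1))
    return lambda state: "attack" if state >= threshold else "gather"


def behavioral_similarity(a: Policy, b: Policy, states: Iterable[int]) -> float:
    """ASSUMPTION: similarity = fraction of states with identical actions."""
    sampled = list(states)
    if not sampled:
        raise ValueError("need at least one state")
    return sum(a(s) == b(s) for s in sampled) / len(sampled)


def lint_score(
    policy_source: str,
    explainer: Callable[[str], str],      # stand-in for the first LLM
    reconstructor: Callable[[str], str],  # stand-in for the second LLM
    compile_fn: Callable[[str], Policy],
    states: Iterable[int],
) -> float:
    """Explain the policy, rebuild it from the explanation, compare behavior."""
    description = explainer(policy_source)
    rebuilt_source = reconstructor(description)
    return behavioral_similarity(
        compile_fn(policy_source), compile_fn(rebuilt_source), states
    )


# Stub "LLMs" (hypothetical): a real setup would call a language model here.
def faithful_explainer(source: str) -> str:
    n = source.split()[1]
    return f"Attack when the state value is at least {n}; otherwise gather."


def vague_explainer(source: str) -> str:
    # Loses the threshold, as might happen with a hard-to-read policy.
    return "Attack when the state value is fairly high; otherwise gather."


def reconstructor(description: str) -> str:
    match = re.search(r"\d+", description)
    # With no number in the description, fall back to a guessed threshold.
    return f"threshold {match.group(0) if match else 7}"
```

With states `range(10)` and the policy `threshold 5`, the faithful stub preserves the threshold, so the reconstructed policy agrees with the original on every state. The vague stub drops it, the reconstructor falls back to 7, and the two policies disagree on states 5 and 6. This mirrors, in a toy setting only, the intuition that a policy which is harder to describe accurately should receive a lower score.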

Primary Area: interpretability and explainable AI

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 5338
