everyone
since 04 Oct 2024">EveryoneRevisionsBibTeXCC BY 4.0
Programmatic representations of policies for solving sequential decision-making problems often carry the promise of interpretability. However, previous work on programmatic policies has only presented anecdotal evidence of policy interpretability. The lack of systematic evaluations of policy interpretability can be attributed to user studies being time-consuming and costly. In this paper, we introduce the LLM-based INTerpretability (LINT) score, a simple and cost-effective metric that uses large-language models (LLMs) to assess the interpretability of programmatic policies. To compute the LINT score of a policy, an LLM generates a natural language description of the policy's behavior. This description is then passed to a second LLM, which attempts to reconstruct the policy from the natural language description. The LINT score measures the behavioral similarity between the original and reconstructed policies. We hypothesized that the LINT score of programmatic policies correlates with their actual interpretability, and evaluated this hypothesis in the domains of MicroRTS and Karel the Robot. Our evaluation relied on a technique from the static obfuscation literature and a user study, where people with various levels of programming proficiency evaluated the interpretability of the programmatic policies. The results of our experiments support our hypothesis. Specifically, the LINT score decreases as the level of obfuscation of the policies increases. The user study showed that LINT can correctly distinguish the ``degree of interpretability'' of programmatic policies generated by the existing algorithms. Our results suggest that LINT can be a helpful tool for advancing the research on interpretability of programmatic policies.