Common Benchmarks Undervalue the Generalization Power of Programmatic Policies

Published: 21 Jun 2025, Last Modified: 26 Jul 2025 · RLC 2025 Workshop PRL · CC BY 4.0
Keywords: reinforcement learning, programmatic policies, neural policies, out-of-distribution generalization
TL;DR: We argue that commonly used benchmarks undervalue the generalization capacity of programmatic policies.
Abstract: Algorithms that learn programmatic representations for sequential decision-making problems are often evaluated on out-of-distribution (OOD) problems, with the common conclusion that programmatic policies generalize better than neural policies on OOD problems. In this position paper, we argue that commonly used benchmarks undervalue the generalization capabilities of programmatic representations. We analyze the experiments of four papers from the literature and show that, with simple changes to the training pipeline, neural policies previously reported not to generalize can generalize as effectively as programmatic policies on OOD problems. In particular, we show that simpler neural architectures using the same type of sparse observation used with programmatic policies can attain OOD generalization. Another modification we show to be effective is the use of reward functions that encourage safer policies (e.g., agents that drive slowly can generalize better). We also argue for creating benchmark problems that highlight concepts needed for OOD generalization and that may challenge neural policies while aligning with programmatic representations, such as tasks requiring algorithmic constructs like stacks.
Format: We have read the camera-ready instructions, and our paper is formatted with the provided template.
De-Anonymization: This submission has been de-anonymized.
Presenter: ~Amirhossein_Rajabpour1
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 7