This is the code implementation for the submission titled "Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep Reinforcement Learning." 

Reinforcement learning (RL) is a powerful tool for finding optimal policies in sequential decision processes. However, deep RL methods have two weaknesses: collecting the amount of agent experience required for practical RL problems is prohibitively expensive, and the learned policies exhibit poor generalization on tasks outside the training data distribution. To mitigate these issues, we introduce automaton distillation, a form of neuro-symbolic transfer learning in which Q-value estimates from a teacher are distilled into a low-dimensional representation in the form of an automaton. 


We then propose methods for generating Q-value estimates where symbolic information is extracted from a teacher’s Deep Q-Network (DQN). The resulting Q-value estimates are used to bootstrap learning in discrete and continuous target environments via modified DQN and Twin Delayed Deep Deterministic Policy Gradient (TD3) loss functions, respectively. We demonstrate that automaton distillation decreases the time required to find optimal policies for various decision tasks in new environments, even when the target environment differs in structure from the source environment.


This artifact presents the code implementation for the paper titled "Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep Reinforcement Learning." The implementation addresses two significant challenges in deep reinforcement learning (RL): the high cost of collecting agent experience and the poor generalization of learned policies to tasks outside the training distribution. We introduce automaton distillation, a novel approach to neuro-symbolic transfer learning, which distills Q-value estimates from a teacher into a low-dimensional automaton representation.

Our methods use symbolic information extracted from a teacher's Deep Q-Network (DQN) to generate Q-value estimates, which are then used to bootstrap learning in both discrete and continuous environments through modified DQN and Twin Delayed Deep Deterministic Policy Gradient (TD3) loss functions, respectively. We demonstrate that automaton distillation reduces the time required to find efficient policies across various decision tasks in unfamiliar environments.
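
As a rough illustration of the distillation idea described above, the sketch below averages teacher Q-values over each automaton transition and then blends that average into a student's temporal-difference target, with a weight that decays as the transition is visited more often. It is not taken from `lib`: the function names, the `(automaton_state, symbol, teacher_q)` sample format, and the `rho ** visits` decay rule are assumptions made for illustration only.

```python
from collections import defaultdict


def distill_automaton_q(samples):
    """Average teacher Q-values per automaton transition.

    `samples` is an iterable of (automaton_state, symbol, teacher_q) tuples.
    This format is an assumption for the sketch, not the repository's own.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, symbol, teacher_q in samples:
        totals[(state, symbol)] += teacher_q
        counts[(state, symbol)] += 1
    # One averaged Q-value per (automaton state, symbol) pair.
    return {key: totals[key] / counts[key] for key in totals}


def blended_target(td_target, automaton_q, visits, rho=0.99):
    """Mix the distilled Q-value into a student's TD target.

    The weight beta = rho ** visits starts at 1 (trust the teacher) and
    decays toward 0 (trust the student's own TD target) as the student
    visits the transition more often. The decay rule is an assumption.
    """
    beta = rho ** visits
    return beta * automaton_q + (1.0 - beta) * td_target


teacher_samples = [(0, "a", 1.0), (0, "a", 3.0), (1, "b", 5.0)]
automaton_q = distill_automaton_q(teacher_samples)
first_visit_target = blended_target(td_target=10.0,
                                    automaton_q=automaton_q[(0, "a")],
                                    visits=0)
```

On the first visit the blended target equals the distilled value, so the student starts from the teacher's estimate and gradually shifts to its own bootstrapped target.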

The code implementation includes tests for five distinct reinforcement learning methods: Dynamic Automaton Distillation (proposed), Static Automaton Distillation (proposed), CRM, DQN on Product MDP, and DQN without transfer learning (vanilla). 

The organizational structure of the code is as follows:
- automaton_q: cached automata and teacher Q-values. This extra teacher information is loaded later when running a student config.
- lib: the implementation of automaton transfer.
- run: individual experiments. Each file should build a config and call lib.main.run_training.
  - env: environments reused across runs, along with their configurations.
  - utils: shared helpers. When two experiments require similar configurations, add a function here to reduce repetition.
  - teacher: teacher configs, which serve either as DQN without transfer learning or as the teacher in knowledge transfer.
  - target: target configs, which serve as the student in knowledge transfer.
  - experiment: file used for generating the results submitted to TMLR.
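
The layout above suggests a pattern in which each experiment file builds a config and hands it to a training entry point, with shared helpers deriving similar configs. A minimal sketch of that pattern follows. Everything in it is hypothetical: `ExperimentConfig`, its fields, the method names, the example path, and the local `run_training` stand-in are illustrative and do not reflect the actual signature of lib.main.run_training or the repository's config format.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical config; the real fields are defined by the repository."""
    env_name: str
    method: str = "dqn"                     # a teacher run doubles as vanilla DQN
    automaton_q_path: Optional[str] = None  # cached teacher data, if any
    max_steps: int = 100_000


def teacher_config(env_name: str) -> ExperimentConfig:
    """Shared helper (the role run/utils plays): one place for common defaults."""
    return ExperimentConfig(env_name=env_name)


def student_config(teacher: ExperimentConfig, automaton_q_path: str) -> ExperimentConfig:
    """Derive a student run from a teacher run, pointing at cached teacher data."""
    return replace(teacher, method="dynamic_distillation",
                   automaton_q_path=automaton_q_path)


def run_training(config: ExperimentConfig) -> str:
    """Local stand-in for the real entry point; only reports what would run."""
    source = config.automaton_q_path or "no teacher data"
    return f"{config.method} on {config.env_name} ({source})"


teacher = teacher_config("example_env")
student = student_config(teacher, "automaton_q/example_env")
print(run_training(teacher))
print(run_training(student))
```

Deriving the student config from the teacher config keeps the two runs aligned on shared settings, which is the kind of repetition the utils directory is meant to remove.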
