Keywords: Reinforcement learning, Parameterized action MDP, Markov Decision Processes
Abstract: General-purpose reinforcement learning (RL) agents support exclusively discrete or continuous actions, so tasks with parameterized actions have required bespoke algorithm development. We present a method to convert a parameterized action space Markov decision process into an equivalent Markov decision process in which every action is of a single, simple type. This theoretical insight is developed into a software framework, based on Stable Baselines3 and Gymnasium, which allows researchers to deploy a pair of unmodified standard RL methods, where one is responsible for selecting the action and the other for selecting its parameters. Through empirical testing in the Goal and Platform domains, we demonstrate algorithm pairings that, with no hyperparameter tuning, achieve performance comparable to the custom-designed and tuned Q-PAMDP and P-DQN.
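A minimal sketch of the underlying idea, assuming Gymnasium only (the paper's actual framework API is not shown here; `PairedActionWrapper`, `n_discrete`, and `param_dim` are hypothetical names): a parameterized action can be exposed to standard agents as a tuple of a discrete choice and a continuous parameter vector.

```python
# Hypothetical sketch: exposing a parameterized action space through
# standard Gymnasium spaces so two unmodified agents can be paired,
# one choosing the discrete action and one choosing its parameters.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class PairedActionWrapper(gym.ActionWrapper):
    """Accepts a (discrete_action, parameter_vector) pair and forwards it
    to an environment whose native actions are parameterized.

    `n_discrete` and `param_dim` are illustrative arguments, not part of
    any real environment's API."""

    def __init__(self, env, n_discrete, param_dim):
        super().__init__(env)
        self.action_space = spaces.Tuple((
            spaces.Discrete(n_discrete),  # which action to take
            spaces.Box(low=-1.0, high=1.0, shape=(param_dim,),
                       dtype=np.float32),  # that action's parameters
        ))

    def action(self, act):
        discrete, params = act
        # Forward the pair in whatever structure the wrapped env expects.
        return (int(discrete), np.asarray(params, dtype=np.float32))
```

This is only meant to illustrate how a parameterized action decomposes into two standard Gymnasium spaces; the paper's conversion additionally guarantees equivalence of the resulting Markov decision process.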
Primary Area: reinforcement learning
Submission Number: 21164