A Reinforcement Learning Approach for Personalized Automated Anesthesia Control

Manuela Merlo; Francesco Trovò; Alberto Maria Metelli

Published: 22 Sept 2025, Last Modified: 22 Sept 2025. Venue: WiML @ NeurIPS 2025. License: CC BY 4.0

Keywords: automatic anesthesia control, reinforcement learning, patient-specific treatment

Abstract: Every year, millions of surgical procedures require general anesthesia. Despite partial automation and general guidelines, this task remains challenging because patient responses are unpredictable and influenced by latent factors such as demographic data and comorbidities. To improve on previously investigated automated systems, we explore Reinforcement Learning (RL) and frame the anesthesia control problem as a Markov Decision Process (MDP). The action space consists of continuous infusion rates for propofol and remifentanil, the drugs commonly used for hypnotic and analgesic purposes, respectively. The observation space includes physiological signals and demographic data. The reward function aims to optimize hypnosis depth, hemodynamic stability, and drug usage. To solve this problem, we employ the TD3 algorithm and incorporate a mechanism inspired by Feature-wise Linear Modulation (FiLM) in the feature extractor. This mechanism generates a version of the physiological state conditioned on the patient’s demographic data, helping the agent personalize its strategy. To mitigate the ethical risks associated with exploration during training, we use the patient simulator AReS as both the training and testing environment. As a reference, we use a PID controller optimized with a genetic algorithm on a limited task. The results demonstrate that our agent achieves performance comparable to the PID controller on a broader objective. Lastly, we employ SHapley Additive exPlanations (SHAP) values to evaluate how features influence the agent’s decisions. These explanations suggest that the agent’s behavior aligns with the models employed by AReS. These findings underscore the potential of RL to deliver effective anesthesia control tailored to each individual patient.
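As an illustration of the FiLM-inspired conditioning mentioned in the abstract, the following is a minimal NumPy sketch of a feature-wise affine modulation, in which demographic features produce a per-feature scale and shift that are applied to the physiological features. It is not the authors' implementation: the function name `film_condition`, the feature sizes, the linear form of the scale and shift generators, and the random weights are all assumptions made for this example.

```python
import numpy as np


def film_condition(physio, demo, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise affine modulation of physiological features.

    The scale (gamma) and shift (beta) are linear functions of the
    demographic vector. This linear form is an assumption of the sketch;
    the source does not specify how they are generated.
    """
    gamma = demo @ w_gamma + b_gamma  # one scale per physiological feature
    beta = demo @ w_beta + b_beta     # one shift per physiological feature
    return gamma * physio + beta


# Hypothetical sizes: 4 physiological signals, 3 demographic features.
rng = np.random.default_rng(0)
n_physio, n_demo = 4, 3
physio = rng.normal(size=n_physio)
demo = rng.normal(size=n_demo)

# Hypothetical parameters; in a learned model these would be trained.
w_gamma = 0.1 * rng.normal(size=(n_demo, n_physio))
b_gamma = np.ones(n_physio)   # bias of 1 keeps the scale near identity
w_beta = 0.1 * rng.normal(size=(n_demo, n_physio))
b_beta = np.zeros(n_physio)

conditioned = film_condition(physio, demo, w_gamma, b_gamma, w_beta, b_beta)
```

With zero generator weights, a scale bias of one, and a shift bias of zero, the modulation reduces to the identity, so the conditioned state equals the raw physiological state. Any patient-specific adjustment comes only from the demographic-dependent terms.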

Submission Number: 95
