Abstract: A policy trained via reinforcement learning (RL) makes decisions based on sensor-derived state features. It is common for state features to evolve, for example due to periodic sensor maintenance or the addition of new sensors for performance improvement. A deployed policy fails in the new state space when it encounters state features unseen during training. Previous work tackles this challenge by training a sensor-invariant policy or by generating multiple policies and selecting the appropriate one with limited samples. However, both directions struggle to guarantee performance under unpredictable evolutions. In this paper, we formalize this problem as state evolvable reinforcement learning (SERL), where the agent is required to mitigate policy degradation after state evolutions without costly exploration. We propose **Lapse**, which reuses policies learned in the old state space in two complementary ways. On one hand, Lapse directly reuses the *robust* old policy by composing it with a learned state reconstruction model to handle vanishing sensors. On the other hand, Lapse reuses the behavioral experience from the old policy to train a new adaptive policy through offline learning, better utilizing newly added sensors. To leverage the advantages of both policies in different scenarios, we further propose *automatic ensemble weight adjustment* to aggregate them effectively. Theoretically, we justify that robust policy reuse helps mitigate uncertainty and error from both evolution and reconstruction. Empirically, Lapse achieves a significant performance improvement, outperforming the strongest baseline by about $2\times$ in benchmark environments.
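To make the abstract's ensemble idea concrete, here is a minimal sketch of combining a reconstruction-composed old policy with an offline-adapted new policy. All names (`pi_old`, `pi_new`, `auto_weight`, the linear stand-in policies, and the assumption that the first `OLD_DIM` entries of the evolved state correspond to surviving old sensors) are hypothetical illustrations of the general scheme, not the paper's actual Lapse implementation.

```python
# Minimal sketch of the policy-ensemble idea described in the abstract.
# The policies, reconstruction model, and weighting rule are illustrative
# placeholders, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

OLD_DIM, NEW_DIM, ACT_DIM = 4, 6, 2  # old/new sensor dims, action dim

# Stand-ins for the two reused policies: linear maps for illustration only.
W_old = rng.normal(size=(OLD_DIM, ACT_DIM))  # robust policy trained on old sensors
W_new = rng.normal(size=(NEW_DIM, ACT_DIM))  # adaptive policy trained offline on new sensors

# Hypothetical reconstruction model: maps the evolved state back to the old
# sensor layout so the robust old policy can still be applied.
R = rng.normal(size=(NEW_DIM, OLD_DIM))

def pi_old(new_state):
    """Old policy composed with the learned state reconstruction."""
    reconstructed_old_state = new_state @ R
    return reconstructed_old_state @ W_old

def pi_new(new_state):
    """Policy adapted to the new sensors via offline learning."""
    return new_state @ W_new

def ensemble_action(new_state, w):
    """Aggregate both policies; w in [0, 1] is the ensemble weight."""
    return w * pi_old(new_state) + (1.0 - w) * pi_new(new_state)

def auto_weight(new_state, shared_idx=range(OLD_DIM)):
    """One possible automatic weighting rule (assumed, for illustration):
    trust the old policy less when the reconstruction disagrees with the
    sensors that survived the evolution."""
    recon = new_state @ R
    residual = np.linalg.norm(recon[list(shared_idx)] - new_state[list(shared_idx)])
    return float(np.exp(-residual))  # weight decays with reconstruction error

s = rng.normal(size=NEW_DIM)              # an evolved (new-sensor) state
a = ensemble_action(s, auto_weight(s))
print("ensemble action:", a)
```

The sketch only shows the control flow: reconstruct, reuse, adapt, then weight the two action proposals; the paper's actual reconstruction model, offline learner, and weight-adjustment rule are learned components rather than the fixed heuristics used here.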
Lay Summary: Many automated systems rely on sensor data to make decisions, but sensors can change over time—for example, due to maintenance or upgrades. When this happens, decision-making policies trained on old sensor data often experience a drop in performance, because they are not prepared for new or missing information.
Previous solutions try to make policies ignore sensor differences or train several policies and pick the right one, but these approaches often struggle when facing unexpected sensor changes. To address this, we introduce a new framework called state evolvable reinforcement learning (SERL), which aims to maintain reliable performance even as sensors change, without costly trial-and-error.
Our method, called Lapse, reuses knowledge from old sensor setups in two ways: it adapts the old policy to work with missing sensors and uses experience from the old setup to train a new, more adaptive policy for new sensors. Lapse can automatically combine both approaches depending on the situation. In tests, Lapse showed significantly better performance than previous methods, helping automated systems stay reliable as their sensors evolve.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Reinforcement Learning
Keywords: Reinforcement Learning, Open Environment, State Evolution, Policy Reuse, Adaptation
Submission Number: 4577