TL;DR: Maximizing trajectory total correlation for learning simple and robust RL policies.
Abstract: Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used for simpler policies, data augmentation for simpler representations, and sparse reward functions for simpler objectives, all with the underlying motivation of increasing generalizability and robustness by focusing on the essentials. Complementary to these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including policy and state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories, and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance in the original tasks.
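As a minimal sketch of the modified objective (the trade-off coefficient $\beta$ and the choice of per-step trajectory variables $x_1, \dots, x_T$ are illustrative assumptions, not necessarily the paper's exact formulation), the standard expected return is augmented with the total correlation of the induced trajectory:
$$\max_{\pi}\;\; \mathbb{E}_{\tau \sim \pi}\Big[\textstyle\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big] \;+\; \beta\, \mathrm{TC}(x_{1:T}), \qquad \mathrm{TC}(x_{1:T}) \;=\; \sum_{t=1}^{T} H(x_t) \;-\; H(x_1, \dots, x_T),$$
where the total correlation is large when the per-step variables are strongly dependent, e.g., when the trajectory is periodic; in practice, the algorithm optimizes a lower-bound approximation of this quantity.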
Lay Summary: When training artificial intelligence (AI) systems to act in the real world, we want them to behave as simply as possible while still solving their tasks. This is because simpler behavior is easier to understand and predict, and is more likely to still work in situations the AI didn't encounter during training. However, it is difficult to increase the simplicity of behavior without a clear way to measure it.
In this work, we investigate a concept called total correlation to quantify the simplicity of an AI's behavior. Think of total correlation as measuring how many fewer bytes you would need to describe the behavior to someone else if your communication were optimized for describing the full behavior as a whole instead of the individual time steps. For example, a humanoid robot with a clean, periodic gait has higher total correlation than one with a slightly irregular gait that performs unnecessary adaptations to sensor noise.
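In information-theoretic terms, this "bytes saved" picture is exact (assuming the behavior is summarized by per-step variables $x_1, \dots, x_T$): describing each step independently costs about $\sum_t H(x_t)$ bits, describing the whole trajectory jointly costs about $H(x_1, \dots, x_T)$ bits, and the difference between the two is precisely the total correlation.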
We propose a method to learn behaviors with higher total correlation, and our tests on simulated robots showed clear benefits. Our method naturally generated highly predictable and compressible behaviors. For tasks that benefit from it, such as a robot's gait, this often meant periodic movements (like a regular walking pattern). We observed these advantages, including superior robustness to unexpected noise and modeling errors (such as an inaccurate mass used during training), across various tasks, even those requiring complex, non-periodic movements. Crucially, these advantages came without sacrificing performance; in fact, our robots improved at their original tasks compared to standard methods.
This research offers a new perspective on training AI, suggesting that explicitly optimizing for simplicity in behavior can lead to more reliable, adaptable, and ultimately more trustworthy intelligent systems for real-world applications.
Primary Area: Reinforcement Learning
Keywords: Reinforcement Learning, Total Correlation
Submission Number: 10407