Abstract: To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework in which high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. This relaxes the common assumption of purely low-rank transition models, allowing for more realistic scenarios where tasks share core dynamics but maintain individual variations. Building on this framework, we propose UCB-TQL (Upper Confidence Bound Transfer Q-Learning), designed for transfer RL scenarios where multiple tasks share core linear MDP dynamics but diverge along sparse dimensions. When applying UCB-TQL to a target task after training on a source task with sufficient trajectories, we achieve a regret bound of $\tilde{\mathcal{O}}(\sqrt{eH^5N})$ that scales independently of the ambient dimension. Here, $N$ denotes the number of trajectories in the target task, while $e$ quantifies the sparse differences between tasks. This result demonstrates a substantial improvement over single-task RL, achieved by effectively leveraging the structural similarities between tasks. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations.
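The low-rank-plus-sparse decomposition described in the abstract can be illustrated with a minimal numerical sketch. This is not the paper's algorithm; it only constructs a composite parameter matrix of the assumed form, with illustrative names (`d` for the ambient dimension, `r` for the shared rank, `e` for the number of sparse task-specific entries) that are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): ambient dimension d,
# shared low-rank structure of rank r, and e sparse task-specific entries.
d, r, e = 20, 3, 5

# Shared low-rank component: product of two thin Gaussian factors,
# so rank(L) <= r (and equals r almost surely).
U = rng.standard_normal((d, r))
V = rng.standard_normal((r, d))
L = U @ V

# Task-specific sparse component: exactly e nonzero entries.
S = np.zeros((d, d))
idx = rng.choice(d * d, size=e, replace=False)
S.flat[idx] = rng.standard_normal(e)

# Composite transition-parameter matrix for one task.
M = L + S

print(np.linalg.matrix_rank(L))   # r
print(int(np.count_nonzero(S)))   # e
```

Under this structure, a source task and a target task would share the same `L` while differing only in their sparse components, which is what lets the regret bound depend on `e` rather than on the ambient dimension `d`.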
Lay Summary: When a computer learns a new task, it typically starts from scratch, requiring lots of time and data. Imagine if, instead, it could remember what it learned before and adapt quickly to new challenges, even when conditions change. Our work makes this possible in a specific type of artificial intelligence known as reinforcement learning, where machines learn through trial and error to make good decisions.
We designed a new learning method that allows computers to effectively transfer their experience from past tasks to solve new, related ones faster and more accurately. Our key idea was to separate what remains common across tasks from what changes, much like identifying common rules in different board games while noting specific rule differences.
By structuring the learning process in this way, our approach helps machines use their experience more wisely. This not only makes learning faster and smarter but also lays the groundwork for practical applications ranging from robots adapting to new environments to better decision-making systems in healthcare or business.
Primary Area: Theory->Learning Theory
Keywords: Transfer learning, Q learning, UCB Algorithms, Regret analysis, Low-rank plus sparse structure
Submission Number: 11338