Time and temporal abstraction in continual learning: tradeoffs, analogies and regret in an active measuring setting

Published: 01 Jan 2023, Last Modified: 16 May 2025. CoLLAs 2023. License: CC BY-SA 4.0
Abstract: This conceptual paper provides theoretical results linking notions in semi-supervised learning (SSL) and hierarchical reinforcement learning (HRL) in the context of lifelong learning. Specifically, our construction sets up a direct analogy between intermediate representations in SSL and temporal abstraction in RL, highlighting the important role of factorization in both types of hierarchy and the parallel relevance of partial labeling and partial observation. The construction centres around a simple class of Partially Observable Markov Decision Processes (POMDPs) in which we show that tools and results from SSL imply lower bounds on regret that hold for any RL algorithm without access to temporal abstraction. While our lower bound is for a restricted class of RL problems, it applies to arbitrary RL algorithms in this setting. The setting moreover features so-called "active measuring", an aspect of widespread relevance in industrial control but, possibly due to its lifelong learning flavour, not yet well-studied in RL. Our formalization makes it possible to reason about the tradeoffs that apply to such control problems.
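To make the "active measuring" aspect concrete, the sketch below shows one possible toy interface for such a POMDP: the agent acts on a latent state it cannot see for free, and at each step it may pay a cost to observe the true state. This is an illustrative assumption of what the setting could look like, not the paper's construction; the class name, chain dynamics, and cost value are all hypothetical.

```python
import random


class ActiveMeasuringChain:
    """Toy POMDP with an explicit 'measure' choice (illustrative sketch only).

    The latent state is a position on a short chain. Each step the agent picks
    a movement action and, separately, whether to pay a cost for an exact
    observation of the state; otherwise it receives no observation (None).
    """

    def __init__(self, n_states=5, measure_cost=0.1, seed=0):
        self.n_states = n_states
        self.measure_cost = measure_cost
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        # Start in a random latent state; no free observation is returned.
        self.state = self.rng.randrange(self.n_states)
        return None

    def step(self, move, measure):
        """move in {-1, +1}; measure is a bool deciding whether to observe."""
        self.state = max(0, min(self.n_states - 1, self.state + move))
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        if measure:
            reward -= self.measure_cost  # observation is costly
            obs = self.state             # exact state reading
        else:
            obs = None                   # stay blind this step
        return obs, reward


# Minimal interaction loop: a random policy that measures every other step.
env = ActiveMeasuringChain()
obs = env.reset()
for t in range(10):
    obs, r = env.step(move=random.choice([-1, 1]), measure=(t % 2 == 0))
    print(t, obs, r)
```

Under an interface like this, the tradeoff the abstract alludes to is visible directly in the reward signal: measuring more often improves state information but accumulates measurement cost, while measuring rarely forces the agent to act under partial observation.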