Is Temporal-Difference Learning the Only Path to Stitching in RL?

Published: 25 May 2026, Last Modified: 27 May 2026, DEMO 2026 Poster, CC BY 4.0
Keywords: Deep Reinforcement Learning, Goal-Conditioned Reinforcement Learning, Stitching
TL;DR: While temporal-difference (TD) learning is often associated with the ability to stitch experiences, we show that Monte Carlo reinforcement learning methods can also combine short experiences into long-horizon behavior.
Abstract: Reinforcement learning (RL) promises to solve long-horizon tasks even when the training data contains only short fragments of the desired behavior. This experience stitching capability is often viewed as the purview of temporal-difference (TD) methods. However, outside of small tabular settings, trajectories never intersect, calling this conventional wisdom into question. While it is widely held that Monte Carlo (MC) methods are incapable of recombining experience, might function approximation produce a form of implicit stitching, insofar as it yields a simpler model of the data? The goal of this paper is to study empirically whether the conventional wisdom about stitching holds in settings where function approximation is used. We find that MC methods often do perform experience stitching. TD methods stitch slightly better than MC methods (in line with conventional wisdom), but this gap narrows as we use larger neural networks. Furthermore, we find that increasing critic capacity reduces the generalization gap for both MC and TD methods. These results suggest that TD learning's inductive bias for stitching may be less necessary in the era of large RL models and, in some cases, may offer diminishing returns. Our results also suggest that stitching, a form of generalization unique to the RL setting, might be achieved not through specialized algorithms (TD learning) but through the same recipe that has provided generalization in other machine learning settings: scaling model size. Project website: https://anonymous.4open.science/r/golden-standard-4C84
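The conventional view that only TD methods stitch can be illustrated with a minimal tabular sketch. It assumes two hypothetical trajectories that share a single state, deterministic transitions, and a goal-conditioned value defined as the discounted return for reaching a goal. The names `TRAJECTORIES`, `monte_carlo_values`, and `td_values` are illustrative only and do not come from the paper's code. The TD backup takes a max over observed successor states, a tabular analogue of Q-learning, and is not necessarily the TD method evaluated in the paper.

```python
from collections import defaultdict

GAMMA = 0.9

# Hypothetical dataset: two short trajectories that share only state "B".
# Trajectory 1 goes A -> B -> C; trajectory 2 goes D -> B -> E.
TRAJECTORIES = [["A", "B", "C"], ["D", "B", "E"]]


def monte_carlo_values(trajectories, gamma=GAMMA):
    """Every-visit MC estimate of V(s, g): average over visits to s of
    gamma**k, where k is the first step at which g appears later in the
    SAME trajectory (0 if g never appears)."""
    states = sorted({s for traj in trajectories for s in traj})
    returns = defaultdict(list)
    for traj in trajectories:
        for i, s in enumerate(traj):
            future = traj[i:]
            for g in states:
                ret = gamma ** future.index(g) if g in future else 0.0
                returns[(s, g)].append(ret)
    return {key: sum(r) / len(r) for key, r in returns.items()}


def td_values(trajectories, gamma=GAMMA, sweeps=10):
    """TD-style backup V(s, g) = gamma * max over observed successors s' of
    V(s', g), with V(g, g) = 1. Assumes the agent can pick any successor
    seen in the data (deterministic, controllable transitions)."""
    states = sorted({s for traj in trajectories for s in traj})
    successors = defaultdict(set)
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            successors[s].add(s_next)
    values = {(s, g): 0.0 for s in states for g in states}
    for _ in range(sweeps):
        for s in states:
            for g in states:
                if s == g:
                    values[(s, g)] = 1.0
                elif successors[s]:
                    values[(s, g)] = gamma * max(values[(n, g)] for n in successors[s])
    return values


if __name__ == "__main__":
    mc = monte_carlo_values(TRAJECTORIES)
    td = td_values(TRAJECTORIES)
    # Goal "E" is never reached from "A" within one trajectory, so MC gives 0,
    # while TD propagates value backward through the shared state "B".
    print(f"MC V(A, E) = {mc[('A', 'E')]:.2f}")  # prints "MC V(A, E) = 0.00"
    print(f"TD V(A, E) = {td[('A', 'E')]:.2f}")  # prints "TD V(A, E) = 0.81"
```

This tabular argument relies on both trajectories visiting exactly the same state "B". As the abstract notes, outside small tabular settings trajectories never intersect, so whether MC methods with function approximation can stitch is an empirical question rather than something this toy example settles.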
Submission Number: 22