Data-Driven Knowledge Transfer in Batch Q* Learning

Published: 05 Jan 2026, Last Modified: 28 Jan 2026 | Journal of the American Statistical Association | Everyone | CC BY 4.0
Abstract: In data-driven decision-making across marketing, healthcare, and education, leveraging large datasets from existing ventures is crucial for navigating high-dimensional feature spaces and addressing data scarcity in new ventures. We investigate knowledge transfer in dynamic decision-making by focusing on batch stationary environments and formally defining task discrepancies through the framework of Markov decision processes (MDPs). We propose the Transfer Fitted Q-Iteration algorithm with general function approximation, which enables direct estimation of the optimal action-state function 𝑄* using both target and source data. Under sieve approximation, we establish the relationship between statistical performance and the MDP task discrepancy, highlighting how the source and target sample sizes and the task discrepancy influence the effectiveness of knowledge transfer. Our theoretical and empirical results demonstrate that the final learning error of the 𝑄* function is significantly reduced relative to the single-task learning rate.
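As a rough illustration of the setting, the sketch below runs plain fitted Q-iteration on pooled source and target transitions. It is not the paper's Transfer Fitted Q-Iteration algorithm, whose details the abstract does not give. Everything here is an assumption made for illustration: the function name `fitted_q_iteration`, the toy two-state MDP, the one-hot basis over state-action pairs (a trivial stand-in for a sieve basis), and the choice of zero discrepancy between the source and target tasks.

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, n_iters=200):
    """Generic batch fitted Q-iteration (hypothetical helper, not the paper's method).

    transitions: array with rows (state, action, reward, next_state).
    Returns an (n_states, n_actions) array of estimated Q* values.
    """
    s = transitions[:, 0].astype(int)
    a = transitions[:, 1].astype(int)
    r = transitions[:, 2]
    s_next = transitions[:, 3].astype(int)

    # One-hot features over (state, action) pairs: an assumed, trivial basis.
    phi = np.zeros((len(s), n_states * n_actions))
    phi[np.arange(len(s)), s * n_actions + a] = 1.0

    theta = np.zeros(n_states * n_actions)
    for _ in range(n_iters):
        q = theta.reshape(n_states, n_actions)
        # Bellman optimality targets built from the current Q estimate.
        y = r + gamma * q[s_next].max(axis=1)
        # Least-squares regression of the targets on the basis.
        theta, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return theta.reshape(n_states, n_actions)

# Toy deterministic MDP (assumed): action 0 stays, action 1 switches state;
# the reward is 1 when the next state is 1, otherwise 0.
source = np.array([
    [0, 0, 0.0, 0],
    [0, 1, 1.0, 1],
    [1, 0, 1.0, 1],
    [1, 1, 0.0, 0],
])
# The scarce target sample covers only two of the four state-action pairs.
target = np.array([
    [0, 1, 1.0, 1],
    [1, 0, 1.0, 1],
])

q_pooled = fitted_q_iteration(np.vstack([source, target]), n_states=2, n_actions=2)
q_target_only = fitted_q_iteration(target, n_states=2, n_actions=2)
```

In this toy case the source and target tasks are identical, so pooling simply fills in the state-action pairs the target sample never visits. When the tasks differ, naive pooling would introduce bias, which is the kind of task discrepancy the abstract says the paper analyzes.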