A Theoretical Perspective on Streaming Noisy Data with Distribution Shift

Published: 10 Dec 2025 · Last Modified: 25 Mar 2026 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: Intelligent systems typically need to learn continually from streaming data subject to distribution shift, where a key requirement is that they must not catastrophically forget the knowledge learned from previous data. More seriously, streaming data often contain substantial label noise, which can exacerbate catastrophic forgetting and degrade performance on forthcoming data. To address these problems, Continual Noisy Label Learning (CNLL) has been proposed. However, existing CNLL methods still fall short in addressing catastrophic forgetting: they adopt heuristic strategies to handle label noise and do not explicitly characterize the distribution shift across time, which hinders effective knowledge transfer from historical data to new data. To tackle these challenges, we theoretically analyze the problem of learning from streaming noisy data with distribution shift and propose a unified framework called Continual Noisy Label Learning on Drifting Data Streams (CNLDD). Specifically, we derive, for the first time, an upper bound on the cumulative generalization error for the CNLL problem, which reveals three factors leading to forgetting: selection bias of buffered data, distribution shift, and label noise. To alleviate the selection bias of buffered data, we design a two-step buffer update strategy that narrows the distribution gap between the original historical data and the representative data selected into the buffer. To address distribution shift, CNLDD explicitly characterizes the distribution discrepancies between buffered data and incoming data, prioritizing historical data with minimal discrepancies to enhance knowledge transfer. To tackle noisy labels, CNLDD estimates the importance weight of each example with an instance-dependent noise transition matrix, thereby avoiding the data bias and knowledge forgetting arising from noisy labels.
Empirically, thanks to this unified modeling of the aforementioned issues, CNLDD achieves superior classification performance compared with state-of-the-art CNLL methods on both synthetic and real-world datasets.
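To make the noisy-label component of the abstract concrete: importance reweighting with a noise transition matrix typically maps a model's clean-class posterior through the matrix T, where T[i, j] = P(noisy label = j | clean label = i), and weights each example by the ratio of clean to noisy posterior at its observed label. The sketch below is a generic, minimal illustration of that idea only; the function name `importance_weight`, the fixed (rather than instance-dependent) T, and all numbers are assumptions for demonstration, not the paper's actual implementation.

```python
import numpy as np

def importance_weight(clean_posterior: np.ndarray, T: np.ndarray, noisy_label: int) -> float:
    """Importance weight of one example under a label-noise transition matrix.

    clean_posterior : model estimate of P(y = i | x), shape (C,)
    T               : transition matrix, T[i, j] = P(noisy = j | clean = i), shape (C, C)
    noisy_label     : the observed (possibly corrupted) label index
    """
    # Noisy-label posterior: P(noisy = j | x) = sum_i T[i, j] * P(y = i | x)
    noisy_posterior = T.T @ clean_posterior
    # Weight = clean posterior over noisy posterior at the observed label
    return float(clean_posterior[noisy_label] / noisy_posterior[noisy_label])

# Illustrative example (hypothetical numbers): with no noise (identity T),
# every example gets weight 1; with asymmetric noise, examples whose observed
# label is likely corrupted are down- or up-weighted accordingly.
w_clean = importance_weight(np.array([0.8, 0.2]), np.eye(2), noisy_label=0)
w_noisy = importance_weight(np.array([0.8, 0.2]),
                            np.array([[0.9, 0.1],
                                      [0.2, 0.8]]), noisy_label=0)
```

In an instance-dependent setting like the one the abstract describes, T would be a function of the input x (e.g. produced by an auxiliary network) rather than a single fixed matrix.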