Instead of training deep neural networks offline on a large static dataset, continual learning (CL) considers a new paradigm in which deep networks are trained on the fly from a non-stationary data stream. Despite recent progress, continual learning remains an open challenge: many CL techniques still require offline training over large batches of data chunks (i.e., tasks) for multiple epochs. Conventional wisdom holds that online continual learning, which assumes single-pass data, is strictly harder than offline continual learning, since it must cope with both catastrophic forgetting and underfitting within a single training epoch. Here, we challenge this assumption by empirically demonstrating that online CL can match or exceed the performance of its offline counterpart given equivalent memory and computational resources. We verify this finding across different CL approaches and benchmarks. To better understand these counterintuitive results, we design a framework that unifies and interpolates between online and offline CL, and we provide a theoretical analysis showing that online CL can yield a tighter generalization bound than offline CL.
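To make the equal-compute comparison concrete, here is a minimal sketch (function and parameter names are our own, not from the paper) of how one can count gradient updates in a unified loop where chunk size and epochs interpolate between the two regimes: `chunk_size=1, epochs=1` corresponds to online CL, while large chunks with many epochs correspond to offline CL. Under a fixed update budget, the single-pass online learner can spend the leftover compute on, e.g., replay updates.

```python
def cl_updates(stream_len, chunk_size, epochs):
    """Count gradient updates when a stream of `stream_len` examples is
    split into chunks of `chunk_size`, each trained for `epochs` passes,
    with one update per example per pass. (Hypothetical accounting, for
    illustration only.)"""
    n_chunks = -(-stream_len // chunk_size)  # ceil division
    updates = 0
    for c in range(n_chunks):
        # Last chunk may be smaller than chunk_size.
        examples = min(chunk_size, stream_len - c * chunk_size)
        updates += examples * epochs
    return updates

# Equal-compute budget of 4000 updates for a 1000-example stream:
# Offline CL: 5 chunks of 200 examples, 4 epochs each -> 4000 updates.
offline = cl_updates(stream_len=1000, chunk_size=200, epochs=4)

# Online CL: one pass over incoming data (1000 updates), with the
# remaining budget spent on 3 replay updates per incoming example.
online = cl_updates(stream_len=1000, chunk_size=1, epochs=1) * (1 + 3)

assert offline == online == 4000
```

This accounting is what makes the paper's comparison fair: both regimes consume the same number of updates, and the question becomes how that budget is best spent.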