Are Greedy Task Orderings Better Than Random in Continual Linear Regression?

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: continual learning, lifelong learning, task ordering, curriculum learning, forgetting, Kaczmarz
TL;DR: We study greedy task orderings in continual learning that maximize dissimilarity between consecutive tasks, and compare their performance to random orderings both analytically and empirically.
Abstract: We analyze task orderings in continual learning for linear regression, assuming joint realizability of the training data. We focus on orderings that greedily maximize dissimilarity between consecutive tasks, a concept briefly explored in prior work but still surrounded by open questions. Using tools from the Kaczmarz method literature, we formalize such orderings and develop geometric and algebraic intuitions around them. Empirically, we demonstrate that greedy orderings converge faster than random ones in terms of the average loss across tasks, both for linear regression with random data and for linear probing on CIFAR-100 classification tasks. Analytically, in a high-rank regression setting, we prove a loss bound for greedy orderings analogous to that of random ones. However, under general rank, we establish a repetition-dependent separation. Specifically, while prior work showed that for random orderings, with or without replacement, the average loss after $k$ iterations is bounded by $\mathcal{O}(1/\sqrt{k})$, we prove that single-pass greedy orderings may fail catastrophically, whereas those allowing repetition converge at rate $\mathcal{O}(1/\sqrt[3]{k})$. Overall, we reveal nuances within and between greedy and random orderings.
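To make the setting concrete, below is a minimal sketch (not the authors' code) of continual linear regression with jointly realizable tasks, where learning a task is an exact projection onto that task's solution set, i.e., a block-Kaczmarz step. The greedy rule shown, which revisits the task with the largest current residual, is only one illustrative proxy for the consecutive-task dissimilarity criterion studied in the paper; all names, sizes, and parameters are hypothetical.

```python
# Sketch: continual linear regression as sequential projections (block Kaczmarz),
# comparing a random task ordering with a greedy-with-repetition ordering that
# always revisits the task with the largest current residual (a simple proxy
# for "maximally dissimilar to the current iterate").
import numpy as np

rng = np.random.default_rng(0)
d, n_tasks, m = 50, 20, 5            # dimension, number of tasks, samples per task
w_star = rng.standard_normal(d)      # joint realizability: one w* fits every task
tasks = []
for _ in range(n_tasks):
    X = rng.standard_normal((m, d))
    tasks.append((X, X @ w_star))    # y = X w*, so all tasks share a solution

def project(w, X, y):
    """Minimum-norm update fitting task (X, y) exactly: w + X^+ (y - X w)."""
    return w + np.linalg.pinv(X) @ (y - X @ w)

def avg_loss(w):
    """Average squared loss across all tasks at the current iterate."""
    return np.mean([np.mean((X @ w - y) ** 2) for X, y in tasks])

def run(order_fn, k=200):
    """Process k tasks chosen by order_fn, tracking the average loss."""
    w = np.zeros(d)
    losses = []
    for _ in range(k):
        X, y = tasks[order_fn(w)]
        w = project(w, X, y)
        losses.append(avg_loss(w))
    return losses

random_order = lambda w: int(rng.integers(n_tasks))
greedy_order = lambda w: int(np.argmax(
    [np.mean((X @ w - y) ** 2) for X, y in tasks]))  # largest current residual

print("random ordering, final average loss:", run(random_order)[-1])
print("greedy ordering, final average loss:", run(greedy_order)[-1])
```

Under these assumptions, both orderings drive the average loss toward zero; the sketch only illustrates the projection-based dynamics and the greedy-versus-random comparison, not the paper's exact dissimilarity measure or its rate guarantees.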
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 17430