Abstract: Backpropagation provides a general framework for overcoming catastrophic forgetting. Gradient-based optimizers such as SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. However, access to gradient information is not always feasible in practice due to black-box APIs, hardware constraints, or non-differentiable systems, a challenge we refer to as the gradient ban. To bridge this gap, we introduce ZeroFlow, the first benchmark designed to evaluate gradient-free optimization algorithms for overcoming forgetting. ZeroFlow examines a suite of forward-pass-based methods across a range of algorithms, forgetting scenarios, and datasets. Our results show that forward passes alone can be sufficient to mitigate forgetting. We uncover novel optimization principles that highlight the potential of forward-pass-based methods in mitigating forgetting, managing task conflicts, and reducing memory demands. Additionally, we propose new enhancements that further improve forgetting resistance using only forward passes. This work provides essential tools and insights to advance the development of forward-pass-based methods for continual learning.
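To make the idea of forward-pass-only optimization concrete, below is a minimal illustrative sketch (not the paper's implementation) of a SPSA-style zeroth-order update: the model is treated as a black box, and a descent direction is estimated from the loss returned by two perturbed forward passes. The names `spsa_step`, `loss_fn`, and `model_forward_loss` are assumptions for illustration only; the specific gradient-free algorithms benchmarked in ZeroFlow are described in the paper itself.

```python
# Minimal sketch (assumed, not the paper's method): a SPSA-style zeroth-order
# update that estimates a descent direction using only two forward passes.
import numpy as np

def spsa_step(params, loss_fn, lr=1e-3, eps=1e-3, rng=np.random.default_rng()):
    """One gradient-free update; loss_fn(params) is a black-box forward pass returning a scalar loss."""
    z = rng.choice([-1.0, 1.0], size=params.shape)        # random Rademacher perturbation direction
    loss_plus = loss_fn(params + eps * z)                  # forward pass 1 (perturbed up)
    loss_minus = loss_fn(params - eps * z)                 # forward pass 2 (perturbed down)
    grad_est = (loss_plus - loss_minus) / (2 * eps) * z    # finite-difference directional estimate
    return params - lr * grad_est                          # plain SGD-style step on the estimate

# Hypothetical usage: params = spsa_step(params, lambda p: model_forward_loss(p, batch))
```

Such estimators need no access to internal gradients, which is why they remain applicable under the gradient ban described above.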
Lay Summary: Modern AI systems often struggle to retain what they've learned when they take on new tasks—a problem known as "catastrophic forgetting." Traditionally, researchers rely on a technique called backpropagation, which uses information about how a model's predictions go wrong (gradients) to update its knowledge and reduce forgetting. But in many real-world situations, getting access to these gradients isn't possible, like when using black-box systems, dealing with hardware limits, or working with non-trainable components. To address this, we introduce ZeroFlow, the first benchmark for studying how gradient-free methods—using only the model's outputs—can reduce forgetting. Our experiments show that, surprisingly, these simple forward-pass methods can be effective: even without gradients, AI models can still hold onto old knowledge. We also discover new techniques that help models avoid conflicts between tasks and save memory, all by using forward outputs only. This research opens up new possibilities for building smarter, more flexible AI systems that keep learning, even without access to gradients.
Link To Code: https://zeroflow-bench.github.io
Primary Area: Deep Learning
Keywords: Catastrophic Forgetting, Continual Learning, Incremental Learning
Submission Number: 213