Learning without Global Backpropagation via Synergistic Information Distillation

ICLR 2026 Conference Submission 24191 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: deep learning, backpropagation-free learning, local learning, scalable deep learning
TL;DR: We propose Synergistic Information Distillation (SID), a backprop-free training framework that resolves update locking and high memory costs by decomposing learning into a cascade of parallelizable, local belief refinement steps.
Abstract: Backpropagation (BP), while foundational to deep learning, imposes two critical scalability bottlenecks: update locking, where network modules remain idle until the entire backward pass completes, and high memory consumption due to storing activations for gradient computation. To address these limitations, we introduce Synergistic Information Distillation (SID), a novel training framework that reframes deep learning as a cascade of local cooperative refinement problems. In SID, a deep network is structured as a pipeline of modules, each assigned a local objective to refine a probabilistic ``belief'' about the ground-truth target. This objective balances fidelity to the target with consistency with the belief of the preceding module. By decoupling the backward dependencies between modules, SID enables parallel training, thereby eliminating update locking and drastically reducing memory requirements. Meanwhile, this design preserves the standard feed-forward inference pass, making SID a versatile drop-in replacement for BP. We provide a theoretical foundation, proving that SID guarantees monotonic performance improvement with network depth. Empirically, SID consistently matches or surpasses the classification accuracy of BP, exhibiting superior scalability and pronounced robustness to label noise. The code is publicly available at: https://anonymous.4open.science/r/sid_BDEF.
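The abstract describes each module optimizing a local objective that trades off fidelity to the target against consistency with the preceding module's belief, with no gradients crossing module boundaries. The following is a minimal PyTorch sketch of that idea, assuming (not confirmed by the abstract) that beliefs are class distributions and the local loss is a weighted sum of cross-entropy to the label and a KL consistency term; names such as `SIDBlock`, `local_sid_loss`, and `alpha` are illustrative, not the paper's actual implementation.

```python
# Hypothetical sketch of a SID-style local objective; the exact loss form,
# module structure, and hyperparameters are assumptions for illustration.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class SIDBlock(nn.Module):
    """One module in the pipeline: refines features and emits a belief (logits)."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.head = nn.Linear(dim, num_classes)  # local belief head

    def forward(self, h: torch.Tensor):
        h = self.body(h)
        return h, self.head(h)  # features for the next module, logits as belief


def local_sid_loss(logits: torch.Tensor,
                   prev_logits: Optional[torch.Tensor],
                   targets: torch.Tensor,
                   alpha: float = 0.5) -> torch.Tensor:
    """Fidelity to the target plus consistency with the preceding belief.

    prev_logits is detached, so no gradient crosses module boundaries;
    this is what removes the backward dependency between modules.
    """
    fidelity = F.cross_entropy(logits, targets)
    if prev_logits is None:
        return fidelity
    consistency = F.kl_div(F.log_softmax(logits, dim=-1),
                           F.softmax(prev_logits.detach(), dim=-1),
                           reduction="batchmean")
    return alpha * fidelity + (1.0 - alpha) * consistency


# Toy usage: train a cascade of blocks with purely local updates.
if __name__ == "__main__":
    dim, num_classes, depth = 32, 10, 4
    blocks = [SIDBlock(dim, num_classes) for _ in range(depth)]
    opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]

    x = torch.randn(64, dim)
    y = torch.randint(0, num_classes, (64,))

    h, prev_logits = x, None
    for block, opt in zip(blocks, opts):
        h, logits = block(h)
        loss = local_sid_loss(logits, prev_logits, y)
        opt.zero_grad()
        loss.backward()           # gradients stay inside this block
        opt.step()
        h = h.detach()            # cut the graph before the next block
        prev_logits = logits.detach()
```

Detaching both the forwarded features and the previous belief is what would keep each block's backward pass local, which is the property the abstract credits for eliminating update locking and reducing activation memory; inference would simply run the blocks feed-forward and read the final block's belief.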
Primary Area: optimization
Submission Number: 24191