Keywords: Continual learning, parameter isolation, mixture of experts, LoRA, mask routing
TL;DR: Functional Task Networks route each task through a sparse subset of independent components, recovered from a small support batch, enabling parameter isolation without task IDs; a LoRA variant of the method sharply reduces forgetting under concept shift.
Abstract: Continual adaptation is hardest under concept shift, where successive tasks share an input distribution but require incompatible outputs. We introduce Functional Task Networks (FTN), a parameter-isolation method that routes each task through a sparse subset of independent components without requiring a task ID at inference. Given a small labeled support batch, FTN constructs a binary mask by optimizing mask logits, applying spatial smoothing on a component grid, and selecting a fixed-capacity set of active components. The same cold-start procedure is used to allocate components during training and to recover them at evaluation, so task inference is handled by the adaptation mechanism itself. Disjoint masks yield disjoint gradient paths, giving a structural no-forgetting guarantee provided the optimizer does not modify parameters that receive no gradient (e.g., through weight decay or stale momentum). We instantiate the same idea in LoRA by replacing a single low-rank adapter with a mask-routed pool of rank-r components. On synthetic concept-shift benchmarks, FTN substantially reduces forgetting relative to shared-parameter baselines while preserving task performance. On a six-task distilgpt2 style-transformation stream, LoRA-FTN reduces forgetting from +7.37 to +0.07 nats under support-batch mask recovery, approaching an oracle disjoint-adapter reference without using task labels.
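The mask construction in the abstract (optimize mask logits on a support batch, smooth on a component grid, keep a fixed number of components) can be illustrated with a small pure-Python sketch. It rests on assumptions the abstract does not state: scalar-valued components combined additively, a sigmoid relaxation of the mask trained by gradient descent on squared error, a 3x3 box filter as the spatial smoothing, and top-k as the fixed-capacity selection. `recover_mask` and its helpers are hypothetical names, not the authors' code.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_on_grid(logits, rows, cols):
    """3x3 box average of per-component logits laid out row-major on a rows x cols grid.

    Border cells average only over the neighbours that exist (assumed smoothing kernel).
    """
    out = []
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += logits[rr * cols + cc]
                        count += 1
            out.append(total / count)
    return out


def top_k_mask(scores, k):
    """Binary mask keeping the k highest scores (ties go to the lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def recover_mask(components, support, rows, cols, k, steps=200, lr=0.5):
    """Hypothetical cold-start mask search on a labelled support batch.

    components: list of callables f_i(x) -> float, one per grid cell
    support:    list of (x, y) pairs from the current task
    """
    n = len(components)
    logits = [0.0] * n  # cold start: no component preferred
    for _ in range(steps):
        gates = [sigmoid(l) for l in logits]  # soft relaxation of the binary mask
        grads = [0.0] * n
        for x, y in support:
            outs = [f(x) for f in components]
            err = sum(g * o for g, o in zip(gates, outs)) - y
            for i in range(n):
                # d(err^2)/d(logit_i) = 2 * err * f_i(x) * sigmoid'(logit_i)
                grads[i] += 2.0 * err * outs[i] * gates[i] * (1.0 - gates[i])
        logits = [l - lr * g / len(support) for l, g in zip(logits, grads)]
    # Smooth on the component grid, then keep a fixed capacity of k components.
    return top_k_mask(smooth_on_grid(logits, rows, cols), k)


# Toy task on a 1x4 grid: the first two components help, the last two hurt.
comps = [lambda x: x, lambda x: x, lambda x: -x, lambda x: -x]
support = [(1.0, 2.0), (-0.5, -1.0)]
print(recover_mask(comps, support, rows=1, cols=4, k=2))  # [1, 1, 0, 0]
```

In this toy setting the logits of the helpful components rise and those of the harmful ones fall, and smoothing preserves that ordering, so the top-2 mask selects the first two cells. Smoothing biases selection toward spatially contiguous groups of components; whether FTN uses this kernel or a different one is not specified here.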
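The LoRA instantiation replaces one low-rank adapter with a mask-routed pool of rank-r components. The pure-Python sketch below, under stated assumptions, shows the forward pass y = W x + scale * sum over active i of B_i (A_i x), and why disjoint masks give disjoint gradient paths: a masked-out component never enters the computation, so it receives no gradient at all. The names `lora_pool_forward` and `lora_pool_grads`, the `scale` factor, and the squared-error loss are illustrative assumptions rather than the paper's implementation.

```python
def matvec(M, v):
    # Dense matrix-vector product on nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_pool_forward(W, pool, mask, x, scale=1.0):
    """y = W x + scale * sum over active i of B_i (A_i x).

    W:    frozen base weight, d_out x d_in
    pool: list of (A_i, B_i), A_i of shape r x d_in and B_i of shape d_out x r
    mask: binary list, one entry per pool component
    """
    y = matvec(W, x)
    for (A, B), m in zip(pool, mask):
        if m:  # inactive components never enter the computation
            delta = matvec(B, matvec(A, x))
            y = [yi + scale * di for yi, di in zip(y, delta)]
    return y


def lora_pool_grads(W, pool, mask, x, target, scale=1.0):
    """Gradients of ||y - target||^2 w.r.t. each (A_i, B_i); None where no gradient path exists."""
    y = lora_pool_forward(W, pool, mask, x, scale)
    g = [2.0 * (yi - ti) for yi, ti in zip(y, target)]  # dL/dy
    grads = []
    for (A, B), m in zip(pool, mask):
        if not m:
            grads.append(None)  # masked out: these parameters receive no gradient
            continue
        h = matvec(A, x)  # rank-r bottleneck activation
        dB = [[scale * gi * hj for hj in h] for gi in g]
        dh = [scale * sum(B[i][j] * g[i] for i in range(len(g))) for j in range(len(h))]
        dA = [[dhj * xk for xk in x] for dhj in dh]
        grads.append((dA, dB))
    return grads


W = [[1.0, 0.0], [0.0, 1.0]]
pool = [([[1.0, 0.0]], [[1.0], [0.0]]),  # component 0 (rank 1)
        ([[0.0, 1.0]], [[0.0], [2.0]])]  # component 1 (rank 1)
x = [3.0, 4.0]
print(lora_pool_forward(W, pool, [1, 0], x))  # [6.0, 4.0]
print(lora_pool_forward(W, pool, [0, 1], x))  # [3.0, 12.0]
```

The structural guarantee only holds if the optimizer skips parameters whose gradient is `None`; an update rule that still applies weight decay or accumulated momentum to them would move another task's components even though they have no gradient path.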
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 37