Keywords: Linear Networks, Lazy Regime, Active Regime, Training Dynamics, Phase Diagram
TL;DR: We derive a simple formula for training dynamics of linear networks that not only unifies the lazy and balanced dynamics, but reveals the existence of mixed dynamics.
Abstract: The training dynamics of linear networks are well studied in two distinct
setups: the lazy regime and balanced/active regime, depending on the
initialization and width of the network. We provide a surprisingly
simple unifying formula for the evolution of the learned matrix that
contains as special cases both lazy and balanced regimes but also
a mixed regime in between the two. In the mixed regime, a part of
the network is lazy while the other is balanced. More precisely the
network is lazy along singular values that are below a certain threshold
and balanced along those that are above the same threshold. At initialization,
all singular values are lazy, allowing for the network to align itself
with the task, so that later in time, when some of the singular value
cross the threshold and become active they will converge rapidly (convergence
in the balanced regime is notoriously difficult in the absence of
alignment). The mixed regime is the `best of both worlds': it converges
from any random initialization (in contrast to balanced dynamics which
require special initialization), and has a low rank bias (absent in
the lazy dynamics). This allows us to prove an almost complete phase
diagram of training behavior as a function of the variance at initialization
and the width, for a MSE training task.
Primary Area: Learning theory
Submission Number: 20427
Loading