Keywords: learning dynamics, topology
TL;DR: We prove that permutation-equivariant learning rules (e.g., SGD, Adam) preserve the topological structure of the neurons at small learning rates, but can break it and simplify the model at large learning rates.
Abstract: We prove that, for a broad class of permutation-equivariant learning rules (including SGD, Adam, and others), the training process induces a bi-Lipschitz mapping of the neurons and preserves key topological properties of the neuron distribution. This result reveals a qualitative difference between small and large learning rates. Below a critical threshold $\eta^*$ on the learning rate, training is constrained to preserve the topological structure of the neurons, whereas above $\eta^*$ it allows topological simplification, making the neuron manifold progressively coarser and reducing the model's expressivity. An important feature of our theory is that it is independent of specific architectures and loss functions, enabling the universal application of topological methods to the study of deep learning.
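For intuition, here is a minimal sketch of what the bi-Lipschitz claim could look like; the notation $T_\eta$, $\theta_i$, $c$, $C$ is illustrative and not taken from the paper. If $T_\eta$ denotes one update of a permutation-equivariant rule acting on the neurons $\{\theta_i\}$, a bi-Lipschitz map satisfies
$$c\,\lVert \theta_i - \theta_j \rVert \;\le\; \lVert T_\eta(\theta_i) - T_\eta(\theta_j) \rVert \;\le\; C\,\lVert \theta_i - \theta_j \rVert \qquad \text{for all } i, j,$$
for some constants $0 < c \le C$. The lower bound prevents distinct neurons from merging, which is what protects the topology of the neuron distribution; if $\eta$ exceeds $\eta^*$, a bound of this kind can fail, neurons may collapse, and the topological simplification described in the abstract becomes possible.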
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 8046