Track: tiny paper (up to 4 pages)
Keywords: learning dynamics, topology
TL;DR: We prove that permutation-equivariant learning rules (e.g. SGD, Adam) preserve the topological structure of neurons at small learning rates, but break it and simplify models at large learning rates.
Abstract: We prove that for a broad class of permutation-equivariant learning rules (including SGD, Adam, and others), the training process induces a bi-Lipschitz mapping of neurons and preserves key topological properties of the neuron distribution. This result reveals a qualitative difference between small and large learning rates. Below a critical topological threshold $\eta^*$, training is constrained to preserve the topological structure of the neurons, whereas above $\eta^*$ the process allows topological simplification, making the neuron manifold progressively coarser and reducing the model's expressivity. An important feature of our theory is that it is independent of specific architectures or loss functions, enabling broad application of topological methods to the study of deep learning.
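The permutation equivariance the abstract assumes for SGD can be checked concretely. Below is a minimal sketch on a hypothetical one-hidden-layer model (not the paper's formal setting): permuting the hidden neurons and then taking an SGD step yields the same result as taking the step first and permuting afterwards.

```python
import numpy as np

def sgd_step(W, a, x, y, lr):
    """One SGD step on the squared loss 0.5*(f(x) - y)^2 for
    f(x) = a . tanh(W x). Each hidden neuron is a row of W
    paired with the corresponding entry of a."""
    h = np.tanh(W @ x)
    e = float(a @ h - y)              # residual
    grad_a = e * h
    grad_W = np.outer(e * a * (1.0 - h**2), x)
    return W - lr * grad_W, a - lr * grad_a

rng = np.random.default_rng(0)
d, n_hidden = 3, 5
W = rng.standard_normal((n_hidden, d))
a = rng.standard_normal(n_hidden)
x = rng.standard_normal(d)
y = 0.7
perm = rng.permutation(n_hidden)

# Step, then permute the neurons ...
W1, a1 = sgd_step(W, a, x, y, lr=0.1)
# ... versus permute the neurons, then step.
W2, a2 = sgd_step(W[perm], a[perm], x, y, lr=0.1)

# The two orders agree: the SGD update is permutation-equivariant,
# i.e. it acts on each neuron (row of W, entry of a) identically.
assert np.allclose(W1[perm], W2) and np.allclose(a1[perm], a2)
```

The same check extends to any update rule built from per-neuron statistics (e.g. Adam's moment estimates), which is the class of learning rules the abstract refers to.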
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 92