Task structure and nonlinearity jointly determine learned representational geometry

Published: 16 Jan 2024, Last Modified: 05 Mar 2024ICLR 2024 posterEveryoneRevisionsBibTeX
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: representational geometry, kernel target alignment, disentanglement, activation function, out-of-distribution generalization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We show that the geometry of learned representations can be understood in terms of input and target geometries, and that tanh networks, more than relu networks, tend to align with output geometry
Abstract: The utility of a learned neural representation depends on how well its geometry supports performance in downstream tasks. This geometry depends on the structure of the inputs, the structure of the target outputs, and on the architecture of the network. By studying the learning dynamics of networks with one hidden layer, we discovered that the network's activation function has an unexpectedly strong impact on the representational geometry: Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs. This difference is consistently observed across a broad class of parameterized tasks in which we modulated the degree of alignment between the geometry of the task inputs and that of the task labels. We analyzed the learning dynamics in weight space and show how the differences between the networks with Tanh and ReLU nonlinearities arise from the asymmetric saturation of ReLU, which leads feature neurons to specialize for different regions of input space. Feature neurons in Tanh networks, by contrast, tend to inherit the task label structure. Consequently, when the target outputs are low dimensional, Tanh networks generate neural representations that are more disentangled than those obtained with a ReLU nonlinearity. Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Primary Area: visualization or interpretation of learned representations
Submission Number: 8246