Not All Structure Is Learned: Disentangling Inherited and Learned Representations in Recurrent Networks
Abstract: Structure observed in trained recurrent networks may be inherited from input encodings rather than learned from data. We develop and apply a three-step decomposition to disentangle the two: (1) compare trained representations against untrained baselines to isolate input-driven structure, (2) compare against information-theoretic bounds to quantify what is achievable without learning, and (3) use causal interventions to test whether inherited and learned components are functionally used. We apply it to GRUs trained via behavioral cloning on aliased navigation in a 127-node binary tree. The most prominent hidden-state feature, a depth gradient on PC1, is already present before training: an untrained GRU captures 96% of the trained correlation, reflecting input structure rather than learned spatial knowledge. What training adds is within-class node discrimination via sequential memory. Replacing depth-stratified observations with random class assignments eliminates the inherited axis; the GRU compensates with 7× greater learned spatial discrimination while maintaining comparable performance. PCA ablation reveals a double dissociation, confirming that both inherited and learned components are causally involved in behavior. Applied to a non-hierarchical radial arm maze, the framework recovers an analogous inherited axis but qualitatively different learned structure: visit-history tracking rather than spatial disambiguation.
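To make steps (1) and (3) concrete, the sketch below computes how strongly PC1 of a set of hidden states correlates with node depth, expresses the untrained-to-trained ratio (one plausible reading of "captures 96% of the trained correlation"), and ablates a principal component by projecting it out. It is a minimal illustration, not the authors' code. It assumes hidden states are collected as an `(n_samples, hidden_dim)` NumPy array, that tree nodes use heap indexing (root = 0, so a 127-node tree has depths 0 to 6), and all function names are hypothetical. The data in the demo are synthetic.

```python
import numpy as np


def node_depth(node: int) -> int:
    """Depth of a node in a complete binary tree with heap indexing (root = 0)."""
    return (node + 1).bit_length() - 1


def principal_axes(hidden: np.ndarray):
    """Return the column mean and principal axes (rows of vt) via SVD-based PCA."""
    mean = hidden.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(hidden - mean, full_matrices=False)
    return mean, vt


def depth_correlation(hidden: np.ndarray, depths: np.ndarray, k: int = 0) -> float:
    """|Pearson r| between PC(k+1) scores and depth (PC sign is arbitrary)."""
    mean, vt = principal_axes(hidden)
    scores = (hidden - mean) @ vt[k]
    return float(abs(np.corrcoef(scores, depths)[0, 1]))


def inherited_fraction(untrained: np.ndarray, trained: np.ndarray,
                       depths: np.ndarray) -> float:
    """Hypothetical step-(1) summary: untrained depth correlation / trained one."""
    return depth_correlation(untrained, depths) / depth_correlation(trained, depths)


def ablate_pc(hidden: np.ndarray, k: int = 0) -> np.ndarray:
    """Step-(3)-style ablation: remove the projection onto principal axis k."""
    mean, vt = principal_axes(hidden)
    centered = hidden - mean
    axis = vt[k]
    return mean + centered - np.outer(centered @ axis, axis)


# Synthetic demo: one hidden unit tracks depth, the rest are noise.
rng = np.random.default_rng(0)
depths = np.array([node_depth(n) for n in range(127)], dtype=float)


def fake_hidden(noise: float) -> np.ndarray:
    h = rng.normal(scale=noise, size=(127, 16))
    h[:, 0] += depths  # depth-driven coordinate, standing in for input structure
    return h


untrained, trained = fake_hidden(0.3), fake_hidden(0.1)
print("inherited fraction:", inherited_fraction(untrained, trained, depths))
print("PC1-depth |r| after ablating PC1:",
      depth_correlation(ablate_pc(trained), depths))
```

Using `np.linalg.svd` on centered data keeps the sketch dependency-free; an equivalent analysis could use a PCA class from a machine-learning library. A full causal test would also feed the ablated hidden states back into the policy and measure the change in behavior, which this sketch does not model.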
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Zhirong_Wu1
Submission Number: 7874