This repository contains the code for the paper "Extrapolation by Association: Length Generalization Transfer In Transformers" in submission to MOSS 2025.

To reproduce the results, please follow the instructions below.

## Section 4: Length Generalization Transfer in Algorithmic Tasks
Experiment suite for the main results: Arithmetic, String and Maze tasks.
``` bash
bash experiments/inheritance/run_scratch.sh
```

## Section 5: Length Generalization Transfer from Pretraining
Finetuning various checkpoints of SmolLM-360M on Arithmetic and Maze tasks.
``` bash
bash experiments/inheritance/run_scaling.sh
```

## Section 6: Ablations
### 6.1 Does length generalization transfer only happen for small models trained from scratch?
Reproducing the main results with SmolLM-360M.
``` bash
bash experiments/inheritance/run_pretrained_2.sh
```
### 6.3 Varying Main and Auxiliary Task Lengths
Sweeping different lengths for the main and auxiliary tasks.
``` bash
bash experiments/inheritance/run_sweep.sh
```
### 6.4 Rotary Position Encoding Encourages Length Generalization Transfer
Rerunning the main experiments with NoPE (no positional encoding) for comparison with rotary position encoding.
``` bash
bash experiments/inheritance/run_scratch_nope.sh
```

Further analysis and plotting code can be found in the `notebooks` folder. 
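
To run every experiment listed above in sequence, a small wrapper along these lines may be convenient. This is only a sketch: the wrapper, the `run_all` function, and the `DRY_RUN` variable are hypothetical and not part of the repository; only the script paths come from this README. It assumes you run it from the repository root. `DRY_RUN` defaults to `1` so that nothing is launched by accident.

``` bash
#!/usr/bin/env bash
# Hypothetical convenience wrapper (not shipped with the repository).
# Runs the experiment scripts named in this README, in order.
set -euo pipefail

scripts=(
  experiments/inheritance/run_scratch.sh        # Section 4
  experiments/inheritance/run_scaling.sh        # Section 5
  experiments/inheritance/run_pretrained_2.sh   # Section 6.1
  experiments/inheritance/run_sweep.sh          # Section 6.3
  experiments/inheritance/run_scratch_nope.sh   # Section 6.4
)

run_all() {
  local script
  for script in "${scripts[@]}"; do
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      # Dry run (the default): only print what would be executed.
      echo "would run: bash $script"
    else
      echo "running: bash $script"
      bash "$script"
    fi
  done
}

run_all
```

Saved under a name of your choice (for example `run_all.sh`, also hypothetical), `bash run_all.sh` prints the commands without executing them, and `DRY_RUN=0 bash run_all.sh` actually runs them, stopping at the first script that fails because of `set -e`.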
