# Training nanoGPT on Nested Functions (ListOps)

This repository contains code for training nanoGPT models on generalized ListOps data, which includes generic nested functions. The project extends the original ListOps dataset to explore how transformer models handle hierarchical and compositional structures.

## Getting started

Install [`uv`](https://docs.astral.sh/uv/getting-started/installation/)

```sh
uv sync
```

## Overview

ListOps is a synthetic dataset designed to test the ability of neural networks to learn hierarchical composition. This implementation generalizes the dataset to include arbitrary nested functions, allowing for more complex compositional reasoning tests.

## Features

- Extended ListOps dataset with customizable nested functions
- Implementation based on Andrej Karpathy's nanoGPT architecture
- Training and evaluation scripts for hierarchical reasoning tasks
- Tools for generating and analyzing compositional data

## Getting Started

### Installation

```bash
uv sync
```

Note: This project uses `uv` for dependency management. Dependencies are defined in `pyproject.toml` and will be automatically installed when you run `uv sync`.

### Usage
**IMPORTANT:** in `src/listops/model/config.py` update the `entity: str = Field(default="YOUR_WANDB_USERNAME")` to be your wandb username, or provide it during runs. 

1. Generate the generalized ListOps dataset:
```bash
python generate_data.py
# or 
python generate_data.py --num_train 50000 --max_depth 2 --funcs_to_use add max median --mod 10 --save_dir ../data/
```

2. Train a model:
```bash
python train_single.py
# or 
python src/train_single.py --data_file '../data/listops/List*(add_10,max*.pkl' --model 'GPT' --n_embed 128 --n_layer 4 --learning_rate 5e-4 --min_lr 5e-6 --alpha 1.0
```

3. Run sweep:
```bash
python run_sweep_wandb.py --sweep ./sweeps/ListOps-energy-good.py --num_agents 5 
```

<!-- 3. Evaluate model performance:
```bash
python evaluate.py
``` -->



## Dataset

The generalized ListOps dataset extends the original by including:
- Arbitrary operators beyond MAX, MIN, MEDIAN, etc. 
- Variable nesting depths
- Customizable function vocabularies

## Model Architecture

Based on nanoGPT, our implementation uses a transformer decoder architecture adapted for processing hierarchical structures.

## Results

TBA

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Original ListOps dataset creators
- Andrej Karpathy for the nanoGPT implementation 
