# Softmax Transformers are Turing-Complete

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org/)

This repository contains the official implementation and experiment code for the paper **"Softmax Transformers are Turing-Complete"**. The code supports the paper's theoretical claim of computational universality with systematic empirical evaluation of transformers on algorithmic tasks.

## Repository Structure

```
├── binary/                 # Binary encoding experiments
│   ├── main.py             # Training script with β-RPE
│   ├── gen_kcm_dataset.py  # Dataset generation utilities
│   ├── rpe_llm.py          # Model architectures & β-RPE implementation
│   ├── dataloder.py        # Data loading and preprocessing
│   ├── kcm_binary.py       # Kleene Closure Machine for binary tasks
│   └── dataset/            # Generated datasets
├── unary/                  # Unary encoding experiments (position-free)
│   ├── main.py             # Zero positional encoding training
│   ├── dataloder.py        # Unary data preprocessing
│   ├── kcm.py              # Unary Kleene Closure Machines
│   └── dataset/            # Unary representation datasets
├── logs/                   # Training logs and loss curves
└── checkpoints/            # Saved model checkpoints
```

## Quick Start

### Prerequisites

```bash
pip3 install torch torchvision transformers matplotlib tqdm sympy
```

### Training Models

**Binary Arithmetic with RPE:**
```bash
cd binary
python main.py --model mul --epochs 100 --batch_size 64 --nembd 64 --nlayer 1 --nhead 1 --pos_mode beta --lr 1e-4
```

**Binary Arithmetic without RPE:**
```bash
cd binary
python main.py --model mul --epochs 100 --batch_size 64 --nembd 64 --nlayer 1 --nhead 1 --pos_mode zero --lr 1e-4
```

**Distributed Training (4 GPUs):**
```bash
cd binary
torchrun --nproc_per_node=4 main.py --model mul --epochs 100 --batch_size 64 --nembd 64 --nlayer 1 --nhead 1 --pos_mode zero --lr 1e-4
```

**Unary Position-Free Learning:**
```bash
cd unary
python main.py --model parity --model_type llama --epochs 100
```

### Dataset Generation

```bash
cd binary
python3 gen_kcm_dataset.py --language mul --outdir dataset/mul
```

```bash
cd unary
python gen_dataset.py --max_num 10000 --task mul
```
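
To make the difference between the two encodings concrete, the sketch below writes the same multiplication instance in binary and in unary. It is only an illustration: the helper names and the `a*b=c` sample layout are hypothetical and are not taken from `gen_kcm_dataset.py` or `gen_dataset.py`, whose actual output format may differ.

```python
from typing import Callable


def to_binary(n: int) -> str:
    """Binary encoding: the usual base-2 digits, most significant bit first."""
    return format(n, "b")


def to_unary(n: int) -> str:
    """Unary encoding: the number n is written as n copies of the symbol '1'."""
    return "1" * n


def mul_example(a: int, b: int, encode: Callable[[int], str]) -> str:
    # Hypothetical sample layout "a*b=c"; the real dataset format is an assumption.
    return f"{encode(a)}*{encode(b)}={encode(a * b)}"


if __name__ == "__main__":
    print(mul_example(2, 3, to_binary))
    print(mul_example(2, 3, to_unary))
```

Unary strings grow linearly with the value they encode, while binary strings grow logarithmically, so the same task produces much longer sequences in the unary setting.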


## Progressive Validation Strategy

Generalization is assessed on three validation sets of increasing difficulty:
- **Val0**: Moderate difficulty (overlaps with training)
- **Val1**: Hard examples (progressive difficulty)
- **Val2**: Very hard examples (extreme generalization test)
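
One way to picture the three tiers is as buckets keyed on a difficulty measure. The sketch below is a minimal illustration, not the repository's actual split: it assumes difficulty is the bit length of the larger operand, and the function names and thresholds (`train_max_bits`, `val1_max_bits`) are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[int, int]


def difficulty(a: int, b: int) -> int:
    """Assumed difficulty measure: bit length of the larger operand."""
    return max(a.bit_length(), b.bit_length())


def split_validation(
    pairs: Iterable[Pair],
    train_max_bits: int = 8,   # hypothetical: largest operand size seen in training
    val1_max_bits: int = 12,   # hypothetical: upper bound of the "hard" tier
) -> Dict[str, List[Pair]]:
    """Assign each operand pair to val0, val1, or val2 by difficulty."""
    tiers: Dict[str, List[Pair]] = {"val0": [], "val1": [], "val2": []}
    for a, b in pairs:
        d = difficulty(a, b)
        if d <= train_max_bits:
            tiers["val0"].append((a, b))   # same difficulty range as training
        elif d <= val1_max_bits:
            tiers["val1"].append((a, b))   # beyond the training range
        else:
            tiers["val2"].append((a, b))   # far beyond the training range
    return tiers


if __name__ == "__main__":
    print(split_validation([(3, 5), (300, 7), (5000, 2)]))
```

Reporting accuracy per tier, rather than one pooled number, separates in-distribution performance from length generalization.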
 