# Softmax Transformers are Turing-Complete

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org/)

This repository contains the official implementation and experimental code for the paper **"Softmax Transformers are Turing-Complete"**. It complements the paper's theoretical result with experiments that train transformers on algorithmic tasks in binary and unary encodings.

## Repository Structure

```
├── binary/                 # Binary encoding experiments
│   ├── main.py            # Training script with β-RPE
│   ├── gen_sigma.py       # Dataset generation utilities  
│   ├── llm.py             # Model architectures & β-RPE implementation
│   ├── dataloder.py       # Data loading and preprocessing
│   ├── kcm_binary.py      # Kleene Closure Machine for binary tasks
│   └── dataset/           # Generated datasets
├── unary/                 # Unary encoding experiments (position-free)
│   ├── main.py            # Zero positional encoding training
│   ├── dataloder.py       # Unary data preprocessing
│   ├── kcm.py             # Unary Kleene Closure Machines
│   └── dataset/           # Unary representation datasets
├── logs/                  # Training logs and loss curves
└── checkpoints/           # Saved model checkpoints
```

## Quick Start

### Prerequisites

```bash
pip3 install torch torchvision transformers matplotlib tqdm sympy
```

### Training Models

**Binary Arithmetic with β-RPE:**
```bash
cd binary
python main.py --model mul --epochs 100 --batch_size 64 --nembd 64 --nlayer 6 --nhead 4 --pos_mode beta --beta_scale 1 --lr 1e-4 --save_name mul_bin
```

**Binary Arithmetic without β-RPE:**
```bash
cd binary
python main.py --model mul --epochs 100 --batch_size 64 --nembd 64 --nlayer 6 --nhead 4 --pos_mode zero --lr 1e-4 --save_name mul_bin
```
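
The two commands above differ only in `--pos_mode` (and the `--beta_scale` flag that goes with `beta`), yet both use `--save_name mul_bin`. When running them back to back, a small launcher that gives each run its own name can make the comparison easier to track. The following is a minimal sketch, not part of the repository: the helper `build_command` and the save names `mul_bin_beta` and `mul_bin_zero` are hypothetical, and whether `main.py` overwrites outputs that share a save name is an assumption that has not been checked.

```python
import shlex

# Flags shared by both runs, copied from the commands above.
COMMON_FLAGS = {
    "--model": "mul",
    "--epochs": "100",
    "--batch_size": "64",
    "--nembd": "64",
    "--nlayer": "6",
    "--nhead": "4",
    "--lr": "1e-4",
}


def build_command(pos_mode, save_name):
    """Return the argv list for one training run (hypothetical helper)."""
    cmd = ["python", "main.py"]
    for flag, value in COMMON_FLAGS.items():
        cmd += [flag, value]
    cmd += ["--pos_mode", pos_mode]
    if pos_mode == "beta":
        # --beta_scale only appears in the beta command above.
        cmd += ["--beta_scale", "1"]
    cmd += ["--save_name", save_name]
    return cmd


if __name__ == "__main__":
    # Hypothetical save names, chosen so the two runs stay distinguishable.
    for mode, name in [("beta", "mul_bin_beta"), ("zero", "mul_bin_zero")]:
        print(shlex.join(build_command(mode, name)))
```

To launch the runs instead of printing them, each list can be passed to `subprocess.run(cmd, cwd="binary", check=True)`.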

**Distributed Training (4 GPUs):**
```bash
cd binary
torchrun --nproc_per_node=4 main.py --model mul --epochs 100 --batch_size 64 --nembd 64 --nlayer 1 --nhead 2 --pos_mode beta --beta_scale 1 --lr 1e-4 --save_name mul_bin
```

**Unary Position-Free Learning:**
```bash
cd unary  
python main.py --model mul --model_type llama --epochs 100
```

### Dataset Generation

```bash
cd binary
python gen_sigma.py --max_num 50000 --task mul
```

```bash
cd unary
python gen_dataset.py --max_num 50000 --task mul
```


### 2. Progressive Validation Strategy

Generalization is assessed on three validation sets of increasing difficulty:
- **Val0**: Moderate difficulty (overlaps with training)
- **Val1**: Hard examples (progressive difficulty)  
- **Val2**: Very hard examples (extreme generalization test)
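
One way to picture the three tiers is as buckets over operand size. The sketch below is illustrative only: it assumes that difficulty is measured by the larger operand, and every bound in it (`TRAIN_MAX`, `TIER_BOUNDS`) is a hypothetical value, not the split used by the repository's dataset scripts.

```python
# Hypothetical bounds: the actual split boundaries are not documented here.
TRAIN_MAX = 1_000  # assumed largest operand seen during training
TIER_BOUNDS = [
    ("val0", 2_000),   # moderate: range includes the training range
    ("val1", 10_000),  # hard
    ("val2", 50_000),  # very hard
]


def assign_tier(a, b):
    """Map an operand pair to a validation tier by its larger operand."""
    largest = max(a, b)
    for name, upper in TIER_BOUNDS:
        if largest <= upper:
            return name
    raise ValueError(f"operand {largest} exceeds the largest tier bound")


if __name__ == "__main__":
    for pair in [(5, 6), (5_000, 3), (50_000, 1)]:
        print(pair, assign_tier(*pair))
```

Because the assumed `val0` bound lies above `TRAIN_MAX`, that tier covers pairs from the training range as well as slightly larger ones, which mirrors the "overlaps with training" description.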

### 3. Sigma-Encoded Input Format

Each arithmetic problem is encoded as a comma-separated token sequence:
```
Input:  1,0,1,/,1,1,0,/,?     # 5 ÷ 6 = ?
Output: 12_4_01/_qI0_*-qI_1...  # KCM binary encoding of result
```
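
The input side of this format can be reproduced in a few lines. The sketch below assumes what the example suggests: operands are written most-significant bit first, `/` closes each operand, and `?` marks the position to be predicted. The helper names `encode_operands` and `decode_operands` are hypothetical and are not taken from `gen_sigma.py`. The KCM output encoding is not covered.

```python
def encode_operands(a, b, sep="/", query="?"):
    """Encode two non-negative integers as comma-separated binary tokens."""
    tokens = list(format(a, "b")) + [sep] + list(format(b, "b")) + [sep, query]
    return ",".join(tokens)


def decode_operands(line, sep="/", query="?"):
    """Recover the integer operands from an encoded input line."""
    groups, current = [], []
    for token in line.split(","):
        if token == sep:
            # A separator closes the operand collected so far.
            groups.append(current)
            current = []
        elif token != query:
            current.append(token)
    return [int("".join(bits), 2) for bits in groups]


if __name__ == "__main__":
    encoded = encode_operands(5, 6)
    print(encoded)                   # 1,0,1,/,1,1,0,/,?
    print(decode_operands(encoded))  # [5, 6]
```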
