# Which Formal Languages Can Large Language Models Learn In Context?
_Learning formal languages via in-context learning (ICL)._

## Overview
This project asks whether large language models (LLMs) can learn formal languages through ICL. We evaluate synthetic string-classification tasks across four language classes of increasing expressive power:
- Regular (R)
- Deterministic context-free (DCF)
- Context-free (CF)
- Context-sensitive (CS)
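
To make the four classes concrete, here is a minimal sketch with one textbook language per class and a membership checker for each. These languages and function names are illustrative assumptions, not necessarily the ones in this benchmark.

```python
import re


def in_regular(s: str) -> bool:
    """(ab)*: regular, recognizable by a finite automaton."""
    return re.fullmatch(r"(ab)*", s) is not None


def in_dcf(s: str) -> bool:
    """a^n b^n (n >= 0): deterministic context-free, not regular."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


def in_cf(s: str) -> bool:
    """Even-length palindromes over {a, b}: context-free, not deterministic."""
    return len(s) % 2 == 0 and set(s) <= {"a", "b"} and s == s[::-1]


def in_cs(s: str) -> bool:
    """a^n b^n c^n (n >= 0): context-sensitive, not context-free."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n
```

A string-classification example is then a pair such as `("aabb", in_dcf("aabb"))`, where the boolean is the gold label.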

Models (e.g., GPT-4o, DeepSeek V3) infer the target language solely from labeled examples in the prompt, with no parameter updates. The codebase supports reproducible runs, multiple encoding strategies, and automatic accuracy evaluation.
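
As a rough illustration of the in-context setup, the sketch below assembles a few-shot classification prompt and parses a 0/1 answer. The prompt wording and the names `build_prompt` and `parse_label` are hypothetical; the actual templates live in `prompts/` and may differ.

```python
from typing import List, Optional, Tuple


def build_prompt(examples: List[Tuple[str, bool]], query: str) -> str:
    """Format labeled strings as demonstrations, then append the unlabeled query."""
    blocks = [
        "Each string is labeled 1 if it belongs to the hidden language and 0 otherwise."
    ]
    for s, label in examples:
        blocks.append(f"String: {s}\nLabel: {int(label)}")
    # The model is expected to complete the final label.
    blocks.append(f"String: {query}\nLabel:")
    return "\n\n".join(blocks)


def parse_label(completion: str) -> Optional[bool]:
    """Map a model completion to True/False, or None if it is not a 0/1 answer."""
    text = completion.strip()
    if text.startswith("1"):
        return True
    if text.startswith("0"):
        return False
    return None
```

Returning `None` for unparseable completions keeps malformed answers separate from wrong ones, so they can be counted explicitly during evaluation.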

**Note:** Inference primarily uses external APIs (GPT-4o, DeepSeek V3), so those runs need no GPU and work on CPU-only machines.  
**Cluster option:** We also support **vLLM** on a compute cluster for local inference with open models, including **Qwen2.5 32B**, **Qwen2.5 7B**, **Llama-3.1 8B**, and **Llama-3.1-70B-Instruct**.

## Project Structure
- `experiments/` — Main experiment scripts and result aggregation
- `scripts/` — Helpers for API calls
- `src/` — Prompt construction, encoding strategies, tokenizers, utilities
- `prompts/` — Prompt templates
- `data/` — Subsampled benchmark

## Running Experiments
Environment management uses [Poetry](https://python-poetry.org/).

### 1) Install dependencies
```bash
poetry install
```

### 2) Run experiments
Run the workflow scripts under `experiments/openai` and `experiments/deepseek`. For example:
```bash
poetry run bash experiments/deepseek/run_workflow.sh --model deepseek-chat
```

### 3) Aggregate results
Produce a CSV of results:
```bash
poetry run bash experiments/aggregate_results.sh
```
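
For orientation, per-class accuracy aggregation could look like the sketch below. The record fields (`language_class`, `predicted`, `gold`) and the CSV columns are assumptions for illustration; the output of `aggregate_results.sh` may be structured differently.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable


def accuracy_by_class(records: Iterable[dict]) -> Dict[str, float]:
    """Compute the fraction of correct predictions per language class."""
    totals = defaultdict(lambda: [0, 0])  # class -> [correct, total]
    for r in records:
        counts = totals[r["language_class"]]
        counts[0] += int(r["predicted"] == r["gold"])
        counts[1] += 1
    return {c: correct / total for c, (correct, total) in totals.items()}


def to_csv(acc: Dict[str, float]) -> str:
    """Render the accuracies as CSV text, one row per class, sorted by name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["language_class", "accuracy"])
    for c in sorted(acc):
        writer.writerow([c, f"{acc[c]:.3f}"])
    return buf.getvalue()
```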

## Cluster Experiments
For **vLLM**-based cluster runs (e.g., **Qwen2.5 32B**, **Qwen2.5 7B**, **Llama-3.1 8B**, **Llama-3.1-70B-Instruct**), see the `cluster/` directory for launch scripts and configuration.
