# REFLEX-RLVR

**Self-Teacher Capacity Expansion via Latent-Register Exploration and
Latent-to-Discrete Policy Transfer (LDPT).** A NeurIPS 2026 Main
Conference project.

> *A base LLM is a stronger discriminator of valid reasoning chains
> than a generator of them. REFLEX-RLVR is the recipe that converts
> that discriminator into a generator on previously-unsolvable
> problems, using nothing but a formal verifier and the model itself.*

## Quick links

- **Project overview** — `overview/`. One-stop reference: title,
  abstract, TL;DR, keywords, contributions, methods, status, planned
  figures, benchmarks. Update these as the project evolves.
- **Canonical paper** — `paper/paper.md`. Single file (no versioned
  drafts). Currently has Introduction + Methods only; Experiments /
  Results / Discussion fill in as data lands. PDF builds go to
  `paper/builds/`.
- **Tabular results** — `results/`. CSV / NPZ / Parquet only; figures
  are *not* stored here.
- **Figures** — `figures/`. Scripts that render from `results/`. Empty
  as of 2026-05-03.
- **Conceptual proposal** — `proposal.md` (~46K words, v0.D3, after three
  audit rounds). The pre-registered scientific protocol.
- **Engineering specification** — `architecture.md` (~25K words). The
  per-cycle hyperparameter source of truth.
- **Spend log** — `spend_log.md`. Running tally of every Modal
  invocation, with its dollar cost and what was learned from it.
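The tabular-only rule for `results/` can be enforced mechanically. A minimal sketch, assuming a hypothetical helper `non_tabular_files` (not part of `src/reflex_rlvr/`) that lists every file under `results/` whose extension is not CSV, NPZ, or Parquet. Dotfiles such as `.gitkeep` are skipped:

```python
from pathlib import Path

# Hypothetical allow-list mirroring the "CSV / NPZ / Parquet only" rule for results/.
ALLOWED_SUFFIXES = {".csv", ".npz", ".parquet"}


def non_tabular_files(results_dir: Path) -> list[Path]:
    """Hypothetical helper: files under results_dir that are not CSV/NPZ/Parquet.

    Dotfiles (e.g. .gitkeep) are ignored; suffixes are compared case-insensitively.
    """
    offenders = []
    for path in sorted(results_dir.rglob("*")):
        if not path.is_file() or path.name.startswith("."):
            continue
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            offenders.append(path)
    return offenders
```

Calling `non_tabular_files(Path("results"))` from a test or a pre-commit hook, and failing on a non-empty result, would keep rendered figures out of `results/`.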

## Repository layout

```
reflex-rlvr/
├── README.md                ← you are here
├── CLAUDE.md                ← session bootstrap rules for Claude Code
├── proposal.md              ← scientific proposal (pre-registered)
├── architecture.md          ← engineering specification
├── spend_log.md             ← running Modal spend
├── pyproject.toml           ← uv / pip / pytest config
├── overview/                ← one-stop project-identity files
├── paper/
│   ├── paper.md             ← CANONICAL manuscript
│   ├── README.md
│   ├── submission_checklist.md
│   ├── archive/
│   └── builds/              ← PDF outputs (gitignored)
├── results/                 ← tabular only; figures NOT stored here
├── figures/                 ← rendering scripts; output gitignored
├── configs/
│   ├── pilot.yaml
│   ├── cycle_1..5.yaml
│   └── modal.yaml
├── src/reflex_rlvr/
│   ├── verifier/            ← SymPy + sandboxed code + Lean stub + router
│   ├── latent/              ← cosine-anneal noise, halt head, diagnostics
│   ├── gsi/                 ← Gradient-Spectral Initialization
│   ├── eval/                ← pass@k, paired bootstrap, Yue crossover check
│   ├── mining/              ← (stub; Modal-bound)
│   ├── translator/          ← (stub; Modal-bound)
│   └── modal_app/           ← Modal app glue (smoketest only for now)
├── scripts/                 ← entry-point scripts (modal_smoketest, …)
├── tests/                   ← unit tests (Mac-CPU runnable)
└── data/                    ← datasets (gitignored)
```

## Local development quickstart

```bash
# Install (Python 3.11; uv preferred, pip as fallback)
uv sync                               # or: pip install -e ".[dev]"

# Run the unit tests (no GPU, no Modal)
pytest

# Run the SymPy + sandbox + dispatcher tests only
pytest tests/test_sympy_verifier.py tests/test_code_verifier.py tests/test_router.py
```
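The unit tests are plain pytest functions that run on CPU. The example below illustrates the shape such a test takes. It uses a hypothetical cosine-annealed noise scale of the kind the `latent/` primitives describe; the function name, signature, and exact schedule are assumptions, not the project's implementation. Two tests that pytest would collect follow it:

```python
import math


def cosine_anneal_sigma(
    step: int, total_steps: int, sigma_max: float, sigma_min: float = 0.0
) -> float:
    """Hypothetical noise scale decaying from sigma_max to sigma_min along a half cosine."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    t = min(max(step, 0), total_steps) / total_steps  # clamp progress to [0, 1]
    return sigma_min + 0.5 * (sigma_max - sigma_min) * (1.0 + math.cos(math.pi * t))


# pytest collects any test_* function; nothing here needs a GPU or Modal.
def test_endpoints():
    assert math.isclose(cosine_anneal_sigma(0, 100, sigma_max=1.0, sigma_min=0.1), 1.0)
    assert math.isclose(cosine_anneal_sigma(100, 100, sigma_max=1.0, sigma_min=0.1), 0.1)


def test_monotone_decay():
    values = [cosine_anneal_sigma(s, 50, sigma_max=0.5) for s in range(51)]
    assert all(a >= b for a, b in zip(values, values[1:]))
```

Saved under `tests/` (the file name is up to you), the two tests would run under plain `pytest` alongside the existing verifier tests.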

## First Modal job (when ready, gated on user approval)

```bash
modal run scripts/modal_smoketest.py     # CPU only; ≈ $0.001
```

The smoketest checks that the image builds and that secrets and volumes
are wired up correctly. Per the [Modal cost rule](#modal-cost-rule)
below, every subsequent GPU job requires explicit "yes run it" approval.

## Project status (2026-05-03)

Phase 0: pre-Modal local infrastructure. The verifier sandbox, latent
primitives, GSI, pass@k evaluation, and the Modal smoketest scaffold are
implemented and unit-tested on Mac CPU. Modal is authenticated
(`anonymous`), but no GPU jobs have been launched. The first GPU job
will be the Week-1 premise pilot on Qwen2.5-1.5B (proposal §1.7) within
a ≤ $250 envelope; two pre-registered gates are evaluated before any 7B
compute is spent.
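The gate criteria themselves are defined in proposal §1.7 and are not reproduced here. The sketch below only illustrates the paired-bootstrap utility listed under `eval/`. It assumes per-problem scores (e.g. 0/1 verifier outcomes) for two conditions evaluated on the same problems; `paired_bootstrap_ci` is a hypothetical name:

```python
import random


def paired_bootstrap_ci(
    scores_a: list[float],
    scores_b: list[float],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, tuple[float, float]]:
    """Hypothetical helper: mean of (b - a) and a percentile bootstrap CI.

    scores_a[i] and scores_b[i] must refer to the same problem i; resampling
    the per-problem differences keeps each pair together.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length score lists")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    rng = random.Random(seed)  # fixed seed -> reproducible interval
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Each resample draws n problems with replacement and averages their differences.
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, (lo, hi)
```

Resampling problems jointly for both conditions lets per-problem difficulty cancel out of the difference. This is what distinguishes a paired bootstrap from resampling each condition separately, and it usually gives a tighter interval on the same data.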

## Modal cost rule

- Show the exact `modal run` command and its expected dollar cost.
- Wait for explicit approval ("yes run it" or equivalent).
- Never use `--detach`.
- Never auto-retry on disconnect.
- Log every H100·hr to a wandb run.

(Mirrors the project's user memory; do not weaken these rules.)
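The first three rules can be enforced by a small pre-launch helper. Below is a minimal sketch using the hypothetical names `launch_request` and `is_approved`. The per-GPU-hour price is a caller-supplied placeholder, not a quoted Modal rate. The approval check is deliberately stricter than the "or equivalent" wording above:

```python
# Hypothetical pre-launch helper; the hourly rate is a placeholder argument.
APPROVALS = {"yes run it"}


def launch_request(command: list[str], gpu_hours: float, usd_per_gpu_hour: float) -> str:
    """Render the message shown to the user before any Modal launch."""
    if "--detach" in command:
        raise ValueError("--detach is not allowed by the Modal cost rule")
    cost = gpu_hours * usd_per_gpu_hour  # expected $ = H100·hr x $/H100·hr
    return f"$ {' '.join(command)}\nexpected cost: ${cost:.2f} ({gpu_hours:g} H100·hr)"


def is_approved(reply: str) -> bool:
    """Only an explicit approval counts; anything else (including silence) means no."""
    return reply.strip().lower() in APPROVALS
```

The cost line is just GPU-hours times the hourly rate. For example, 2 H100·hr at a placeholder $4.00/hr renders as `expected cost: $8.00 (2 H100·hr)`.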

## License

Apache-2.0 (planned after the paper is accepted). Until then, the repo
is private; please do not redistribute.
