# JAXBench — Supplemental Material

This archive contains the code, prompts, and workflows used in the JAXBench paper.

## Contents

- **`JAXBench/`** — The benchmark suite itself: 50 JAX workloads (17 priority kernels
  extracted from production LLM architectures and 33 fused operators translated from
  KernelBench L2), reproducible XLA baselines, hand-tuned Pallas references for 8 of
  the priority kernels, and the timing/profiling harness.
- **`autocomp/`** — The agent framework (Autocomp, public OSS) used to evaluate four
  classes of LLM-driven kernel optimization methods on JAXBench:
    1. Best-of-N sampling
    2. Iterative refinement
    3. Iterative refinement with Autocomp's TPU context
    4. Autocomp's full beam-search pipeline

  This directory contains the agent prompts, search algorithm, JAXBench evaluation
  backend, and TPU/JAX context artifacts (architecture summary, Pallas API reference,
  code examples, rules).
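
To make the difference between the first two method classes concrete, here is a minimal Python sketch. It is not Autocomp's implementation: `propose` and `measure` are hypothetical stand-ins for an LLM call and a kernel timing run, and the toy problem treats a "kernel" as a single number whose latency is its distance from 3.0.

```python
import random
from typing import Callable, Optional

# Hypothetical stand-ins (not part of Autocomp or JAXBench):
#   propose(parent, rng) -> a new candidate, optionally derived from `parent`
#   measure(candidate)   -> a latency-like score, lower is better
Propose = Callable[[Optional[float], random.Random], float]
Measure = Callable[[float], float]


def best_of_n(propose: Propose, measure: Measure, n: int, seed: int = 0) -> float:
    """Draw n independent candidates and keep the one with the lowest score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates = [propose(None, rng) for _ in range(n)]
    return min(candidates, key=measure)


def iterative_refinement(propose: Propose, measure: Measure, steps: int, seed: int = 0) -> float:
    """Spend the same budget sequentially: revise the current best, keep improvements."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = random.Random(seed)
    best = propose(None, rng)
    for _ in range(steps - 1):
        candidate = propose(best, rng)
        if measure(candidate) < measure(best):
            best = candidate
    return best


def toy_measure(x: float) -> float:
    """Toy latency: distance from an arbitrary optimum at 3.0."""
    return abs(x - 3.0)


def toy_propose(parent: Optional[float], rng: random.Random) -> float:
    """Toy generator: sample from scratch, or perturb the parent slightly."""
    if parent is None:
        return rng.uniform(0.0, 10.0)
    return parent + rng.uniform(-1.0, 1.0)
```

Both functions consume the same number of `propose` calls when `n == steps`, which is the sense in which such methods can be compared under a matched sample budget.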

## Key Files

### Prompts and Agent Context
- `autocomp/autocomp/agents/` — Agent prompts and templates by hardware target.
- `autocomp/autocomp/agent_builder/.built/tpu-v6e/` — TPU v6e agent context artifacts
  (hardware summary, Pallas API reference, code examples, rules) used in the
  iterative+context and Autocomp methods.
- `autocomp/autocomp/agent_builder/.built/tpu-v5e/` — Equivalent v5e artifacts.

### Search and Evaluation
- `autocomp/autocomp/search/search.py` — Beam search algorithm.
- `autocomp/autocomp/search/run_search.py` — Entry point.
- `autocomp/autocomp/backend/jaxbench/` — JAXBench evaluation backend
  (`jaxbench_runner.py`, `jaxbench_eval.py`).
- `autocomp/autocomp/backend/tpu/` — TPU evaluation backend.
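
For readers unfamiliar with beam search over candidate programs, the following is a generic sketch of the idea, assuming a hypothetical `expand` function (which would be an LLM proposing rewrites) and a hypothetical `score` function (which would be a measured latency). It does not reproduce the logic in `search.py`.

```python
from typing import Callable, List


def beam_search(
    initial: float,
    expand: Callable[[float], List[float]],  # hypothetical: propose children of a candidate
    score: Callable[[float], float],         # hypothetical: lower is better
    beam_width: int,
    iterations: int,
) -> float:
    """Keep the `beam_width` best candidates after each round of expansion."""
    beam = [initial]
    for _ in range(iterations):
        children = [child for parent in beam for child in expand(parent)]
        # Parents stay in the pool, so the best score never regresses.
        # dict.fromkeys removes duplicates while preserving order.
        pool = list(dict.fromkeys(beam + children))
        beam = sorted(pool, key=score)[:beam_width]
    return beam[0]


def toy_expand(x: float) -> List[float]:
    """Toy expansion: step left or right by 0.5."""
    return [x - 0.5, x + 0.5]


def toy_score(x: float) -> float:
    """Toy latency: distance from an arbitrary optimum at 3.0."""
    return abs(x - 3.0)


best = beam_search(0.0, toy_expand, toy_score, beam_width=2, iterations=10)
```

Unlike iterative refinement, which follows a single candidate, the beam keeps several candidates alive at once, so one unlucky rewrite does not end the search.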

### Run Scripts
The `run_*.sh` scripts and `run_batch.py` at the top level of `autocomp/` are the
exact scripts used to run the paper's experiments
(e.g. `run_50kernel_flash_baselines.sh`, `run_5kernel_pro_autocomp.sh`,
`run_5kernel_flash_iter_context.sh`).

### Baselines and Plotting
- `autocomp/autocomp/baselines/` — Best-of-N, iterative, and iterative+context
  baseline implementations, plus measurement and plotting utilities used to produce
  the paper's figures and tables.

## Anonymization Notes

- API keys (`autocomp/autocomp/common/keys.py`) have been blanked.
- User-specific absolute paths have been replaced with `/path/to/...` placeholders.
- The GCP project name has been replaced with `YOUR_GCP_PROJECT`.
- Stale temporary evaluation outputs and run logs (`wandb`, `eval_outputs`, `tmp_files`)
  have been pruned for size.
- Author names appearing inside the Autocomp framework have been left intact because
  Autocomp is already publicly released open-source software (citation in the paper).
  The JAXBench-specific code in this archive does not include author-identifying
  information.

## Reproducing the Paper's Experiments

See `autocomp/README.md` for full setup instructions for the Autocomp framework
(LLM provider configuration, hardware target, etc.). For the JAXBench experiments
specifically:

1. Set up a TPU v6e (or v5e) instance and configure the `AUTOCOMP_TPU_*` environment
   variables (see `autocomp/autocomp/backend/tpu/tpu_setup.md`).
2. Install JAXBench dependencies (`JAXBench/requirements.txt`).
3. Configure LLM API keys via environment variables (preferred) or
   `autocomp/autocomp/common/keys.py`.
4. Run one of the included `run_*.sh` scripts, for example:

   ```bash
   cd autocomp
   ./run_50kernel_flash_baselines.sh   # 50-kernel Gemini Flash main suite
   ./run_5kernel_pro_autocomp.sh       # 5-kernel Gemini Pro Autocomp ablation
   ./run_5kernel_pro_iter_context.sh   # 5-kernel Gemini Pro iterative+context
   ```

The full sample budget per kernel is documented in the paper (Section 3) and matches
the defaults in the scripts.
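
Before launching a long run, it can help to confirm that step 1 took effect. The sketch below is a hypothetical preflight helper, not a script shipped in this archive. It relies only on the `AUTOCOMP_TPU_` prefix mentioned above and makes no assumption about which specific variables the backend requires (those are listed in `tpu_setup.md`).

```python
import os
from typing import Dict, Mapping


def tpu_settings(env: Mapping[str, str], prefix: str = "AUTOCOMP_TPU_") -> Dict[str, str]:
    """Return the non-empty variables in `env` whose names start with `prefix`."""
    return {k: v for k, v in env.items() if k.startswith(prefix) and v.strip()}


def summarize(env: Mapping[str, str]) -> str:
    """Report which matching variables are set, by name only (values may be sensitive)."""
    found = tpu_settings(env)
    if not found:
        return ("No AUTOCOMP_TPU_* variables are set; "
                "see autocomp/autocomp/backend/tpu/tpu_setup.md.")
    return "Found: " + ", ".join(sorted(found))


if __name__ == "__main__":
    print(summarize(os.environ))
```

The helper only reports what is present. Deciding whether the set is complete still requires checking against the setup document.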
