# INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic

## Anonymized Supplementary Artifact

This artifact contains the frozen v1 benchmark datasets, evaluation results, and analysis code
sufficient to reproduce the tables and figures in the paper.

## Contents

```
artifact_anonymous/
├── paper/                      # LaTeX source and auto-generated content
│   ├── icml_paper/            # ICML paper source
│   │   ├── main_icml_induction.tex
│   │   ├── mybib.bib
│   │   └── icml2026.{sty,bst}
│   ├── appendix_model_examples.tex
│   └── auto/                  # Auto-generated tables and figures
│       ├── tables/
│       ├── appendix/
│       ├── figures/
│       └── figs/
├── code/                      # Analysis and evaluation code
│   └── concept_synth/
│       └── analysis/          # Table/figure generation scripts
├── data/
│   ├── benchmarks/            # Frozen v1 benchmark YAML files
│   │   ├── ad_benchmark_v1.yaml   (FullObs: 375 problems)
│   │   ├── c_benchmark_v1.yaml    (CI: 200 problems)
│   │   └── e_benchmark_v1b.yaml   (EC: 200 problems)
│   ├── results/               # Cached evaluation records
│   │   ├── fo_eval_records.jsonl
│   │   ├── ci_eval_records.jsonl
│   │   ├── ec_eval_records.jsonl
│   │   ├── fo_holdout.json
│   │   ├── ci_holdout.json
│   │   └── ec_best_completion.json
│   └── analysis/              # Analysis outputs
├── scripts/
│   ├── build_icml_paper.sh    # Build PDF from LaTeX
│   ├── reproduce_paper.sh     # Regenerate tables/figures
│   └── verify_benchmarks.sh   # Verify data integrity
└── LICENSE.txt
```

## Quick Start

### 1. Verify Data Integrity

```bash
cd artifact_anonymous
chmod +x scripts/*.sh
./scripts/verify_benchmarks.sh
```
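The contents of `verify_benchmarks.sh` are not shown here. One common way to check file integrity is a SHA-256 checksum manifest. The sketch below is a minimal example under that assumption. `SHA256SUMS`, `write_manifest`, and `check_manifest` are hypothetical names and are not part of the artifact. It uses GNU coreutils `sha256sum` to record digests for the benchmark YAML files and to re-check them later:

```bash
#!/usr/bin/env bash
# Sketch only: SHA256SUMS, write_manifest and check_manifest are hypothetical,
# not shipped with the artifact; verify_benchmarks.sh may work differently.

# Record a SHA-256 digest for every *.yaml file in the given directory.
write_manifest() {
  local dir="$1"
  (cd "$dir" && sha256sum -- *.yaml > SHA256SUMS)
}

# Re-hash the files listed in the manifest; returns non-zero on any mismatch.
check_manifest() {
  local dir="$1"
  (cd "$dir" && sha256sum --quiet -c SHA256SUMS)
}
```

For example, run `write_manifest data/benchmarks` once on a known-good copy, then run `check_manifest data/benchmarks` later. Note that this writes the manifest file into the benchmark directory itself.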

### 2. Reproduce Tables and Figures

```bash
# Requires: Python 3, pandas, matplotlib
./scripts/reproduce_paper.sh
```
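The reproduction step lists Python 3, pandas, and matplotlib as requirements. A small pre-flight check can report which of these modules fail to import before the script runs. It is sketched here as a hypothetical helper, `missing_modules`, which is not part of the artifact's scripts:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in scripts/): print each module python3 cannot import.
missing_modules() {
  local mod
  for mod in "$@"; do
    # Try the import in a throwaway interpreter; discard its output and errors.
    if ! python3 -c "import ${mod}" >/dev/null 2>&1; then
      echo "$mod"
    fi
  done
}
```

If both packages are installed, `missing_modules pandas matplotlib` prints nothing. Otherwise it prints one missing module per line.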

### 3. Build the Paper PDF

```bash
# Requires: latexmk or pdflatex
./scripts/build_icml_paper.sh
# Output: paper/icml_paper/main_icml_induction.pdf
```
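The build step needs either `latexmk` or `pdflatex`. The README does not say which one `build_icml_paper.sh` prefers. As a sketch, a hypothetical helper (`first_available`, not part of the artifact) can report the first of the listed tools found on `PATH`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first command found on PATH; fail if none is.
first_available() {
  local cmd
  for cmd in "$@"; do
    # `command -v` succeeds only if the name resolves to something runnable.
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  return 1
}
```

For example, `engine=$(first_available latexmk pdflatex) || echo "install latexmk or pdflatex" >&2` stores the available tool in `engine` and prints a warning if neither is installed.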

## Benchmark Summary

| Task    | Problems | Description |
|---------|----------|-------------|
| FullObs | 375      | Full observation across multiple worlds |
| CI      | 200      | Contrastive YES/NO worlds |
| EC      | 200      | Existential completion with partial observation |

## Model Results (v1 Snapshot)

Results are from 8 models evaluated with identical prompts:
- Grok4, GPT-5.2, Grok4.1f, Gemini 3, DSR, Opus 4.5, Hermes4, GPT-4o

See Table 1 in the paper for the summary across all tasks.
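The cached records in `data/results/*.jsonl` follow the JSON Lines convention of one JSON object per line. A quick sanity check is therefore a per-file record count. The sketch below makes no assumption about the record schema. It counts non-blank lines with a hypothetical helper, `count_records`, which is not part of the artifact:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<file><TAB><number of non-blank lines>" per file.
count_records() {
  local f
  for f in "$@"; do
    # grep -c counts lines containing at least one non-space character,
    # so blank lines are not counted as records.
    printf '%s\t%s\n' "$f" "$(grep -c '[^[:space:]]' "$f")"
  done
}
```

For example, `count_records data/results/*_eval_records.jsonl` prints one count for each of the FullObs, CI, and EC record files.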

## License

This artifact is released for academic review purposes.
If the paper is accepted, the full code and generation pipeline will be released publicly.
