# MMR Gym

MMR Gym generates *synthetic, puzzle-style multimodal reasoning* datasets.

The project revolves around three building blocks:

- **Motifs** — visual primitives (e.g., `rings`, `polygon`, `stripes`).
- **Tilings** — geometric cell layouts (e.g., `square`, `triangular`, `hexagonal`).
- **Tasks** — puzzle families (e.g., *Sequence / Arithmetic*, *Symmetry*, *Missing Patch*).

Each task picks a motif, applies rules across multiple cells, builds distractor
options, and returns a composed image plus metadata. Rules describe how motif
instances vary (e.g., "count increases by 1 each step"). The engine samples
motifs and tasks from registries using weighted selection.

## Installation

Install the dependencies, then the package itself in editable mode:

```bash
pip install -r requirements.txt
pip install -e .
```

## Quick start

Generate a small dataset:

```bash
python -m mmr_gym.engine --n 5 --out ./out --seed 123
```

Images and JSON metadata are written under `./out/`.

## Running tests

Execute the regression tests with `pytest`:

```bash
pytest
```

## Code structure

- `mmr_gym/motifs/` – motif implementations.
- `mmr_gym/tilings/` – tiling implementations.
- `mmr_gym/tasks/` – task implementations.
- `mmr_gym/utils/` – drawing helpers and shared utilities.
- `images/` – sample outputs and assets.
- `demo/` – quick scripts to preview motifs, tilings, and tasks.
- `tests/` – regression tests.

## Available tasks

### Charts
- `charts_match_proportions`: Match the color-to-percentage mapping between a top chart and cross-type options.
- `charts_pie`: Answer sorting questions on a pie chart with distinct slice percentages.

### Counting
- `shape_count`: Count how many motif instances appear in a single connected figure.

### Geometric reasoning
- `geometric_position`: Count small shapes relative to non-overlapping larger regions.
- `geometric_sort`: Sort geometric configurations rendered with random-pack layouts.
- `geometric_stack_count`: Count pieces strictly inside a selected sheet within stacked arrangements.
- `rect_venn`: Solve rectangle or circle Venn-style inclusion questions with connected unions.

### Sequences
- `sequence_arithmetic`: Apply arithmetic progressions to motif counts across panels.
- `sequence_multi_column_arithmetic`: Track per-column arithmetic progressions and extend the sequence.
- `sequence_rotation`: Identify the missing panel in a constant-step rotation sequence.

### Symmetry
- `symmetry_frieze_groups`: Spot the frieze strip that follows a different symmetry rule.
- `symmetry_grid_mirror_fill`: Fill the missing tile in a 2×2 grid so the pattern obeys a target mirror symmetry.
- `symmetry_scene_mirror_identify`: Name the line of symmetry of a rendered scene; "none" is one of the possible answers.
- `symmetry_wallpaper_groups`: Identify the wallpaper patch whose symmetry group differs from the others.

### Tiles
- `tiles_connected_component`: Answer component size or count questions inside colored tilings.
- `tiles_decompose_compose`: Match decompositions and recompositions of connected tile regions.
- `tiles_geometry`: Compute area, perimeter, or related measures on colored tile regions.
- `tiles_line_intersections`: Count shared vertices among colored paths drawn on a tiling.
- `tiles_line_length`: Measure the edge-step length of a highlighted path on a tiling.
- `tiles_missing_tiles`: Restore missing colors or shapes in a tiling by choosing the correct option.
- `tiles_recoloring`: Count per-cell color differences between two related tilings.
- `tiles_shortest_path`: Find the minimum edge-step path between marked tiles while avoiding obstacles.

### Transforms
- `transform_pair_infer`: Identify the single transform that maps a source tile to a target.
- `transform_result_identify`: Pick the image that results from applying a sampled transform.
- `transform_similarity_identify`: Decide which option is uniquely similar or dissimilar to a reference motif.

## Registries (plug-in system)

Motifs, tilings, and tasks self-register via decorators. Importing the module makes
them available to the engine.

```python
from mmr_gym.base import Motif, Tiling, Task
from mmr_gym.registry import register_motif, register_tiling, register_task

@register_motif
class MyMotif(Motif):
    ...

@register_tiling
class MyTiling(Tiling):
    ...

@register_task
class MyTask(Task):
    ...
```

`build_motif_registry()`, `build_tiling_registry()` and `build_task_registry()`
collect everything that has been imported.
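Under the hood, a decorator registry usually amounts to a dictionary that is filled at import time. The sketch below illustrates that idea for motifs only. The `_MOTIFS` store, the `NAME` class attribute, and the duplicate-name check are assumptions for illustration, not the actual contents of `mmr_gym.registry`.

```python
from typing import Dict

# Hypothetical module-level store; the real mmr_gym.registry may be organized differently.
_MOTIFS: Dict[str, type] = {}


def register_motif(cls: type) -> type:
    """Record `cls` under its NAME attribute (an assumed convention) and return it unchanged."""
    name = getattr(cls, "NAME", cls.__name__.lower())
    if name in _MOTIFS:
        raise ValueError(f"duplicate motif name: {name!r}")
    _MOTIFS[name] = cls
    return cls  # returning the class keeps it usable under its original name


def build_motif_registry() -> Dict[str, type]:
    """Return a copy of every motif registered so far, i.e. every imported one."""
    return dict(_MOTIFS)


@register_motif
class Rings:
    NAME = "rings"


@register_motif
class Polygon:
    NAME = "polygon"


print(sorted(build_motif_registry()))  # ['polygon', 'rings']
```

This is why importing a module is enough: the decorator runs when the class definition executes, so a module that is never imported contributes nothing to the registry.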

## Anatomy of a Motif

A motif renders one cell. Required methods:

- `sample_spec(rng)` – draw random parameters.
- `clamp_spec(spec)` – enforce valid ranges.
- `render(spec)` – return an RGBA image.

Motifs expose `attr_ranges` for adjustable fields and reuse constants from
`mmr_gym.config`. See existing modules in `mmr_gym/motifs` for reference.
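A toy motif following this three-method contract might look like the sketch below. It is hypothetical throughout: the `attr_ranges` layout and spec fields are assumptions, the class is standalone (in the project it would subclass `Motif` and carry `@register_motif`), and, to stay dependency-free, `render` returns a row-major list of RGBA tuples instead of the image object the real motifs produce.

```python
import random
from typing import Dict, List, Tuple

RGBA = Tuple[int, int, int, int]


class DotsMotif:
    """Hypothetical motif: `count` dark dots in a row on a transparent square cell."""

    # Assumed layout: field name -> inclusive (min, max).
    attr_ranges = {"count": (1, 4), "size": (8, 32)}

    def sample_spec(self, rng: random.Random) -> Dict[str, int]:
        # Draw each field uniformly from its allowed range.
        return {k: rng.randint(lo, hi) for k, (lo, hi) in self.attr_ranges.items()}

    def clamp_spec(self, spec: Dict[str, int]) -> Dict[str, int]:
        # Pull out-of-range values (e.g. after a rule adds +1) back into range.
        out = dict(spec)
        for k, (lo, hi) in self.attr_ranges.items():
            if k in out:
                out[k] = min(max(out[k], lo), hi)
        return out

    def render(self, spec: Dict[str, int]) -> List[RGBA]:
        # Stand-in for an RGBA image: a row-major list of size*size pixels.
        size, count = spec["size"], spec["count"]
        clear, ink = (0, 0, 0, 0), (20, 20, 20, 255)
        pixels = [clear] * (size * size)
        slot = size // count  # horizontal space reserved per dot
        radius = max(1, slot // 4)
        cy = size // 2
        for i in range(count):
            cx = i * slot + slot // 2
            for y in range(size):
                for x in range(size):
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        pixels[y * size + x] = ink
        return pixels


rng = random.Random(0)
motif = DotsMotif()
spec = motif.clamp_spec({**motif.sample_spec(rng), "count": 9})  # 9 is clamped to 4
image = motif.render(spec)
```

Keeping `clamp_spec` separate from `sample_spec` lets a task push a spec outside its range (for example while extending a progression) and then bring it back to a renderable value.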

## Anatomy of a Tiling

A tiling generates a patch of polygonal cells. Required methods:

- `sample_spec(rng)` – draw random parameters.
- `clamp_spec(spec)` – enforce valid ranges.
- `generate(spec)` – return geometry (vertices, edges, cells).
- `render(spec)` – fill cells and return an image.

See `mmr_gym/tilings` for examples and coloring strategies.
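For a square tiling, `generate` can be sketched as follows. The return layout (a vertex list, undirected edges as index pairs, and cells as cyclic lists of vertex indices), the function name `generate_square`, and the `rows`/`cols`/`side` spec fields are assumptions for illustration; check `mmr_gym/tilings` for the real structure.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]


def generate_square(spec: Dict[str, float]) -> Tuple[List[Point], Set[Edge], List[List[int]]]:
    """Hypothetical square-grid generator returning (vertices, edges, cells)."""
    rows, cols, side = int(spec["rows"]), int(spec["cols"]), float(spec["side"])

    def vid(r: int, c: int) -> int:
        # Row-major index of the lattice vertex at row r, column c.
        return r * (cols + 1) + c

    vertices = [(c * side, r * side) for r in range(rows + 1) for c in range(cols + 1)]
    edges: Set[Edge] = set()
    cells: List[List[int]] = []
    for r in range(rows):
        for c in range(cols):
            # Corner indices in cyclic order around the cell.
            ring = [vid(r, c), vid(r + 1, c), vid(r + 1, c + 1), vid(r, c + 1)]
            cells.append(ring)
            for a, b in zip(ring, ring[1:] + ring[:1]):
                edges.add((min(a, b), max(a, b)))  # each shared edge is stored once
    return vertices, edges, cells


vertices, edges, cells = generate_square({"rows": 2, "cols": 3, "side": 10.0})
print(len(vertices), len(edges), len(cells))  # 12 17 6
```

Storing shared edges once, keyed by sorted vertex indices, makes adjacency queries (e.g., edge-step path lengths) straightforward to build on top of the returned geometry.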

## Anatomy of a Task

Tasks control how a motif varies across cells and build the answer options (the correct answer plus distractors).
`generate_instance` should return a three-element tuple:

```python
composite_image, cell_payloads, metadata
```

Use existing files under `mmr_gym/tasks` as templates.
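The sketch below shows the shape of such a method for a toy arithmetic-sequence task: apply the rule "count increases by `step` each panel", hide the last panel, and build distinct distractor counts. The function name `toy_sequence_instance`, the payload and metadata keys, and the string standing in for the composite image are all hypothetical.

```python
import random
from typing import Any, Dict, List, Tuple


def toy_sequence_instance(rng: random.Random, n_panels: int = 4, n_options: int = 4
                          ) -> Tuple[Any, List[Dict[str, int]], Dict[str, Any]]:
    """Hypothetical arithmetic-sequence task returning (composite, payloads, metadata)."""
    start, step = rng.randint(1, 3), rng.randint(1, 2)
    counts = [start + i * step for i in range(n_panels)]  # the rule
    answer = counts[-1]  # the last panel is hidden

    # Distractors: nearby counts that differ from the answer and from each other.
    pool = [c for c in range(max(1, answer - 3), answer + 4) if c != answer]
    options = rng.sample(pool, n_options - 1) + [answer]
    rng.shuffle(options)

    payloads = [{"count": c} for c in counts[:-1]]
    metadata = {
        "rule": f"count += {step}",
        "options": options,
        "answer_index": options.index(answer),
    }
    composite = f"<{len(payloads)} panels + {len(options)} options>"  # stands in for the image
    return composite, payloads, metadata


composite, payloads, meta = toy_sequence_instance(random.Random(7))
```

Drawing distractors from values close to the answer keeps them plausible, and sampling without replacement guarantees that no two options coincide.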

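## Per-task motif allow-list (`MOTIF_WEIGHTS`)

Each task declares a `MOTIF_WEIGHTS` dict that acts as an allow-list: keys are motif names and values are relative sampling weights. Only motifs listed with a positive weight can be chosen for that task.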

```python
MOTIF_WEIGHTS = {
    "rings": 1.0,
    "polygon": 0.7,
    "stripes": 0.3,
}
```

### Tips for weighting motifs

1. Review the available motifs in `mmr_gym/config.py`.
2. Start with the most intuitive motifs for the task.
3. Assign higher weights to motifs you want to appear more often; set weight `0`
   to disable a motif.
4. Iterate by generating samples and adjusting weights.
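To see how a `MOTIF_WEIGHTS` dict translates into sampling frequencies, here is a minimal sketch built on the standard library's `random.Random.choices`. The helpers `motif_probabilities` and `sample_motif` are hypothetical, the `"dots"` entry is an invented disabled motif, and dropping zero-weight entries before sampling is an assumption that mirrors the "set weight `0` to disable" convention.

```python
import random
from typing import Dict

# Same shape as a task's MOTIF_WEIGHTS; "dots" is a hypothetical disabled motif.
MOTIF_WEIGHTS = {"rings": 1.0, "polygon": 0.7, "stripes": 0.3, "dots": 0.0}


def motif_probabilities(weights: Dict[str, float]) -> Dict[str, float]:
    """Normalize positive weights to probabilities; zero-weight motifs are dropped."""
    allowed = {name: w for name, w in weights.items() if w > 0}
    total = sum(allowed.values())
    return {name: w / total for name, w in allowed.items()}


def sample_motif(weights: Dict[str, float], rng: random.Random) -> str:
    """Hypothetical helper: draw one motif name with probability proportional to its weight."""
    probs = motif_probabilities(weights)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


print(motif_probabilities(MOTIF_WEIGHTS))
# {'rings': 0.5, 'polygon': 0.35, 'stripes': 0.15}
```

Because weights are relative, `{"rings": 2, "polygon": 1}` behaves exactly like `{"rings": 1.0, "polygon": 0.5}`.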

## Extending

To add new motifs, tilings, or tasks, subclass `Motif`, `Tiling`, or `Task`,
decorate with `@register_motif`, `@register_tiling`, or `@register_task`, and
place the file under `mmr_gym/motifs`, `mmr_gym/tilings`, or `mmr_gym/tasks`.
Existing modules provide working examples.

## Visual uniqueness & quality knobs

- `OPT_UNIQUENESS_MIN` – minimum fraction of pixels that must change between answer options.
- `IMG_DIFF_MIN` – minimum image change required by progression checks.
- `MAX_BUILD_RETRIES` – maximum number of attempts to build a valid sample before giving up.
- `SS_CELL`, `OUT_CELL` – internal rendering and output tile sizes, respectively.
- `SEQ_CONFIG["limits"]["count_lo_hi"]` – lower and upper bounds on motif multiplicity in sequence tasks.
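The interplay between `OPT_UNIQUENESS_MIN` and `MAX_BUILD_RETRIES` can be sketched as a pairwise check inside a retry loop. This is an assumption about how the knobs interact, not the project's code: images are modeled as flat lists of pixel values, and the threshold values, the helper names, and the `build_once` callback are hypothetical.

```python
import itertools
import random
from typing import Callable, List, Optional, Sequence

OPT_UNIQUENESS_MIN = 0.02  # hypothetical value: at least 2% of pixels must differ
MAX_BUILD_RETRIES = 50     # hypothetical value


def changed_fraction(a: Sequence, b: Sequence) -> float:
    """Fraction of positions where two equally sized pixel buffers differ."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return sum(p != q for p, q in zip(a, b)) / len(a)


def options_are_unique(options: List[Sequence], min_frac: float = OPT_UNIQUENESS_MIN) -> bool:
    """Every pair of options must differ in at least `min_frac` of their pixels."""
    return all(changed_fraction(a, b) >= min_frac
               for a, b in itertools.combinations(options, 2))


def build_with_retries(build_once: Callable[[random.Random], List[Sequence]],
                       rng: random.Random,
                       max_retries: int = MAX_BUILD_RETRIES) -> Optional[List[Sequence]]:
    """Call `build_once` until its options pass the uniqueness check, or give up."""
    for _ in range(max_retries):
        options = build_once(rng)
        if options_are_unique(options):
            return options
    return None  # a caller would report that no verifiable sample could be built
```

Under this reading, lowering the threshold or raising the retry budget both make it more likely that a build succeeds, at the cost of options that are harder to tell apart or slower generation.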

## Troubleshooting

- **"images do not match" in `alpha_composite`** – ensure every tile is RGBA and
  exactly `OUT_CELL × OUT_CELL`.
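- **"failed to build a verifiable sample after N attempts"** – the builder could not produce sufficiently distinct distractors within the retry budget. Loosen the thresholds (`OPT_UNIQUENESS_MIN`, `IMG_DIFF_MIN`), widen the sampling limits, or raise `MAX_BUILD_RETRIES`.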
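For the `alpha_composite` error, normalizing each tile before compositing is usually enough. Pillow's `Image.alpha_composite` requires both images to be RGBA and the same size; the sketch below, which assumes Pillow is installed, converts and resizes a mismatched tile first. The `normalize_tile` helper and the `OUT_CELL` value are hypothetical.

```python
from PIL import Image

OUT_CELL = 128  # hypothetical; use the value configured for the project


def normalize_tile(tile: Image.Image, size: int = OUT_CELL) -> Image.Image:
    """Convert to RGBA and resize to size x size so alpha_composite accepts it."""
    if tile.mode != "RGBA":
        tile = tile.convert("RGBA")
    if tile.size != (size, size):
        tile = tile.resize((size, size))
    return tile


base = Image.new("RGBA", (OUT_CELL, OUT_CELL), (255, 255, 255, 255))
bad_tile = Image.new("RGB", (100, 80), (255, 0, 0))  # wrong mode and wrong size
out = Image.alpha_composite(base, normalize_tile(bad_tile))
```

Compositing `bad_tile` directly would raise `ValueError: images do not match`.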

## FAQ

**How do I allow or ban a motif for a task?**
Edit that task’s `MOTIF_WEIGHTS`: set a positive float to allow; remove or set
`0.0` to disable.

**How do I weight tasks themselves?**
Task selection happens in the engine, not in individual tasks. To weight tasks,
define a `TASK_WEIGHTS` dict in your driver script and pass its values to `random.choices`.
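A driver-side plan might look like the following. The weights, the chosen task names, and the `plan_tasks` helper are illustrative assumptions; only `random.Random.choices` is standard library.

```python
import random
from typing import Dict, List

# Illustrative weights: the sequence task is drawn twice as often as each of the others.
TASK_WEIGHTS: Dict[str, float] = {
    "sequence_arithmetic": 2.0,
    "symmetry_frieze_groups": 1.0,
    "tiles_shortest_path": 1.0,
}


def plan_tasks(n: int, seed: int) -> List[str]:
    """Pick a task name for each of `n` samples; a fixed seed makes the plan reproducible."""
    rng = random.Random(seed)  # local RNG: does not disturb the global random state
    names = list(TASK_WEIGHTS)
    return rng.choices(names, weights=[TASK_WEIGHTS[t] for t in names], k=n)


plan = plan_tasks(5, seed=123)
```

Each planned name could then be looked up in the mapping returned by `build_task_registry()`, assuming that mapping is keyed by task name.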

**My motif has count=(1,1) — will the sequence task still work?**
Yes. It will use the repeat path and vary multiplicity globally.


## Data split creation

1. Create 5,000 evaluation samples: `python -m mmr_gym.engine --n 5000 --out dataset/val --seed 11`
2. Create 100,000 training samples: `python -m mmr_gym.engine --n 100000 --out dataset/train --workers 6 --seed 42`
3. Build the final train/eval split by following the notebook.

## Contributing

- Add motifs in `mmr_gym/motifs/…` and tasks in `mmr_gym/tasks/…`.
- Keep per-task allow-lists (`MOTIF_WEIGHTS`) small while a task is stabilizing; expand them later.
- Prefer simple rules; add harder distractors once the task is stable.

Happy puzzle-making! 🧩

