Required Python packages are listed in `requirements.txt`.

The 5 partitions of the benchmark are saved in JSONL files in the `data` directory.
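
As a minimal sketch of how a partition could be inspected, the snippet below reads a JSONL file (one JSON object per line) using only the standard library. The helper name `load_jsonl` is hypothetical and not part of this repository, and nothing is assumed about the field names in each record.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file into a list of objects (hypothetical helper)."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # data/competition.jsonl is one of the files referenced in this README.
    path = Path("data/competition.jsonl")
    if path.exists():
        records = load_jsonl(path)
        print(f"{path}: {len(records)} records")
```

The same helper should work for the other JSONL files in `data`, provided they follow the usual one-object-per-line layout.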

The scripts `lin-comb.py`, `subst-poly.py`, and `subst-hard.py` were used to
generate the three augmented datasets. They can be used as follows:

./subst-poly.py data/competition.jsonl > data/subst-poly-new.jsonl
./subst-hard.py data/competition.jsonl data/competition.jsonl \
    > data/subst-hard-new.jsonl
./lin-comb.py data/competition.jsonl data/competition.jsonl > data/lin-comb-new.jsonl

The symbolic and numeric checks are implemented in `evaluate.py`. It can be
run on a sample of answers from Gemini-2.5 Flash as follows:

./evaluate.py data/gemini-example-decodings.jsonl \
    >  data/gemini-example-decodings.evaluated.jsonl

The pass rate can then be computed as:

./pass-rate.py data/gemini-example-decodings.evaluated.jsonl
