# Supplementary Material for "Reinforcement Learning for Symbolic Graphics Code with Visual Feedback"

This is the **Supplementary Material** for the paper "Reinforcement Learning for Symbolic Graphics Code with Visual Feedback".

## Structure of `text2svg-eval`

```
text2svg-eval
│
├── data
│   └── text2svg_eval_n164.jsonl        # Text2SVG evaluation dataset
│
├── examples
│   ├── qwen2.5-coder-32b
│   │   └── description
│   │       ├── config.json
│   │       ├── config_judge.json
│   │       ├── generations.jsonl
│   │       ├── judged_svg.jsonl
│   │       ├── rendered_svg.jsonl
│   │       ├── render_status.json
│   │       └── summarized.json
│   ├── run_qwen2.5_coder_32b.sh        # Script to evaluate Qwen2.5-Coder-32B-Instruct
│   ├── run_qwen3_8b_nothink.sh         # Qwen3-8B with thinking mode disabled
│   └── run_qwen3_8b_think.sh           # Qwen3-8B with thinking mode enabled
│
├── gen.py                              # Step 1: Generate SVG code from text
├── render.py                           # Step 2: Render generated SVG code
├── judge_resize.py                     # Step 3: Evaluate using a Vision-Language Model (VLM)
├── summarize.py                        # Step 4: Aggregate evaluation scores
│
├── models
│   ├── base.py
│   ├── __init__.py
│   ├── models_api.py
│   └── models_local.py
│
├── prompts
│   └── judge.v4_with_golden.txt        # Prompt template for VLM-based evaluation
└── utils.py                            # Utility functions
```
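This README does not document how `gen.py` isolates SVG markup from a model's reply before Step 2. As a minimal sketch, assuming replies may wrap the SVG in prose or a markdown fence, a hypothetical `extract_svg` helper (not part of this codebase) could pull out the first `<svg>...</svg>` element:

```python
import re
from typing import Optional

# Hypothetical helper, not taken from gen.py.
# Non-greedy match from the first "<svg" tag to the first closing "</svg>",
# case-insensitive and spanning newlines. Nested <svg> elements would be cut short.
SVG_PATTERN = re.compile(r"<svg\b.*?</svg>", re.IGNORECASE | re.DOTALL)


def extract_svg(response: str) -> Optional[str]:
    """Return the first <svg>...</svg> element in a model reply, or None."""
    match = SVG_PATTERN.search(response)
    return match.group(0) if match else None


reply = (
    "Here is the icon:\n```svg\n"
    "<svg xmlns='http://www.w3.org/2000/svg' width='10' height='10'>"
    "<rect width='10' height='10'/></svg>\n```"
)
print(extract_svg(reply))
```

Returning `None` for replies without SVG markup lets a downstream step record them as failures instead of attempting to render them.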

## How to Run the Demo


```sh
cd text2svg-eval

# Before running, edit `examples/run_qwen2.5_coder_32b.sh` to set:
# - {PYTHON_BIN}: path to the Python binary
# - {MODEL}: local model path or Hugging Face model name
# - {JUDGE_MODEL}: vision-language model (VLM) used as the judge
# - {TP} and {JUDGE_TP}: tensor-parallel sizes for the generation and judge models, chosen for your hardware

bash examples/run_qwen2.5_coder_32b.sh
```

Running the script produces the following intermediate files and results, in this order (a sample run is included under `examples/qwen2.5-coder-32b/description/`):

1. `config.json` – Configuration settings for generation  
2. `generations.jsonl` – Model-generated SVG code  
3. `rendered_svg.jsonl` + `render_status.json` – Base64-encoded rendered images and rendering status  
4. `config_judge.json` – Configuration settings for evaluation  
5. `judged_svg.jsonl` – Evaluation results from the VLM  
6. `summarized.json` – Aggregated performance metrics
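To inspect these files programmatically, a minimal sketch like the one below reads JSONL records, decodes a base64-encoded image, and averages a numeric score. The field names `image` and `score`, and the helper names, are hypothetical; check the actual keys in `rendered_svg.jsonl` and `judged_svg.jsonl` before relying on it.

```python
import base64
import json
from pathlib import Path
from statistics import mean
from typing import Iterable, Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def decode_image(record: dict, key: str = "image") -> bytes:
    """Decode a base64-encoded rendered image. The key name is an assumption."""
    return base64.b64decode(record[key])


def average_score(records: Iterable[dict], key: str = "score") -> float:
    """Mean of a numeric per-sample score; the key name is an assumption.

    Records without a numeric value under `key` are skipped.
    """
    scores = [r[key] for r in records if isinstance(r.get(key), (int, float))]
    return mean(scores) if scores else float("nan")
```

For example, `average_score(read_jsonl(Path("examples/qwen2.5-coder-32b/description/judged_svg.jsonl")))` would give a single mean, whereas `summarize.py` may report several metrics computed differently.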

## Other Prompts

```
other_prompts
│
├── vl-caption-v1.txt
└── vl-categorization-v1.txt
```

As shown in **Figure 6** of the paper, we used two additional prompts to construct the text-to-SVG dataset:  
- `vl-categorization-v1.txt`: A prompt used to categorize raw data from The Stack V2.  
- `vl-caption-v1.txt`: A prompt used to generate captions, along with relevance ratings, for images rendered from SVG code.

These prompts played a key role in curating and annotating the dataset.
