# PhysicsMinions

This repository contains the source code for the ICLR 2026 submission **"PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System"**.

![Physics overview](intro/PhysicsMinions-overview.png)

## 📖 Introduction

Physics is central to understanding the real world, and problem-solving ability is a key measure of physics intelligence. Physics Olympiads, the crown of competitive physics, demand complex reasoning and deep multimodal understanding, yet remain largely underexplored in AI. Current single-model approaches, especially open-source MLLMs, rarely reach gold-medal performance.

We introduce **PhysicsMinions**, a coevolutionary multi-agent system for Physics Olympiads with three studios:

* **Visual Studio**: extracts structured visual information  
* **Logic Studio**: generates and refines solutions  
* **Review Studio**: conducts dual-stage verification  

On the **7 latest physics Olympiads** from the **HiPhO benchmark**, PhysicsMinions achieves:  
1. **Consistent gains:** improving both open- and closed-source models beyond their single-model baselines;  
2. **Historic milestones:** raising open-source results to 6 golds, including the first-ever open-source IPhO gold under average-score evaluation;  
3. **Human-level scaling:** reaching 26.8/30 on IPhO Pass@32, ranking 4th of 406 contestants, far above the best single-model score (22.7/30, ranked 22nd).  

![Physics framework](intro/PhysicsMinions-framework.png)

---

## 🛠️ Environment Setup

### 1️⃣ Install VLMEvalKit

Refer to the official documentation: [VLMEvalKit Installation and Setup](https://opendeep.wiki/open-compass/VLMEvalKit/installation-and-setup)

Recommended source installation:

```bash
# Clone the repository
git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit 

# Create virtual environment
conda create -n physicsminions python=3.10.18
conda activate physicsminions

# Install dependencies
pip install -r requirements.txt 

# Local dev install (for debugging)
pip install -e . 
```

> **Note**: After cloning VLMEvalKit, please manually copy `eval/physics_r1.py` into the `VLMEvalKit/vlmeval/dataset/` directory.

---

### 2️⃣ Install Other Dependencies

```bash
pip install pylatexenc math_verify 
```

---

### 3️⃣ Environment Configuration

Create a `.env` file under the `eval` directory and add the following:

```bash
# OpenAI API config
OPENAI_API_KEY=your_api_key
OPENAI_API_BASE=your_url 
```

Then copy the `.env` file into the VLMEvalKit folder.

> **Note**: Replace `your_api_key` and `your_url` with your actual API key and API base URL.
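
Before launching a long run, it can help to confirm that the `.env` file no longer contains the placeholder values. The following is a minimal sketch, not part of the repository: the helper names `parse_env` and `check_env` are hypothetical, and it assumes the file uses plain `KEY=VALUE` lines as shown above and that the script is run from the project root.

```python
from pathlib import Path

# Keys and placeholder values taken from the .env template above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_API_BASE")
PLACEHOLDERS = {"your_api_key", "your_url"}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict) -> list:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still holds the placeholder {value!r}")
    return problems


if __name__ == "__main__":
    env_path = Path("eval/.env")  # assumed location, relative to project root
    if env_path.exists():
        for problem in check_env(parse_env(env_path.read_text())):
            print(problem)
```

This only checks that the values are filled in; it does not verify that the key or URL is actually valid.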

---

## 📂 File Structure

```bash
project-root/
├── code/
├── eval/
├── VLMEvalKit/
└── data/
```

- **code/**: Contains the source code of the PhysicsMinions system, including `main.py` (the main entry point) and `prompt.py`

- **eval/**: Includes the evaluation code specifically for the physics Olympiads

- **VLMEvalKit/**: The official toolkit for evaluating multimodal large language models (MLLMs). We recommend setting it up by cloning the official Git repository

- **data/**: Stores the problems from the 7 latest physics Olympiads used for evaluation, sourced from the publicly available HiPhO benchmark


---

## 🚀 Inference
A single command runs PhysicsMinions on every problem in `problem_path` and saves the results under `logs`. The `-cv` flag stands for "consecutive verification", which is explained in detail in the paper; its default value is 2.


```bash
cd code
```

```bash
python main.py \
    --model_name gemini-2.5-flash-thinking \
    --problem_path ../data \
    -l ../logs/gemini-2.5-flash-thinking \
    -cv 2
```

> **Note**:
> The script supports checkpointing: if `APhO_2025_1_A_1.log` exists, the run for that problem resumes from it.
> The final solution is saved separately in `final_solution_APhO_2025_1_A_1.txt`.
> If a crash occurs before the solution is generated, delete `APhO_2025_1_A_1.log` and restart.
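
To see which problems have a log but no final solution yet, the file names in the log directory can be compared. The sketch below is an illustration under assumptions: the helper `unfinished_problems` is hypothetical, and it assumes the `.log` files and the `final_solution_*.txt` files live in the same directory, which the README does not state explicitly. It only lists candidates and deletes nothing.

```python
import os


def unfinished_problems(filenames):
    """Return problem IDs that have <id>.log but no final_solution_<id>.txt."""
    names = set(filenames)
    suffix = ".log"
    return sorted(
        name[: -len(suffix)]
        for name in names
        if name.endswith(suffix)
        and f"final_solution_{name[: -len(suffix)]}.txt" not in names
    )


if __name__ == "__main__":
    # Log directory from the inference command above; adjust as needed.
    log_dir = "../logs/gemini-2.5-flash-thinking"
    if os.path.isdir(log_dir):
        for problem_id in unfinished_problems(os.listdir(log_dir)):
            print(problem_id)
```

A listed problem may simply still be resumable; per the note above, its log needs to be deleted only if the run crashed before producing a solution.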

---

## 📊 Evaluation

```bash
cd eval
```

### ⚡ 0. One-click Evaluation

Recommended: run the whole pipeline directly (Steps 1–3 below explain each stage in detail).

```bash
bash pipeline.sh
```

---

### 📝 1. Add Inference Results to Metadata

Inference results are stored in `logs/` by default. Run:

```bash
python process_predictions.py \
  --log-dir ../logs/model_name/run_01 \
  --output ipho_2025_with_predictions.json \
  --output-dir ../results/model_name/run_01 \
  --json-file ../data/IPhO_2025.json
```

---

### 📦 2. Add Box Format

```bash
python add_boxed_answers.py \
  --input ../results/model_name/run_01/ipho_2025_with_predictions.json \
  --output-dir ../results/model_name/run_01 \
  --output-file ipho_2025_with_predictions_boxed.json
```

---

### 🧪 3. Run VLMEval Evaluation

The JSON file must follow the naming rule in `eval/universal_physics_evaluator.py` → `UniversalPhysicsEvaluator.DATASET_CONFIGS`:
**`{dataset_name}_with_predictions_boxed.json`**
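
The naming rule can be checked programmatically before starting the evaluation. This is a small sketch with hypothetical helper names; it encodes only the pattern quoted above and does not read `DATASET_CONFIGS`, so it cannot tell whether a given dataset name is actually registered there.

```python
from typing import Optional

# Suffix taken from the naming rule above.
BOXED_SUFFIX = "_with_predictions_boxed.json"


def boxed_predictions_filename(dataset_name: str) -> str:
    """Build the file name the evaluator expects for a dataset."""
    return f"{dataset_name}{BOXED_SUFFIX}"


def dataset_from_filename(filename: str) -> Optional[str]:
    """Recover the dataset name from a file name, or None if it does not match."""
    if filename.endswith(BOXED_SUFFIX) and len(filename) > len(BOXED_SUFFIX):
        return filename[: -len(BOXED_SUFFIX)]
    return None
```

For the example in Step 2, `boxed_predictions_filename("ipho_2025")` yields the same file name that is passed to `--output-file`.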

Run final evaluation:

```bash
python eval_physics.py \
  --dataset ipho_2025 \
  --judge-model gemini-2.5-flash \
  --nproc 8 \
  --results-dir ../results/model_name/run_01 \
  --output-dir ../results/model_name/run_01
```

---

### 📈 4. View Evaluation Results & Stats

```bash
# Score file (focus on total_score)
ipho_2025_score.json

# Detailed results: ground truth, model answers, scoring points
ipho_2025_detailed_results.json 

# Log file: traceable coarse & fine-grained scoring
ipho_2025_detailed_logs.txt 
```

* `total_score`: final score (the metric used in our experiments)
* `coarse_grained_total_score`: answer-level coarse-grained score
* `fine_grained_total_score`: step-level fine-grained score
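
The three scores can be pulled out of the score file with a few lines of Python. This is a hedged sketch: it assumes the three fields are top-level keys of `ipho_2025_score.json`, which this README does not confirm, and the helper names are hypothetical. Missing fields are reported as `None` rather than raising an error.

```python
import json

# Field names listed above; their position in the JSON is an assumption.
SCORE_KEYS = (
    "total_score",
    "coarse_grained_total_score",
    "fine_grained_total_score",
)


def summarize_scores(score_data: dict) -> dict:
    """Pick out the three score fields; missing ones map to None."""
    return {key: score_data.get(key) for key in SCORE_KEYS}


def load_scores(path: str) -> dict:
    """Read a score file and return only the three summary fields."""
    with open(path, encoding="utf-8") as f:
        return summarize_scores(json.load(f))
```

For example, `load_scores("../results/model_name/run_01/ipho_2025_score.json")` would return a dictionary with the three fields, provided the file has that layout.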
