

<div align="center">

# SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control

</div>

* Dataset: https://huggingface.co/datasets/anonymous20251119/data
* Model: https://huggingface.co/anonymous20251119/model


---



## 🎯 Introduction

We introduce **SWIRL**, a staged workflow for interleaved reinforcement learning designed for multi-agent systems. SWIRL reformulates multi-agent reinforcement learning (MARL) into a sequence of single-agent reinforcement learning tasks, updating one agent at a time while keeping the others fixed. This formulation enables stable training and promotes efficient coordination across agents. Theoretically, we provide stepwise safety bounds, a cross-round monotonic improvement theorem, and convergence guarantees on return, ensuring robust and principled optimization. In application to mobile GUI control, SWIRL instantiates a **Navigator** that converts language and screen context into structured plans, and an **Interactor** that grounds these plans into executable atomic actions. With this decoupled design and the stability of interleaved updates, our system attains state-of-the-art zero-shot performance on both high-level and low-level mobile GUI benchmarks using only $3{,}500$ training examples, outperforming baselines trained under diverse SFT and RL regimes. Beyond GUI tasks, SWIRL also demonstrates strong capability in multi-agent mathematical reasoning, underscoring its potential as a general framework for developing efficient and robust multi-agent systems.



<div align="center">
  <img src="assets/training_pipeline.jpg" alt="SWIRL training pipeline" style="width: 85%;">
</div>

#### Core Contributions

1. We introduce SWIRL, a multi-agent training framework that interleaves single-agent updates and transforms MARL into a sequence of single-agent RL problems; it achieves $O(1)$ actor memory usage by loading only the currently updated agent, and accommodates heterogeneous model sizes, data schedules, and update budgets.
2. We establish formal guarantees, including a per-step safety bound, a monotonic improvement result across rounds, and convergence of returns.
3. We instantiate SWIRL for mobile GUI control with a Navigator and an Interactor and, through extensive experiments, show stable training and state-of-the-art zero-shot performance, together with ablations that clarify when interleaving helps.
4. We demonstrate transferability by applying the same training recipe to a non-GUI domain (e.g., mathematics) and observe consistent gains on standard benchmarks, indicating potential for broader multi-agent applications.
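
The interleaving idea in contribution 1 can be illustrated with a toy sketch. This is not the SWIRL training code: each agent is reduced to a single scalar, the RL update is replaced by an accept-if-better local search, and `interleaved_training`, `toy_return`, and the step size are hypothetical names and values chosen only for illustration.

```python
from typing import Callable, List, Tuple


def interleaved_training(
    params: List[float],
    joint_return: Callable[[List[float]], float],
    rounds: int = 20,
    step: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Toy interleaved update: in each stage one agent changes, the rest stay frozen."""
    params = list(params)
    history = [joint_return(params)]
    for _ in range(rounds):
        for i in range(len(params)):  # stage i: only agent i is trainable
            best = joint_return(params)
            for delta in (step, -step):
                candidate = params.copy()
                candidate[i] += delta
                value = joint_return(candidate)
                if value > best:  # accept only improving updates
                    params, best = candidate, value
            history.append(best)
    return params, history


def toy_return(p: List[float]) -> float:
    # Hypothetical cooperative objective with its maximum at (1.0, -2.0).
    return -((p[0] - 1.0) ** 2) - ((p[1] + 2.0) ** 2)


final_params, history = interleaved_training([0.0, 0.0], toy_return)
```

Because a stage only accepts updates that raise the joint return, `history` never decreases in this toy setting. The formal guarantees listed above concern the actual method and are not demonstrated by this example.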




#### SWIRL-GUI

The *Navigator* translates instructions, history, and screenshots into structured low-level instructions (LLI), while the *Interactor* executes them as atomic actions (click, scroll, text input) with precise grounding. This hierarchical design enhances robustness, generalization, and interpretability.
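
The division of labor between the two agents can be sketched as a single control step. Everything here is an assumption made for illustration: the class names, the `plan` and `act` methods, the `Action` fields, and the placeholder coordinates are hypothetical, and the stubs stand in for the actual models.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Action:
    kind: str                         # e.g. "click", "scroll", or "input_text"
    target: Tuple[int, int] = (0, 0)  # screen coordinates (placeholder values)
    text: str = ""


class StubNavigator:
    """Stand-in for the Navigator: maps instruction, history, and screenshot to an LLI."""

    def plan(self, instruction: str, history: List[str], screenshot: bytes) -> str:
        # A real Navigator would be a model; this returns a canned low-level instruction.
        return f"click the element that matches: {instruction}"


class StubInteractor:
    """Stand-in for the Interactor: grounds an LLI into one atomic action."""

    def act(self, lli: str, screenshot: bytes) -> Action:
        kind = "click" if lli.startswith("click") else "scroll"
        return Action(kind=kind, target=(120, 340))  # hard-coded placeholder grounding


def control_step(navigator, interactor, instruction, history, screenshot) -> Action:
    lli = navigator.plan(instruction, history, screenshot)  # high-level to low-level
    action = interactor.act(lli, screenshot)                # low-level to atomic action
    history.append(lli)                                     # record the step for later planning
    return action


history: List[str] = []
action = control_step(StubNavigator(), StubInteractor(), "open Settings", history, b"")
```

Keeping the LLI as an explicit intermediate string is what makes each step inspectable: the plan can be read independently of how it was grounded.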

<div align="center">
  <p><b>High-level Tasks</b></p>
  <img src="assets/res_high.png" alt="high-level task" width="85%">
  <p><b>Low-level Tasks</b></p>

| ![](assets/res_low.png) | ![](assets/res_low2.png) |
|--------------------------|--------------------------|
</div>

#### SWIRL-MATH

The *Teacher* provides a concise outline of the problem-solving approach, while the *Student* generates the final solution by following this guidance. This separation of roles enables structured reasoning and improves overall solution quality.

<div align="center">
  <img src="assets/res_math.png" alt="SWIRL-MATH results" style="width: 85%;">
</div>

## 🏁 Getting Started

### 📦 Installation

We recommend following the [official VeRL installation guide](https://verl.readthedocs.io/en/latest/start/install.html).
Below are the key package versions used in our setup:

```text
python == 3.10
CUDA == 12.4

accelerate == 1.7.0
deepspeed == 0.16.4
flash_attn == 2.7.4.post1
flashinfer-python == 0.2.5
ray == 2.46.0
sgl-kernel == 0.1.4
sglang == 0.4.6.post5
torch == 2.6.0
transformers == 4.51.1
vllm == 0.8.5.post1
xformers == 0.0.29.post2
xgrammar == 0.1.19
```
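
To compare an existing environment against the versions above, a small check along these lines may help. It is a sketch, not part of the repository: `EXPECTED` and `check_versions` are hypothetical names, the distribution names are assumed to match the list above, and local build suffixes such as `+cu124` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; Python and CUDA are not checked here.
EXPECTED = {
    "accelerate": "1.7.0",
    "deepspeed": "0.16.4",
    "flash_attn": "2.7.4.post1",
    "flashinfer-python": "0.2.5",
    "ray": "2.46.0",
    "sgl-kernel": "0.1.4",
    "sglang": "0.4.6.post5",
    "torch": "2.6.0",
    "transformers": "4.51.1",
    "vllm": "0.8.5.post1",
    "xformers": "0.0.29.post2",
    "xgrammar": "0.1.19",
}


def check_versions(expected, lookup=version):
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except PackageNotFoundError:
            have = None
        # Drop a local build tag such as "+cu124" before comparing.
        matches = have is not None and have.split("+")[0] == want
        report[name] = (want, have, matches)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:20s} expected {want:14s} found {have or 'not installed':16s} {status}")
```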


### 🚀 Quick Start

```shell
python demo/demo.py
```




## ⚙️ Detailed Documentation
For detailed documentation, see [examples/README.md](examples/README.md).

