# K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control

## 🚀 Overview

**K²-Agent** is a hierarchical framework for mobile device control, built upon the cognitive distinction between two fundamental types of human knowledge. The first, **declarative knowledge ("*knowing what*")**, is explicit and can be rapidly articulated and summarized from a few demonstrations into a task plan. The second, **procedural knowledge ("*knowing how*")**, consists of implicit skills acquired through repeated practice to form 'muscle memory' for precise execution. K²-Agent's core design explicitly separates and co-evolves these two capabilities to master complex mobile tasks.

Inspired by this cognitive model, K²-Agent is built on a hierarchical framework where the two knowledge systems follow different update rules and co-evolve:

* **High-Level Planner (Know-What, Training-Free):** A powerful VLM (`Qwen2.5-VL-72B`) that maintains a declarative knowledge base. It uses a **Summarize–Reflect–Locate–Revise (SRLR)** loop, bootstrapped from a single demonstration, to iteratively improve its task-level plans based on execution feedback.
* **Low-Level Executor (Know-How, Learning-Based):** We post-train an efficient `Qwen2.5-VL-7B` model using our **Curriculum-guided Group Relative Policy Optimization (C-GRPO)** algorithm to master a library of precise and generalizable procedural skills.

These two modules co-evolve, creating a powerful closed-loop system where better planning provides better data for skill learning, and improved skills offer more reliable feedback for planning.
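To make the planner's update rule concrete, here is a minimal, illustrative sketch of one way a Summarize, Reflect, Locate, Revise cycle could be organized. It is not the implementation in `K2_agent.py`: `KnowledgeBase`, `srlr_loop`, `execute`, and `revise_step` are hypothetical names, and the callables stand in for what would be VLM calls and real device rollouts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class KnowledgeBase:
    """Declarative task knowledge as an ordered list of plan steps (hypothetical structure)."""
    steps: List[str] = field(default_factory=list)


def srlr_loop(
    demonstration: List[str],
    execute: Callable[[List[str]], List[bool]],
    revise_step: Callable[[str], str],
    max_rounds: int = 3,
) -> KnowledgeBase:
    # Summarize: bootstrap the plan from a single demonstration.
    kb = KnowledgeBase(steps=list(demonstration))
    for _ in range(max_rounds):
        # Reflect: run the current plan and collect one success flag per step.
        outcomes = execute(kb.steps)
        if all(outcomes):
            break
        # Locate: find the first step that failed.
        failed = outcomes.index(False)
        # Revise: rewrite only the failing step and keep the rest of the plan.
        kb.steps[failed] = revise_step(kb.steps[failed])
    return kb
```

In this sketch the executor's per-step outcomes are the only feedback signal, which mirrors the closed loop described above: more reliable execution yields cleaner feedback for revising the plan.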

---
## 🎯 Key Features

### Hierarchical Dual-Knowledge Architecture
- **Declarative Knowledge System (Know-What)**: Training-free high-level planner that uses a powerful VLM for semantic understanding and task planning
- **Procedural Knowledge System (Know-How)**: Learning-based low-level executor optimized for precise UI interaction and action execution
- **Co-Evolution Mechanism**: Closed-loop system where better planning improves skill learning data, and enhanced skills provide more reliable planning feedback

### Advanced Learning Algorithms
- **SRLR Loop (Summarize-Reflect-Locate-Revise)**: Iterative improvement of declarative knowledge from execution feedback
- **C-GRPO (Curriculum-guided Group Relative Policy Optimization)**: Novel algorithm for procedural skill acquisition
- **Single Demonstration Bootstrap**: Rapid task adaptation from minimal examples
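
As background for C-GRPO, the sketch below shows the group-relative advantage used in standard GRPO: several rollouts are sampled for the same prompt, and each rollout's reward is normalized against the group's mean and standard deviation. This is a generic illustration, not this repository's training code; the function name and `eps` parameter are assumptions, the choice of population standard deviation is ours (implementations differ), and the curriculum component is not shown because this README does not specify it.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the other rollouts in the same group."""
    mean = statistics.fmean(rewards)
    # Population standard deviation; eps avoids division by zero
    # when every rollout in the group receives the same reward.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Rollouts that score above the group average receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline.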


### Performance & Reliability
- **High Success Rate**: 76.7% success rate on AndroidWorld benchmark (89/116 tasks)
- **Real-Time Feedback**: Advanced action visualization and logging
- **Adaptive Learning**: Continuous improvement through execution experience
- **Robust Error Handling**: Graceful failure recovery and retry mechanisms

---
## 🏗️ Project Structure

```
K2-agent/
├── agents/                    # AI Agent implementations
│   ├── base_agent.py         # Abstract base class for all agents
│   ├── K2_agent.py           # Main K2-Agent implementation with dual-model architecture
│   ├── m3a.py                # Multimodal Autonomous Agent
│   ├── seeact.py             # Vision-based agent (OpenAI)
│   ├── t3a.py                # Text-only Autonomous Agent
│   └── infer.py              # LLM inference utilities
├── env/                      # Environment control and device interaction
├── task_evals/               # Task definitions and evaluation logic
├── training/                 # Model training code and configurations
│   └── src/open-r1-multimodal/  # Training pipeline for low-level executor
│       ├── configs/          # Training configurations
│       ├── data_config/      # Dataset configurations
│       ├── run_scripts/      # Training execution scripts
│       └── src/open_r1/      # Core training modules
├── utils/                    # Utility functions and helpers
└── requirements.txt          # Python package dependencies
```

The project is organized as follows:
- `K2_agent.py`: Core implementation of K2-Agent, including both the high-level reasoning model and low-level function model.
- `env/`: Implementation of the Android environment interface, handling device control and state management.
- `task_evals/`: Task definitions and automated evaluation logic for various Android operations.
- `utils/`: Directory containing utility functions for file operations, datetime management, and fuzzy matching.

## Quick Start
### Dependencies

First, create a [conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) environment and install all pip package requirements.

```bash
conda create -n k2agent python=3.11
conda activate k2agent
pip install -r requirements.txt
```

### Environment Setup

To set up the Android environment that K²-Agent interacts with, follow [the AndroidWorld installation guide](https://github.com/google-research/android_world).


### Model Configuration

K²-Agent uses two complementary models that work together in a hierarchical architecture:

#### High-Level Planner (Know-What)
- **Model**: Qwen2.5-VL-72B-Instruct (API mode)
- **Purpose**: Task planning and instruction generation using SRLR loop
- **Configuration**: Set your API key in an environment variable or in code:
```bash
export DASHSCOPE_API_KEY="your_api_key_here"
```
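
If you prefer to check the key from Python before launching a run, a small guard like the following can fail early with a clear message. The helper name `get_dashscope_api_key` is hypothetical and not part of this repository; only the `DASHSCOPE_API_KEY` variable name comes from the instructions above.

```python
import os
from typing import Mapping, Optional


def get_dashscope_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the DashScope API key, or raise if it is missing or blank."""
    source = os.environ if env is None else env
    key = source.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; export it before running the agent."
        )
    return key
```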

#### Low-Level Executor (Know-How)
- **Model**: Qwen2.5-VL-7B-Instruct (locally trained)
- **Purpose**: Precise UI interaction and action execution
- **Training**: Use the provided training code with the C-GRPO algorithm
- **Configuration**: Specify the local model path at initialization:

```python
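from android_world.agents.K2_agent import DualModelAgent

# Initialize K2-Agent (`env` is an AndroidWorld environment created beforehand; see Environment Setup)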
agent = DualModelAgent(
    env=env,
    reason_model_path="qwen2.5-vl-72b-instruct",       # API model name
    function_model_path="/path/to/trained/local/model", # Local trained model path
    analyze_mode=True,                                  # Enable demonstration analysis
    task_name_for_analysis="ContactsAddContact",        # Task for analysis
    enable_iterative_improvement=True                   # Enable iterative learning
)
```

### Training the Low-Level Executor

To train the low-level executor model using the provided C-GRPO training code:

```bash
cd training/src/open-r1-multimodal
bash run_scripts/run_grpo_gui.sh
```

This trains the Qwen2.5-VL-7B-Instruct model on Android interaction data using Curriculum-guided Group Relative Policy Optimization (C-GRPO).

### Running Experiments

#### Testing K²-Agent
After training the low-level model and configuring the API key for the high-level planner, you can test the complete system:

```python
from android_world.agents.K2_agent import DualModelAgent

# Initialize K2-Agent with trained models
# (`env` must be an already-created AndroidWorld environment; see Environment Setup)
agent = DualModelAgent(
    env=env,
    reason_model_path="qwen2.5-vl-72b-instruct",       # API model
    function_model_path="/path/to/your/trained/model",  # Trained local model
    analyze_mode=True,
    enable_iterative_improvement=True
)

# Run a task
result = agent.step("Create a new contact for John Doe with number 555-1234")
```

The system will:
1. Use the high-level planner to generate task instructions via the SRLR loop
2. Execute precise actions using the trained low-level executor
3. Iteratively improve the knowledge base based on execution feedback
4. Log reasoning processes and save action visualizations
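
For tasks that need more than one interaction, a driver loop along these lines may be useful. It is a sketch under assumptions: it presumes that `agent.step(goal)` performs a single interaction and returns an object with a boolean `done` attribute, as AndroidWorld's base agent interface does. This README does not confirm that behavior for `DualModelAgent`, and `StepResult`, `SteppingAgent`, and `run_episode` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class StepResult:
    """Stand-in for the object returned by agent.step(); `done` is an assumed field."""
    done: bool


class SteppingAgent(Protocol):
    def step(self, goal: str) -> Any: ...


def run_episode(agent: SteppingAgent, goal: str, max_steps: int = 30) -> bool:
    """Call agent.step(goal) until the agent reports completion or the step budget runs out."""
    for _ in range(max_steps):
        result = agent.step(goal)
        # Treat a missing `done` attribute as "not finished yet".
        if getattr(result, "done", False):
            return True
    return False
```

Bounding the loop with `max_steps` keeps a stuck episode from running indefinitely; the value 30 is arbitrary and should be tuned per task.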


## License

All content in this work, including the codebase, data, and model checkpoints, is released under the Apache License 2.0.

## Acknowledgments

We would like to express our sincere gratitude to the following open-source projects and communities:

- [Android-Env](https://github.com/google-deepmind/android_env) for providing the Android environment framework
- [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) for the powerful vision-language model
- [AndroidWorld](https://github.com/google-research/android_world) for the comprehensive task evaluation framework
- [VLM-R1](https://github.com/om-ai-lab/VLM-R1) for the GRPO training pipeline

We also thank all contributors and the open-source community for their continuous support and inspiration.
