# ManiCoG

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

> A multi-modal GUI grounding framework.

## Quick Start

```bash
# 1. Setup environment
conda env create -f ManiCoG_env.yml
conda activate ManiCoG
pip install -e .

# 2. Configure your settings  
cp config.template.yaml config.yaml
# Edit config.yaml with your API keys and paths

# 3. Run experiment
./run_experiment.sh          # Baseline method
./run_reground_gpt.sh       # ReGrounding + GPT Judge method
```

## Prerequisites

### API Keys
- **OpenRouter**: create a key at [openrouter.ai](https://openrouter.ai)
- **OpenAI**: create a key at [platform.openai.com](https://platform.openai.com)

### Models
- **TianXi Action Grounding 7B**
  - Place at: `./models/TianXi_Action_Grounding_7B/`

### Dataset
- **ScreenSpot-Pro**
  - Place at: `./data/ScreenSpot-Pro/`
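
Before launching a run, it can help to confirm that the model and dataset sit where the scripts expect them. The sketch below is a hypothetical helper, not part of the repository. It only checks the two directories named above and assumes that a directory which is missing or empty means the asset has not been downloaded yet.

```python
from pathlib import Path
from typing import Dict, List

# Directory locations are taken from this README; the helper itself is hypothetical.
REQUIRED_DIRS: Dict[str, str] = {
    "TianXi Action Grounding 7B": "models/TianXi_Action_Grounding_7B",
    "ScreenSpot-Pro": "data/ScreenSpot-Pro",
}


def missing_prerequisites(repo_root: Path) -> List[str]:
    """Return the names of required assets whose directory is absent or empty."""
    missing = []
    for name, relative in REQUIRED_DIRS.items():
        path = repo_root / relative
        # Assumption: an empty directory counts as "not downloaded yet".
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_prerequisites(Path(".")):
        print(f"Missing: {name} (expected under ./{REQUIRED_DIRS[name]}/)")
```

Run it from the repository root. It prints nothing when both directories are populated.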

## Project Structure

```
ManiCoG_paper/
├── src/                    # Source code
│   ├── utils/             # Core utilities  
│   └── lenovo_eval_ss_pro.py  # Main evaluation script
├── script_MPD/            # Model attention visualization tools
│   ├── MPD.py            # Multi-occlusion sampling visualization
│   └── run_MPD.sh        # MPD runner script
├── models/                # Model files (you download)
├── data/                  # Dataset files (you download)  
├── outputs/               # Experiment results
├── config.template.yaml   # Configuration template
├── run_experiment.sh      # Baseline runner 
├── run_reground_gpt.sh    # ReGrounding+GPT runner 
└── INSTALL.md            # Detailed installation guide
```

## Usage Examples

### Basic Grounding (Baseline)
```bash
./run_experiment.sh
```

### Advanced ReGrounding + GPT Judge
```bash
./run_reground_gpt.sh
```

### Resume Interrupted Experiments
```bash
# ReGrounding+GPT auto-saves progress and can resume
./run_reground_gpt.sh  # Will auto-resume if interrupted
```
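
For readers curious how this kind of auto-resume typically works, here is a minimal checkpointing sketch. It is not the repository's implementation. The state layout (`{"completed": [...]}`) and the names `load_completed`, `save_completed`, and `run_resumable` are assumptions made for illustration, and the real `.state.json` file may store different fields.

```python
import json
import os
from pathlib import Path
from typing import Callable, Iterable, Set


def load_completed(state_path: Path) -> Set[str]:
    """Read the IDs of already-processed samples, or an empty set on first run."""
    if not state_path.exists():
        return set()
    with state_path.open() as f:
        return set(json.load(f).get("completed", []))


def save_completed(state_path: Path, completed: Set[str]) -> None:
    """Write the state to a temp file, then swap it in so a crash cannot corrupt it."""
    tmp_path = state_path.with_name(state_path.name + ".tmp")
    with tmp_path.open("w") as f:
        json.dump({"completed": sorted(completed)}, f)
    os.replace(tmp_path, state_path)


def run_resumable(
    sample_ids: Iterable[str],
    process: Callable[[str], None],
    state_path: Path,
) -> int:
    """Process each sample once across runs; return how many ran this time."""
    completed = load_completed(state_path)
    ran = 0
    for sample_id in sample_ids:
        if sample_id in completed:
            continue  # finished in an earlier, interrupted run
        process(sample_id)
        completed.add(sample_id)
        save_completed(state_path, completed)  # checkpoint after every sample
        ran += 1
    return ran
```

Checkpointing after every sample costs one small file write per sample, and in exchange an interruption loses at most the sample that was in flight.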

### Custom Configuration
```bash
CONFIG_FILE=my_config.yaml ./run_experiment.sh
```

### Environment Variables
```bash
export CUDA_DEVICE=1
export OPENROUTER_API_KEY="your-key"
./run_experiment.sh
```
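
If you write your own Python entry point, the same two variables can be read as sketched below. This is an illustration only: how the shipped scripts consume `CUDA_DEVICE` and `OPENROUTER_API_KEY` is not documented here, the `RuntimeSettings` type and `read_runtime_settings` function are hypothetical, and the fallback to device `"0"` is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional


class RuntimeSettings(NamedTuple):
    cuda_device: str
    openrouter_api_key: str


def read_runtime_settings(env: Optional[Mapping[str, str]] = None) -> RuntimeSettings:
    """Collect run settings from environment variables and fail early if the key is absent."""
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    # Assumption: default to GPU 0 when CUDA_DEVICE is not exported.
    return RuntimeSettings(
        cuda_device=env.get("CUDA_DEVICE", "0"),
        openrouter_api_key=api_key,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.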

### Model Attention Visualization (MPD)
```bash
# Visualize model attention distribution on GUI
cd script_MPD/
bash run_MPD.sh
```
The MPD tool uses multi-occlusion sampling to generate scatter plots and heatmaps that help explain the attention mechanism and prediction behavior of vision-language models.
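
To give a feel for occlusion-based analysis in general, the sketch below computes a coarse sensitivity grid: it masks one tile of an image at a time and records how much a score drops. This is a generic single-patch occlusion sweep, not the procedure in `MPD.py`, whose sampling scheme is not described here. The `occlusion_map` function, the list-of-lists image, and the `score_fn` callback are all stand-ins chosen for illustration.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(
    image: Image,
    score_fn: Callable[[Image], float],
    patch: int,
    fill: float = 0.0,
) -> List[List[float]]:
    """Return one score drop per patch-sized tile; larger drops mean more influence."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    grid = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            occluded = [r[:] for r in image]  # copy so the input stays untouched
            for y in range(top, min(top + patch, height)):
                for x in range(left, min(left + patch, width)):
                    occluded[y][x] = fill
            row.append(baseline - score_fn(occluded))
        grid.append(row)
    return grid
```

In a real setting `score_fn` would wrap a model call, for example the confidence assigned to a predicted click location, and the returned grid could be rendered as a heatmap.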

## Results

After running experiments, find results in:
- `./outputs/{method}-{timestamp}.json` - Detailed metrics
- `./outputs/logs/{method}-{timestamp}.log` - Execution logs  
- `./outputs/{method}-{timestamp}/pipelines/` - Pipeline details
- `./outputs/{method}-{timestamp}.state.json` - Resume state (for interruptions)
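
Given the naming scheme above, the most recent metrics file for a method can be located with a short script. The sketch below is a hypothetical helper. Because the exact `{timestamp}` format is not specified here, it orders files by modification time instead of parsing the name, and the method name you pass in (for example `"baseline"`) is an assumption about how runs are labeled.

```python
from pathlib import Path
from typing import List, Optional


def metrics_files(outputs_dir: Path, method: str) -> List[Path]:
    """List `{method}-*.json` metrics files, oldest first, skipping resume-state files."""
    candidates = [
        path
        for path in outputs_dir.glob(f"{method}-*.json")
        if not path.name.endswith(".state.json")
    ]
    # Sort by modification time because the timestamp format is not assumed.
    return sorted(candidates, key=lambda path: path.stat().st_mtime)


def latest_metrics(outputs_dir: Path, method: str) -> Optional[Path]:
    """Return the newest metrics file for a method, or None if there is none."""
    files = metrics_files(outputs_dir, method)
    return files[-1] if files else None
```

For example, `latest_metrics(Path("outputs"), "baseline")` returns a path you can load with `json.load` to inspect the metrics.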

## Documentation

- **[INSTALL.md](INSTALL.md)** - Installation guide
- **[CONFIG.md](CONFIG.md)** - Configuration reference
- **[QUICKSTART.md](QUICKSTART.md)** - Quick start guide

## License

This project is licensed under the MIT License - see [LICENSE](LICENSE) for details.
