# WebGuiAgents
This file contains instructions to run experiments on the (V)WebArena benchmarks, organized as follows:

1) Python Setup
2) Setting API keys
3) Hosting the websites
4) Running experiments with `run.py`, `scripts/runs/run.sh`, and `scripts/runs/p_run.sh`.

Original instructions can be found at: [vwebarena](https://github.com/web-arena-x/visualwebarena), [webarena](https://github.com/web-arena-x/webarena).

# 1) Python Setup
1. Create a virtual environment using your preferred method.

    ```shell
    # Use python 3.10 or 3.11
    # Python venv:
    python3.11 -m venv webguiagents
    source webguiagents/bin/activate
    
    # Conda:
    conda create --name webguiagents python=3.11  
    conda activate webguiagents
    ```

2. Install required dependencies:
    ```shell
    pip install -r requirements.txt
    playwright install
    python -c "import nltk; nltk.download('punkt')"
    ```

3. (Optional): Install FlashAttention for faster inference with HuggingFace models.
    ```bash
    pip install flash-attn
    ```

4. To use the base captioner in (V)WA, run `scripts/utils/update_blip2-flan.py` to update the model weights to the new version of BLIP2.
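
Before installing dependencies, it can help to confirm that the active interpreter matches the versions named in step 1. The snippet below is a minimal sketch, not part of the repository; the helper name `is_supported` is hypothetical.

```python
import sys

# Versions named in the setup step above (assumption: no other versions are supported).
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.10 or 3.11."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


if __name__ == "__main__":
    found = f"{sys.version_info[0]}.{sys.version_info[1]}"
    if is_supported():
        print(f"Python {found} is supported.")
    else:
        print(f"Python {found} found; use 3.10 or 3.11.")
```

Run it inside the activated environment, so that it reports the interpreter the experiments will actually use.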

# 2) Setting API keys
- To use Google models, see [here](https://ai.google.dev/gemini-api/docs/get-started/python) for details to create API keys.
- To use OpenAI models, see [here](https://openai.com/index/openai-api/) for details to create API keys.
- To use gated HuggingFace models (such as Meta's original release for Llama-3), you have to set a HF user access token. See [here](https://huggingface.co/docs/hub/en/security-tokens) for instructions.


- **RECOMMENDED**: After creating API keys, create a `./scripts/set_api_keys.sh` script as below to set credentials more easily while running experiments.
    - With this, you can simply add `source scripts/set_api_keys.sh` to experiment scripts or run it before `python run.py`.
    - This file is explicitly included in `.gitignore` to avoid publishing the keys to the web.
    ```bash
    # Example `./scripts/set_api_keys.sh`
    export OPENAI_API_KEY=<your-key>
    export GOOGLE_API_KEY=<your-key>
    export HF_TOKEN=<your-key>
    ```

- Alternatively:
    - Export them manually in the terminal with the same commands as above.
    - OR: Add lines such as `os.environ['OPENAI_API_KEY'] = <your-openai-api-key>` for each environment variable to the Python scripts (NOT RECOMMENDED).
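
Whichever method you use, a quick check that the variables are visible to Python can save a failed run. This is a minimal sketch assuming the three variable names from the example script above; the helper `missing_keys` is hypothetical and not part of the repository.

```python
import os

# Variable names taken from the example `set_api_keys.sh` above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "HF_TOKEN")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All credentials are set.")
```

Only the keys for the providers you actually use are needed, so pass a shorter `required` tuple if, for example, you only run OpenAI models.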

# 3) Hosting the websites
1) ~~Using the websites hosted by the authors.~~
This option is not available anymore.

2) Host the websites locally. 
    - It is possible to set up single, isolated websites or the full environment (all websites + homepage with tools and links to them).
    - Follow the additional steps [here](environment_docker/README.md) before continuing.

# 4) Running experiments

Before running experiments, first define the following: 
- (i) The configuration for the Agent. An example is in `config/agent_config_base.yaml`.
- (ii) The configuration for the MLLMs used by the Agent. An example is in `config/lm_configs.yaml`.
- (iii) If you are working with a new MLLM, make sure it is in the `llms/config/model_repo.yaml` file. 
    - This file is only meant to standardize model paths among providers, automate some parameter definitions (e.g., `provider`), and store useful information about models (e.g., default temperature).

**NOTE: Only proceed after hosting the websites as explained in (3).**

## 4.1) Start / Reset environments - local hosting only
Reset the environments before a full evaluation to keep them consistent. Use the same script used to host the websites locally:

 ```bash
 scripts/environments/start_reset_envs.sh all_vwa # to reset all environments in VisualWebArena
 scripts/environments/start_reset_envs.sh shopping classifieds # etc ... to reset specific environments
 ```

## 4.2) Captioner
The environments for (V)WA include a model to caption images given as part of the request and/or on the webpage. The typical evaluations use this feature.

The base captioner is a BLIP2 model. The original VWA codebase hosts the model on the GPU/CPU within each run. However, if you are running multiple experiments, each experiment would use its own share of the GPU/CPU, which is suboptimal.

For this reason, I recommend using the script `utils/host_captioner.py`. Run:
```
tmux
python utils/host_captioner.py
```
This will host the captioner like a web server to receive requests from all running processes.

Finally, pass `server-cuda` as the value of `--eval_captioning_model_device` and `--agent_captioning_model_device` to the evaluation script (see below).

**NOTE**: if you run the scripts with the `server-cuda` option, the captioner will be hosted automatically if it is not found. I still suggest explicitly hosting it with the script above before running experiments.

## 4.3) Simple run using run.py
#TODO: update

**NOTE, RECOMMENDED:** run the experiments using the script `./scripts/runs/run.sh`. It consolidates the steps for setting environment variables and generating cookies, and contains a range of commented options for tweaking the experiments.


## 4.4) Using the run.sh script
The script `./scripts/runs/run.sh` contains an extensive set of options to run experiments on (V)WebArena.

Before running, two user-specific changes are required:
1) Point to the `scripts/set_api_keys.sh` script that exports your Google and OpenAI API keys, as suggested in (2).
    - Not recommended: you can manually add the lines `export OPENAI_API_KEY=<openai-api-key>` and `export GOOGLE_API_KEY=<google-api-key>`.
    However, note that the run script is NOT in `.gitignore` since it is often updated, so take care not to commit the keys to GitHub.
2) Change the `MODE` variable according to how you are hosting the websites as instructed in (3). See the script for options.
    - This instructs `scripts/utils/set_env_variables.sh` to export the URLs given the mode chosen in (3) to host the websites. 
    - For local deployment, it will fetch the URLs from container/flask servers running locally. For the author's hosted websites, it will use their provided urls.

## 4.5) Tasks and task subsets
- The JSON files located in `config_files` contain all the tasks comprising each benchmark.
- The tasks in `config_files/vwa_not_vague` are the same as in VWA, except for adjustments to task intents that don't clearly specify what the Agent needs to do.
    - See `experiments_utils/update_task_intents.py` for the adjustments made.

- Within `evaluation_harness/task_subsets` there are lists of tasks for each domain in the benchmark. These subsets give a good signal of what would be expected if evaluating on the full benchmark. Use them for faster experimentation.

## 4.6) Parallel execution
The script `scripts/runs/p_run.sh` executes multiple instances of the `run.sh` script, each with its own group of tasks.

Example run:
```
./scripts/runs/p_run.sh -c config_files/vwa_not_vague/test_classifieds -t evaluation_harness/task_subsets/classifieds.txt -n 5
```

This will execute 5 instances of `run.sh`, distributing the tasks evenly from the list `evaluation_harness/task_subsets/classifieds.txt` using the configurations in `config_files/vwa_not_vague/test_classifieds`.
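
To illustrate what "distributing the tasks evenly" can mean, here is a minimal sketch of one common scheme: contiguous groups whose sizes differ by at most one. This is an assumption for illustration only; `p_run.sh` may split the list differently, and `split_evenly` is a hypothetical helper.

```python
def split_evenly(tasks, n):
    """Split tasks into n contiguous groups whose sizes differ by at most one."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(len(tasks), n)
    groups, start = [], 0
    for i in range(n):
        # The first `extra` groups take one additional task each.
        size = base + (1 if i < extra else 0)
        groups.append(tasks[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    task_ids = list(range(12))  # hypothetical task ids
    for i, group in enumerate(split_evenly(task_ids, 5)):
        print(i, group)
```

With 12 tasks and 5 instances, this scheme yields groups of sizes 3, 3, 2, 2, and 2.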

# 5) Agents and MLLM configurations
- Check the files `config/agent_config_base.yaml` and `config/lm_configs.yaml` for an example of how to configure the modules of the Agent and/or the parameters of the MLLMs it utilizes.

- In the `agent.yaml` file, add a `gen_config_alias` pointing to the set of parameters this Agent will use. Create the corresponding entry in `config/lm_configs.yaml` with the desired parameters.

- For example, suppose `agent.yaml` looks like this:

```yaml
executor_agent:
  action_set_tag: som
  prompt: p_som_cot_id_actree_3s_prev_utterances
  num_previous_state_actions: 100
  out_utterance: True
  max_model_call: 3
  
  # Model specific configs
  lm_config:
    gen_config_alias: base

critique_agent:
  prompt: p_critique
  num_previous_state_actions: 100
  out_utterance: True
  max_model_call: 3
  max_critique_executor_loop: 3
  mode: two_pass    # one_pass, two_pass
  
  # Model specific configs
  lm_config:
    gen_config_alias: low_random
```

Then you should have an entry `base` and `low_random` **for each of the models** you are using in the `lm_config`. Say we are using `gpt-4o-2024-08-06` for the executor and `llava-next-1.6` for the critique. 
The `config/lm_configs.yaml` file should have at least these two entries:


```yaml
gpt-4o-2024-08-06:
  base:
    temperature: 1.0
    top_p: 0.9
    max_tokens: 768
    name_user: 'user'
    name_assistant: 'assistant'
    img_detail: 'auto'
    text_first: True

llava-next-1.6:
  base:
    temperature: 1.0
    top_k: 0.9
    max_tokens: 768
```
Note: If parameters are not included, the defaults for the corresponding model will be used. Also, don't worry about including parameters the models may not support (like `top_k` for GPT models). The `llms` utilities will handle that and remove them.
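
The behavior described above (alias lookup, fallback to defaults, removal of unsupported parameters) can be pictured with the sketch below. It is not the actual `llms` implementation: the dictionaries, the default values, and the function `resolve_gen_config` are all hypothetical stand-ins used only to show the idea.

```python
# Hypothetical data standing in for config/lm_configs.yaml and per-model defaults.
LM_CONFIGS = {
    "gpt-4o-2024-08-06": {"base": {"temperature": 1.0, "top_p": 0.9, "top_k": 40}},
}
MODEL_DEFAULTS = {"gpt-4o-2024-08-06": {"temperature": 0.7, "max_tokens": 768}}
SUPPORTED_PARAMS = {"gpt-4o-2024-08-06": {"temperature", "top_p", "max_tokens"}}


def resolve_gen_config(model, alias):
    """Merge model defaults with the alias entry, then drop unsupported parameters."""
    merged = dict(MODEL_DEFAULTS.get(model, {}))
    # Values from the alias entry override the defaults.
    merged.update(LM_CONFIGS.get(model, {}).get(alias, {}))
    supported = SUPPORTED_PARAMS.get(model)
    if supported is None:
        return merged  # no support list known for this model: keep everything
    return {k: v for k, v in merged.items() if k in supported}


if __name__ == "__main__":
    print(resolve_gen_config("gpt-4o-2024-08-06", "base"))
```

In this sketch, `top_k` is dropped for the GPT model, `max_tokens` comes from the defaults, and `temperature` is taken from the `base` entry.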


# 6) Other / Optional (Under construction)

## 6.1) Text Generation Inference (TGI) - Local Setup (Under construction)
1) Install Rust if you don't have it
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

2) Install Protoc if you don't have it
```bash
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

3) Clone TGI repository
```bash
git clone https://github.com/huggingface/text-generation-inference.git
cd text-generation-inference/
BUILD_EXTENSIONS=False make install
```

If the build complains that ```The system library `openssl` required by crate `openssl-sys` was not found.```, install:

```bash
sudo apt-get install libssl-dev
```

4) Running TGI locally. See `start_tgi.sh` to deploy TGI locally.


## 6.2) Text Generation Inference (TGI) - Local Setup with Docker
#TODO