# WAREX Supplementary Code

## Set up mitmproxy
1. Download mitmproxy [here](https://mitmproxy.org/downloads/).
In our experiments, we use mitmproxy version [10.4.1](https://mitmproxy.org/downloads/#10.4.1/).
Since we use a Linux sandbox environment, we specifically download `mitmproxy-10.4.1-linux-x86_64.tar.gz`.
2. Extract the package by running `tar -xzf mitmproxy-10.4.1-linux-x86_64.tar.gz`.
This will create the executables `mitmproxy`, `mitmdump`, and `mitmweb`.
3. Run `./mitmproxy` for the first time. This will create a `.mitmproxy` folder containing the certificates needed to intercept HTTPS requests.
4. Run `scripts/install-certificate.sh` to install and update the certificate on your system.
5. To test the setup, open two terminals. In the first terminal, run `./mitmproxy`. In the second terminal, set the proxy environment variables with `export http_proxy=http://127.0.0.1:8080` and `export https_proxy=http://127.0.0.1:8080`, then run `curl https://www.google.com`. The request should appear in the **first terminal** inside mitmproxy’s interactive environment, where you can inspect the full request and response details.
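
The same proxy check can also be scripted. The following is a minimal sketch using only the Python standard library, assuming mitmproxy is listening on `127.0.0.1:8080` as in the steps above and that Python's default TLS trust store picks up the installed certificate. The function names `build_proxy_opener` and `fetch_status` are illustrative and not part of this repository.

```python
import urllib.request

# Same address as the http_proxy / https_proxy variables in the steps above.
PROXY_URL = "http://127.0.0.1:8080"


def build_proxy_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(url: str, proxy_url: str = PROXY_URL, timeout: float = 10.0) -> int:
    """Fetch `url` through the proxy and return the HTTP status code."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.status
```

With `./mitmproxy` running in another terminal, calling `fetch_status("https://www.google.com")` should make the request show up in mitmproxy's flow list. A certificate verification error here would suggest that the certificate installed in the previous step is not trusted by the Python interpreter in use.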

Note: The iptables solution, located under the scripts folder, is lower-level and can be more efficient for an agent developer. It is based on code from the [urldump repo](https://github.com/lemonsqueeze/urldump).

## Set up WebArena following the instructions [here](https://github.com/web-arena-x/webarena) and the environments themselves using the Docker instructions [here](https://github.com/web-arena-x/webarena/blob/main/environment_docker/README.md)

## Set up SteP agent on WebArena following instructions [here](https://github.com/asappresearch/webagents-step)

## Set up WebVoyager following instructions [here](https://github.com/MinorJerry/WebVoyager)

## Set up REAL following instructions [here](https://github.com/agi-inc/agisdk)

## Run mitmproxy with the `addon.py` script for **reliability** testing
Here are the settings for `config.json`:
```
addon:
  0: no addon
  1: popup
  2: server error
  3: network error
  4: random addon (1-3)
  5: js 504 error
  6: malicious popup
  7: random addon (1-6)

frequency:
  0: intercept once (original behavior)
  n: intercept n times on random pages, as explained in Section 3.3 (Failure Frequencies)
```
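
To make the settings concrete, here is a hedged sketch of how a script might read and validate them. It assumes `config.json` is a flat JSON object with integer `addon` and `frequency` keys, for example `{"addon": 2, "frequency": 0}`; the actual file layout in this repository may differ. The helpers `validate_config`, `load_config`, and `resolve_addon` are hypothetical names introduced for illustration.

```python
import json
import random

# Addon codes as listed in the settings above.
ADDON_NAMES = {
    0: "no addon",
    1: "popup",
    2: "server error",
    3: "network error",
    4: "random addon (1-3)",
    5: "js 504 error",
    6: "malicious popup",
    7: "random addon (1-6)",
}


def validate_config(config: dict) -> dict:
    """Check the two settings and return them as plain integers."""
    addon = int(config["addon"])
    frequency = int(config["frequency"])
    if addon not in ADDON_NAMES:
        raise ValueError(f"unknown addon code: {addon}")
    if frequency < 0:
        raise ValueError(f"frequency must be >= 0, got {frequency}")
    return {"addon": addon, "frequency": frequency}


def load_config(path: str = "config.json") -> dict:
    """Read the JSON file (assumed layout) and validate its contents."""
    with open(path, encoding="utf-8") as f:
        return validate_config(json.load(f))


def resolve_addon(addon: int, rng: random.Random) -> int:
    """Map the two 'random' codes onto a concrete addon code."""
    if addon == 4:
        return rng.randint(1, 3)  # one of popup, server error, network error
    if addon == 7:
        return rng.randint(1, 6)  # any concrete addon, following the listed range
    return addon
```

Passing a seeded `random.Random` instance to `resolve_addon` keeps the random choice reproducible across runs, which can help when comparing agents under the same failure sequence.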

1. Choose the addon type to test your agent with, along with how often it should appear, by editing `config.json`. Then run `./mitmproxy -s injection_script.py`, which uses the settings defined in `config.json`. The script intercepts each new request and modifies the response to introduce unreliable scenarios for the agent under test. It also records efficiency metrics (prompt tokens, completion tokens, latency) for each LLM call.
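
For readers unfamiliar with mitmproxy addons, the sketch below illustrates the general interception pattern for the server error case. It is not the repository's `injection_script.py`: the class name `ServerErrorInjector`, the `max_injections` parameter, and the choice to target the first HTML responses (rather than random pages) are all assumptions made for illustration. It relies on mitmproxy's `response` hook and `http.Response.make`.

```python
class ServerErrorInjector:
    """Replace a limited number of HTML responses with an HTTP 500 page."""

    def __init__(self, max_injections: int = 1) -> None:
        # Number of responses that may still be replaced (hypothetical budget).
        self.remaining = max_injections

    def should_inject(self, content_type: str) -> bool:
        """Return True while budget remains and the response is an HTML page."""
        return self.remaining > 0 and "text/html" in content_type.lower()

    def response(self, flow) -> None:
        """mitmproxy hook: called once the full server response is available."""
        # Imported lazily so the selection logic above can be exercised
        # without mitmproxy installed.
        from mitmproxy import http

        content_type = flow.response.headers.get("content-type", "")
        if not self.should_inject(content_type):
            return
        self.remaining -= 1
        # Swap the real response for a synthetic server error.
        flow.response = http.Response.make(
            500,
            b"<html><body><h1>500 Internal Server Error</h1></body></html>",
            {"Content-Type": "text/html"},
        )


# mitmproxy picks up addon instances from a module-level `addons` list.
addons = [ServerErrorInjector(max_injections=1)]
```

Saved under a hypothetical name such as `server_error_sketch.py`, this could be loaded with `./mitmproxy -s server_error_sketch.py`. Filtering on the content type keeps images, scripts, and API calls untouched, so the agent sees the failure on a page load rather than on a background asset.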

Note: The `setup_scripts` folder contains an example script to follow when using the third implementation method discussed, iptables. It also contains an important script for establishing certificate trust on Linux.
