# README
- Compiled instructions to deploy VisualWebArena (VWebArena) and WebArena benchmarks **locally** on RIPL/personal machines.
- To deploy on AWS, see the original instructions, accessible here: [vwebarena](https://github.com/web-arena-x/visualwebarena/tree/main/environment_docker), [webarena](https://github.com/web-arena-x/webarena/blob/main/environment_docker/README.md#environment-reset). 
- Relative to the original, this document consolidates the instructions for both benchmarks, adds RIPL-specific instructions, and relies on scripts that automate some of the original manual steps, reduce hardcoded paths, and allow more flexibility in port/URL definitions.

The instructions below cover how to:
1) Download the files needed to deploy the websites locally.
2) Load the images into Docker for the websites that require it.
3) Start (host) the websites.
4) Access the websites from other machines on the same network.
5) (Optional) Host the environments on the web [TODO]

- If the images are already downloaded and loaded into Docker, go directly to step 3.
    - To check which images are loaded, run `docker images` in the terminal.

**Notes on Docker**:
- By default, many Docker commands require `sudo`. Ask the administrator to allow running them without sudo (e.g., by adding your user to the `docker` group).
- [Cheat Sheet](https://docs.docker.com/get-started/docker_cheatsheet.pdf) for Docker terminal commands.
- The [VSCode Docker extension](https://code.visualstudio.com/docs/containers/overview) provides some convenient features.

# 1) Downloading necessary files for local deployment

The sections below contain the links to download the files necessary to deploy each website.

**Notes**:
- Files for each website are provided in different formats:
    - `.tar`: Docker image archives, which are loaded into the Docker image cache with the `docker load` command. THESE REQUIRE LARGE STORAGE; see the note below.
    - `.zim`: a data file (the Wikipedia dump) that is mounted into and served by a Docker container directly; it does not need to be loaded into the Docker image cache.
    - Docker Compose: the images are pulled into the Docker image cache directly from a remote Docker registry.
    - Online: the `map` website was hosted online by the authors and could be accessed via their URL with no setup (see the Map section below for its current status).

- Each benchmark requires the following set of websites for full deployment:
    - **VWebArena**: [Shopping, Classifieds, Reddit, Wikipedia, Homepage]
    - **WebArena**: [Shopping, Shopping Admin, GitLab, Reddit, Wikipedia, Map, Homepage]
    - Note: running websites in isolation is possible and useful for testing. However:
        - (i) Tasks that require navigating to more than one website will be impossible to complete.
        - (ii) The agent will not be able to use tools (as in the homepage) or knowledge sources (Wikipedia). It is recommended to change the prompts accordingly.
        
**IMPORTANT NOTE ON STORAGE REQUIREMENTS:**

- The Docker images are LARGE: most occupy between ~9GB and ~89GB each (see the sizes below).

- Moreover, for the images in `.tar` format, `docker load` **requires roughly double the storage**, because Docker creates its image cache from the `.tar`, i.e., you need space both for the `.tar` and for the Docker cache.
    - It may be possible to stream the download with `wget` directly into Docker's image cache, skipping the intermediate `.tar`. Note, however, that the downloads take quite some time and are prone to network errors, and a failed stream has to start over.

- Therefore, **if using RIPL machines**, make sure that:
    1) You download the files below to a disk with enough free storage.
        - Use `df -h` to list the disks and the directories they are mounted on.
    2) Docker's default storage path is on a disk with enough free storage.
        - By default, Docker stores all its data in `/var/lib/docker`. If this directory is not on a high-storage disk, it is recommended to change it; otherwise the machine can run out of space.
        - **Changing this requires sudo access**, and there is no option to change where Docker stores the image cache for a single load.
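To check both points before downloading, a small script along these lines can report where Docker keeps its data and how much space is left there. It is only a sketch: it assumes the `docker` CLI is installed and falls back to `/var/lib/docker` if `docker info` fails; the function names `docker_root_dir` and `free_gib` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report where Docker keeps its data and how much space is free there.

# Docker's data directory as reported by the daemon; falls back to the
# default /var/lib/docker if `docker info` fails (no CLI, daemon down, ...).
docker_root_dir() {
  local dir
  dir=$(docker info --format '{{.DockerRootDir}}' 2>/dev/null) || dir=""
  echo "${dir:-/var/lib/docker}"
}

# Free space, in whole GiB, on the filesystem that holds the given path.
# Walks up to the nearest existing parent in case the path does not exist.
free_gib() {
  local path=$1
  while [[ ! -e "$path" ]]; do
    path=$(dirname "$path")
  done
  # -P: one line per filesystem; -k: sizes in KiB (column 4 = available).
  df -Pk "$path" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

root=$(docker_root_dir)
echo "Docker data dir:    $root ($(free_gib "$root") GiB free)"
echo "Current directory:  $PWD ($(free_gib "$PWD") GiB free)"
```

If the reported data directory is on a small disk, the administrator can move it by setting `data-root` in `/etc/docker/daemon.json` and restarting the Docker daemon.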

## Links to download the environments
To download the Shopping, Reddit, and Wikipedia files from the CMU mirror in one go:
```
wget http://metis.lti.cs.cmu.edu/webarena-images/shopping_final_0712.tar
wget http://metis.lti.cs.cmu.edu/webarena-images/postmill-populated-exposed-withimg.tar
wget http://metis.lti.cs.cmu.edu/webarena-images/wikipedia_en_all_maxi_2022-05.zim
```
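Because these downloads are long and can fail midway, a loop using `wget -c` (which resumes a partially downloaded file) may be more robust. A sketch, assuming bash 4+ (for the associative array) and the CMU mirror file names listed in the site sections below; `DOWNLOAD_DIR`, `url_for`, and `download` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: download selected environment files from the CMU mirror, resuming
# partially downloaded files instead of starting over after a network error.
set -euo pipefail

MIRROR="http://metis.lti.cs.cmu.edu/webarena-images"
# Placeholder: point this to a disk with enough free space.
DOWNLOAD_DIR="${DOWNLOAD_DIR:-$PWD}"

# Site name -> file name on the mirror (as listed in this README).
declare -A FILES=(
  [shopping]=shopping_final_0712.tar
  [admin]=shopping_admin_final_0719.tar
  [reddit]=postmill-populated-exposed-withimg.tar
  [gitlab]=gitlab-populated-final-port8023.tar
  [wikipedia]=wikipedia_en_all_maxi_2022-05.zim
)

# Print the mirror URL for a site name; fail on unknown names.
url_for() {
  local file=${FILES[$1]:-}
  if [[ -z "$file" ]]; then
    echo "unknown site: $1" >&2
    return 1
  fi
  echo "$MIRROR/$file"
}

# Download each requested site; -c resumes, --tries retries transient errors.
download() {
  local site url
  for site in "$@"; do
    url=$(url_for "$site")
    wget -c --tries=10 -P "$DOWNLOAD_DIR" "$url"
  done
}

download "$@"
```

Usage, saving the script under any name (here `download_envs.sh`): `DOWNLOAD_DIR=/path/to/large/disk bash download_envs.sh shopping reddit wikipedia`. Re-running the same command after an interruption continues the partial files.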
### Shopping (~63GB) [VWebArena, WebArena]
- https://drive.google.com/file/d/1gxXalk9O0p9eu1YkIJcmZta1nvvyAJpA/view?usp=sharing
- https://archive.org/download/webarena-env-shopping-image
- http://metis.lti.cs.cmu.edu/webarena-images/shopping_final_0712.tar

### Shopping Admin (~9GB) [WebArena]
- https://drive.google.com/file/d/1See0ZhJRw0WTTL9y8hFlgaduwPZ_nGfd/view?usp=sharing
- https://archive.org/download/webarena-env-shopping-admin-image
- http://metis.lti.cs.cmu.edu/webarena-images/shopping_admin_final_0719.tar

### Reddit (~50GB) [VWebArena, WebArena]
- https://drive.google.com/file/d/17Qpp1iu_mPqzgO_73Z9BnFjHrzmX9DGf/view?usp=sharing
- https://archive.org/download/postmill-populated-exposed-withimg
- http://metis.lti.cs.cmu.edu/webarena-images/postmill-populated-exposed-withimg.tar

### GitLab (~72GB) [WebArena]
- https://drive.google.com/file/d/19W8qM0DPyRvWCLyQe0qtnCWAHGruolMR/view?usp=sharing
- https://archive.org/download/webarena-env-gitlab-image
- http://metis.lti.cs.cmu.edu/webarena-images/gitlab-populated-final-port8023.tar

### Wikipedia (~89GB) [VWebArena, WebArena]
- https://drive.google.com/file/d/1Um4QLxi_bGv5bP6kt83Ke0lNjuV9Tm0P/view?usp=sharing
- https://archive.org/download/webarena-env-wiki-image
- http://metis.lti.cs.cmu.edu/webarena-images/wikipedia_en_all_maxi_2022-05.zim

### Classifieds (~81MB) [VWebArena]
(i) Docker Compose files for the containers
- Note: these files are already in the repo, in `./environment_docker/classifieds_docker_compose`
- https://drive.google.com/file/d/1m79lp84yXfqdTBHr6IS7_1KkL4sDSemR/view
- https://archive.org/download/classifieds_docker_compose

(ii) Download the Docker images from the remote Docker registry.
Run:
```
docker pull jykoh/classifieds:latest
docker pull mysql:8.1
```
This pulls the images directly into Docker's image cache, so, unlike the `.tar` case, there is no need for double the storage.

### Map [WebArena]
- ~~No downloads required.~~
- TODO: the map website now has to be downloaded. Check the download URL on the authors' website.


# 2) Load images into Docker
The script `load_docker_imgs.sh` loads the images downloaded in step 1 into the Docker image cache. Before running it, set the `docker_img_folders` variable in the script to the directory where you saved the files in step 1.

Then run:
```shell
./scripts/environments/load_docker_imgs.sh <site_name>
```

where `<site_name>` is one of: `['shopping' 'admin' 'reddit' 'gitlab' 'classifieds']`

**NOTES**: 
- This step takes a couple of minutes for each website.
- This needs to be **run only once for each website**. Once an image is loaded into Docker, you can deploy, stop, and delete its containers whenever you need.
    - Run `docker images` to see which images are already loaded.
- For `classifieds`, the script runs `docker pull ...` as in step 1. If you did not pull the images in step 1, the script downloads them in full, which takes a while.
- After loading the images into Docker, you can delete the `.tar` files to reclaim storage if desired.
- You can also run each command individually by copying it from the script and executing it in the terminal. For example, for the `shopping` website:
    ```shell
    docker load --input <docker_img_folders>/shopping_final_0712.tar
    ```
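To avoid keeping a `.tar` on disk while loading (the ~double storage issue from step 1), the download can in principle be piped straight into `docker load`, which reads an archive from stdin when `--input` is omitted. A minimal sketch; the `stream_load` name is hypothetical, and since there is no resume, any network error means restarting the whole download.

```bash
#!/usr/bin/env bash
# Sketch: stream an image archive from a URL straight into `docker load`,
# so the .tar is never stored on disk. No resume: a network error means
# restarting the whole download.
set -euo pipefail

stream_load() {
  local url=$1
  # -O - writes the download to stdout; `docker load` reads the archive
  # from stdin when --input is not given. pipefail surfaces wget failures.
  wget -q -O - "$url" | docker load
}

if [[ $# -gt 0 ]]; then
  stream_load "$1"
fi
```

Usage, e.g. for shopping: `bash stream_load.sh http://metis.lti.cs.cmu.edu/webarena-images/shopping_final_0712.tar` (file name arbitrary).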

 
## Verify execution, delete images

After execution, run the command below to check which images were successfully loaded into Docker:

```shell
docker images
```
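To check a specific set of images rather than scanning the full list, `docker image inspect` can be used, since it exits with a non-zero status when an image is absent. The image names below are assumptions inferred from the `.tar` file names and the `docker pull` commands in step 1; adjust them to match your `docker images` output.

```bash
#!/usr/bin/env bash
# Sketch: report which expected images are present in Docker's image cache.
# Image names are assumptions based on the .tar file names and the
# `docker pull` commands in step 1.
EXPECTED_IMAGES=(
  shopping_final_0712
  shopping_admin_final_0719
  postmill-populated-exposed-withimg
  gitlab-populated-final-port8023
  jykoh/classifieds:latest
  mysql:8.1
)

report_images() {
  local img
  for img in "${EXPECTED_IMAGES[@]}"; do
    # `docker image inspect` exits non-zero if the image is not present.
    if docker image inspect "$img" >/dev/null 2>&1; then
      printf '%-8s %s\n' loaded "$img"
    else
      printf '%-8s %s\n' missing "$img"
    fi
  done
}

report_images
```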

To reload an image that is already loaded, delete it from the cache and run the script again:

```shell
docker image list  # get the image ID; suppose it is `0dbbdef09dd9` for shopping website:
docker rmi 0dbbdef09dd9
./scripts/environments/load_docker_imgs.sh shopping
```

# 3) Hosting the websites
**NOTE**: step 2 must be completed for all websites except `homepage` and `wikipedia` (the latter runs directly from its `.zim` file).

- The script `start_reset_envs.sh` contains commands to host the websites at specified endpoints and ports, defaulting to: `http://localhost:<site-specific-port>`.

- (Optional): Update the following in the script according to your needs: `HOMEPAGE_PATH`, `WIKIPEDIA_PATH`, `ports` hashmap. See the script for details.
    - **NOTE ON PORTS:** the default ports in the script are usually available and don't require sudo access. If one is already in use, change it as needed.
    - It is also possible to update `BASE_URL` to make sites accessible at a customized endpoint `http://BASE_URL:port`. See section 3.1 below.


- Then run:
    ```
    ./scripts/environments/start_reset_envs.sh [-f] <site_name1> <site_name2> ...
    ```

    - where `<site_name>` is one of: `['shopping' 'admin' 'reddit' 'gitlab' 'classifieds' 'wikipedia' 'homepage']`
    - Set the `-f` flag to run the Flask server for the `homepage` in the foreground.

    - You can also run the commands in the script manually, directly in the terminal.

    - After execution, the script prints the list of running Docker containers and the status of each website (i.e., whether it is reachable).


- Each website should be available at `http://<BASE_URL>:<site-specific-port>`. Wait a bit for all services to start.
    - E.g.: `http://localhost:9999` for Reddit with default `BASE_URL` and ports. 
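Rather than guessing how long to wait, one can poll a site until it answers. A sketch using only `curl`; the `wait_for_site` name, the 5-second interval, and the 300-second default timeout are arbitrary choices, and any 2xx/3xx status is treated as "up".

```bash
#!/usr/bin/env bash
# Sketch: poll a URL until it answers with a 2xx/3xx status or a timeout hits.
# Usage: wait_for_site <url> [timeout_seconds]
# The timeout is approximate: only the sleeps are counted, not curl's time.
wait_for_site() {
  local url=$1 timeout=${2:-300} waited=0 code=""
  while (( waited < timeout )); do
    # -s: silent; -o /dev/null: drop the body; -w: print only the status code.
    # curl prints 000 when it cannot connect (service not up yet).
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url" || true)
    if [[ "$code" =~ ^[23][0-9][0-9]$ ]]; then
      echo "$url is up (HTTP $code)"
      return 0
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  echo "$url still not up after ${timeout}s (last status: ${code:-none})" >&2
  return 1
}

if [[ $# -gt 0 ]]; then
  wait_for_site "$@"
fi
```

Usage: `wait_for_site http://localhost:9999 600` after sourcing the script, or pass the same arguments to the script directly.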

## Resetting environments / websites
- Run the same `start_reset_envs.sh` script with the websites you want to reset. For example, to reset `shopping` and `reddit`:

```bash
./scripts/environments/start_reset_envs.sh shopping reddit
```
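Conceptually, a reset throws away the container, together with every change an agent made inside it, and starts a fresh container from the unchanged image. The sketch below shows that idea for a single container; the container name `forum`, the `9999:80` port mapping, and the `reset_container` name are hypothetical, and some sites may need extra setup after `docker run`, so the script remains the recommended way to reset.

```bash
#!/usr/bin/env bash
# Sketch: reset one site by recreating its container from the image.
# Any state changed inside the old container is discarded with it.
set -euo pipefail

reset_container() {
  local name=$1 port_map=$2 image=$3
  # Ignore errors when the container is not running / does not exist yet.
  docker stop "$name" >/dev/null 2>&1 || true
  docker rm "$name" >/dev/null 2>&1 || true
  docker run --name "$name" -p "$port_map" -d "$image"
}

# Hypothetical example for Reddit (container name and port mapping assumed):
# reset_container forum 9999:80 postmill-populated-exposed-withimg
```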

## Checking if Websites are UP/DOWN
`./scripts/environments/start_reset_envs.sh` already prints the status of each website.

You can also check the status of websites through the following options:

- Run `./scripts/environments/check_websites.sh` to check which websites are reachable.

- Test whether a specific site is working by running the command below and checking the HTTP status line (first line of the output). After starting the container, wait a bit for all services to start.
    ```
    curl -I http://<BASE_URL>:<port>
    ```
    - Example: `curl -I http://localhost:9999` should return something like this:
        ```
        HTTP/1.1 200 OK
        Server: nginx/1.22.1
        Content-Type: text/html; charset=UTF-8
        Connection: keep-alive
        X-Powered-By: PHP/8.1.17
        ...
        ```

- If the machine has access to a GUI browser, simply open `http://<BASE_URL>:<port>` in it.

- Use the VS Code *Simple Browser* preview. It provides a browser inside VS Code for visual inspection via GUI navigation. IMPORTANT: it does not work for all sites; `reddit`, for instance, tends to show a blank page.

- Run a sample task with `run.py` and inspect the agent's response.
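The per-site `curl -I` check can also be run for all sites in one pass. A sketch, assuming bash 4+ and the default ports from the original WebArena/VWebArena instructions (replace them with the values in the `ports` hashmap of `start_reset_envs.sh` if they differ); a status of `000` means curl could not connect.

```bash
#!/usr/bin/env bash
# Sketch: print the HTTP status of every site in one pass (needs bash 4+).
# Ports are assumed defaults from the original WebArena/VWebArena docs;
# replace them with the `ports` hashmap values in start_reset_envs.sh.
BASE_URL=${BASE_URL:-localhost}
declare -A PORTS=(
  [shopping]=7770 [admin]=7780 [reddit]=9999 [gitlab]=8023
  [wikipedia]=8888 [classifieds]=9980 [homepage]=4399
)

site_status() {
  local site code
  for site in "${!PORTS[@]}"; do
    # -m 10 caps each request at 10s; 000 means curl could not connect.
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' \
      "http://$BASE_URL:${PORTS[$site]}" || true)
    printf '%-12s %-6s %s\n' "$site" "${PORTS[$site]}" "$code"
  done | sort
}

site_status
```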



### 3.1) (Optional) Changing the BASE_URL
**NOTE: this requires admin/sudo rights.**

#### Linux
Making the websites available at a `BASE_URL` other than `localhost` requires adding a mapping to `/etc/hosts`.

Example: suppose we want the sites to be accessible via `http://vwebarena:7770`. 

Steps:
1. Open the `/etc/hosts` file in a text editor (with sudo)
2. Add the line `127.0.0.1 vwebarena` and save
3. Change the `BASE_URL` in `start_reset_envs.sh` and execute it

Repeat step 2 for each additional endpoint name.
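Steps 1 and 2 can also be done from the terminal without opening an editor. A sketch that appends the mapping only if it is not already present; `add_host_alias` and the `HOSTS_FILE` override are hypothetical names, and when the hosts file is not writable by the current user, the write goes through `sudo`.

```bash
#!/usr/bin/env bash
# Sketch: add "127.0.0.1 <name>" to a hosts file only if it is not there yet.
# HOSTS_FILE defaults to /etc/hosts (normally root-owned).
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

add_host_alias() {
  local name=$1
  # Match the name as a whole field on a 127.0.0.1 line.
  if grep -Eq "^127\.0\.0\.1[[:space:]](.*[[:space:]])?${name}([[:space:]]|$)" "$HOSTS_FILE"; then
    echo "$name is already mapped in $HOSTS_FILE"
    return 0
  fi
  if [[ -w "$HOSTS_FILE" ]]; then
    echo "127.0.0.1 $name" >> "$HOSTS_FILE"
  else
    # Not writable by the current user: append through sudo.
    echo "127.0.0.1 $name" | sudo tee -a "$HOSTS_FILE" >/dev/null
  fi
  echo "added '127.0.0.1 $name' to $HOSTS_FILE"
}

if [[ $# -gt 0 ]]; then
  add_host_alias "$1"
fi
```

Usage: `bash add_host_alias.sh vwebarena`; afterwards `getent hosts vwebarena` should resolve the name to `127.0.0.1`.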


#### Windows + Linux WSL2
To access the websites from Windows, map the endpoint names to localhost in the Windows `hosts` file (editing it requires administrator rights):

```powershell
cd C:\Windows\System32\drivers\etc
code hosts
# in the editor, add lines such as the one below, replacing webarena with your desired endpoint name
127.0.0.1 webarena
```

# 4) Accessing via other machines on the same network
If you are on a shared cluster, such as the RIPL machines, you can access a website hosted on, say, RIPL-W1 from any other machine by replacing `localhost` with RIPL-W1's IP address.
The IPs of all machines are available [here](https://docs.google.com/spreadsheets/d/1UcKKhXYtV1hFhFCZPKK5lHAK6neFXzmeUe-Tz0mmWjQ/edit?gid=958948356#gid=958948356).

Example:
Suppose you did all the steps above on RIPL-W1, and the `Reddit` website is available at `http://localhost:9999`.

To access this website from RIPL-W2, use `http://143.215.128.18:9999`, where `143.215.128.18` is RIPL-W1's IP.
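To find the hosting machine's IP and confirm that a site is reachable from another machine, something like the sketch below can help; `check_remote` is a hypothetical helper, and the IP and port in the commented call are the ones from the example above. If the check fails from RIPL-W2 but works with `localhost` on RIPL-W1, a firewall on RIPL-W1 may be blocking the port.

```bash
#!/usr/bin/env bash
# Sketch: find the host's IP and check a site from another machine.

# Run on the hosting machine (e.g. RIPL-W1): print its IP address(es).
# `hostname -I` is the Linux (GNU hostname) form; it is not available on macOS.
hostname -I 2>/dev/null || true

# Run on another machine (e.g. RIPL-W2): print the HTTP status line of a site.
check_remote() {
  local ip=$1 port=$2
  curl -sI -m 10 "http://$ip:$port" | head -n 1
}

# check_remote 143.215.128.18 9999   # prints the status line, as with the localhost check
```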


# 5) (Optional) Hosting the environments on the Web

TODO