# Brittleness of ReAct in ALFWorld


## Prompts 
Prompts can be found under `prompts/perturb_example_prompts`.
For some variations (for example RQ3-ALL, BOTH, and ONE), the prompt is built in code.

The LLM outputs could compromise author anonymity, so we will release them with the camera-ready version upon acceptance.


## Use VSCode DevContainers

Setup:

1. Make sure VSCode has the Dev Containers extension installed.
2. Make sure Docker is already set up (for example, `docker ps` and `docker images` should run without errors).

Running:
1. Clone the repository: `<REDACTED>`
2. Run the devcontainer: VSCode should show a popup offering to open the code in a devcontainer. If it does not, press Cmd + Shift + P to open the VSCode command palette and search for `Rebuild Container`, which should start the devcontainer.
3. Specify `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` as environment variables.
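
Before launching a run, it can help to confirm that the keys from step 3 are actually visible to Python. The snippet below is a minimal sketch and not part of this repository: the helper name `missing_api_keys` is hypothetical, and it assumes both keys are wanted, although in practice you may only need the one for your chosen provider.

```python
import os

# Variable names taken from step 3 above. Requiring both is an assumption;
# trim the tuple if you only use one provider.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.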

Please follow the instructions in the ReAct codebase to use our code.


## Runner scripts
To run base ReAct on the truncated set of examples (for larger models), use:
`runner_alfworld_truncated_exec.py`

For the full 134 instances, use:
`runner_alfworld.py`


To run the variations, use:
`perturb_runner_alfworld.py` or `perturb_runner_alfworld_truncated_exec.py`
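
The choice among the four scripts can be summarized as a small lookup. This is only an illustrative sketch: `pick_runner` is a hypothetical helper that does not exist in the repository, and it assumes the `_truncated_exec` suffix means the same thing for the perturbation runner as it does for the base runner.

```python
def pick_runner(perturbed: bool, truncated: bool) -> str:
    """Map (perturbed, truncated) to a runner script name (hypothetical helper)."""
    scripts = {
        # Base ReAct, full 134 instances
        (False, False): "runner_alfworld.py",
        # Base ReAct, truncated set of examples (for larger models)
        (False, True): "runner_alfworld_truncated_exec.py",
        # Prompt variations; treating this one as the full set is an assumption
        (True, False): "perturb_runner_alfworld.py",
        # Prompt variations on the truncated set (assumed from the file name)
        (True, True): "perturb_runner_alfworld_truncated_exec.py",
    }
    return scripts[(perturbed, truncated)]


if __name__ == "__main__":
    print(pick_runner(perturbed=True, truncated=False))
```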

Update `LLM_MODEL` to the model of your choice.
Make sure the corresponding API key is exported in the terminal session, for example with `export OPENAI_API_KEY=<my key>`.

Set `PERTURB_MODE` to one of:

RQ1:
- abstraction/global [Exemplar CoT]
- abstraction/global2 [Anon. Exemplar CoT]

- abstraction/global-problem-all
- abstraction/global-problem-partial
- abstraction/global-problem

RQ2:
- content/domain
- content/problem [BOTH]
- content/problem-partial [ONE]
- content/problem-all [ALL]
- content/instance
- nature/optimal_plan_length

RQ3:
- nature/explanation
- nature/failure
- nature/success
- nature/magic
- structure/freeform
- structure/ordering
- structure/structured
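
A typo in `PERTURB_MODE` is easy to make, so a quick check against the list above may save a wasted run. The following is a sketch only: the `PERTURB_MODES` table is copied from this README, while the `research_question_for` helper is hypothetical and the runner scripts may validate the value differently, or not at all.

```python
# Modes copied from the list above, grouped by research question.
PERTURB_MODES = {
    "RQ1": [
        "abstraction/global",
        "abstraction/global2",
        "abstraction/global-problem-all",
        "abstraction/global-problem-partial",
        "abstraction/global-problem",
    ],
    "RQ2": [
        "content/domain",
        "content/problem",
        "content/problem-partial",
        "content/problem-all",
        "content/instance",
        "nature/optimal_plan_length",
    ],
    "RQ3": [
        "nature/explanation",
        "nature/failure",
        "nature/success",
        "nature/magic",
        "structure/freeform",
        "structure/ordering",
        "structure/structured",
    ],
}


def research_question_for(mode: str) -> str:
    """Return the research question a mode is listed under, or raise ValueError."""
    for rq, modes in PERTURB_MODES.items():
        if mode in modes:
            return rq
    raise ValueError(f"Unknown PERTURB_MODE: {mode!r}")


if __name__ == "__main__":
    print(research_question_for("content/problem-all"))
```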


