
<a name="readme-top"></a>




<br />
<div align="center">

<h3 align="center">Testing Theory-of-Mind in Large Language Model-Based Multi-Agent Design Patterns
</h3>


</div>



<!-- TABLE OF CONTENTS -->
<details>
  <summary>Table of Contents</summary>
  <ol>
    <li><a href="#system-requirements">System Requirements</a></li>
    <li><a href="#processed-data-provided-to-ai">Processed Data Provided to AI</a>
    </li>
    <li>
      <a href="#complete-communication-history-between-human-authors-and-ai">Complete Communication History between Human Author(s) and AI</a>
    </li>
    <li><a href="#finalized-jupyter-notebook-based-on-code-generated-by-ai">Finalized Jupyter Notebook based on Code Generated by AI</a></li>
  </ol>
</details>



## System Requirements

- Install the dependencies listed in `./requirements.txt`.
- The finalized Jupyter notebook, based on AI-generated code, runs on a free-tier Google Colab instance. On a CPU-only instance, total execution time is under 30 minutes if the code for the ToM Capability Estimator (TCE), a Bayesian hierarchical model, is excluded. Running the TCE section on a free-tier Google Colab instance with GPU support takes less than two hours.
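
Before launching the notebook, it may help to confirm that the packages named in `requirements.txt` are actually installed. The following is a minimal sketch, assuming the file uses simple lines such as `name==version`; the helpers `parse_requirement_names` and `missing_packages` are hypothetical and not part of this repository.

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(text):
    """Extract bare package names from requirements-style text.

    Assumes simple lines such as `name==1.0` or `name>=2.0`.
    Comments, blank lines, and option lines (starting with `-`) are skipped.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # Cut at the first version specifier, extras bracket, marker, or space.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if name:
            names.append(name)
    return names


def missing_packages(names):
    """Return the names that importlib.metadata cannot find installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        absent = missing_packages(parse_requirement_names(req_file.read_text()))
        print("Missing packages:", absent or "none")
    else:
        print("requirements.txt not found in the current directory")
```

The check only tests whether each package is present; it does not compare installed versions against the pinned ones.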



## Processed Data Provided to AI

The processed Theory-of-Mind (ToM) testing results provided to the AI are in the `processed_ToM_testing_results` directory:

- FANToM
  - `FANToM_FactQ.csv`
  - `FANToM_BeliefQ_Choice_First_Order.csv`
  - `FANToM_BeliefQ_Choice_Second_Order_Cyclic.csv`
  - `FANToM_BeliefQ_Choice_Second_Order_Acyclic.csv`
  - `FANToM_BeliefQ_Dist_First_Order.csv`
  - `FANToM_BeliefQ_Dist_First_Order_TokenF1.csv`
  - `FANToM_BeliefQ_Dist_Second_Order_Cyclic.csv`
  - `FANToM_BeliefQ_Dist_Second_Order_Cyclic_TokenF1.csv`
  - `FANToM_BeliefQ_Dist_Second_Order_Acyclic.csv`
  - `FANToM_BeliefQ_Dist_Second_Order_Acyclic_TokenF1.csv`
  - `FANToM_AnswerabilityQ_List.csv`
  - `FANToM_AnswerabilityQ_List_included_unaware_character.csv`
  - `FANToM_AnswerabilityQ_List_excluded_aware_character.csv`
  - `FANToM_AnswerabilityQs_Binary.csv`
  - `FANToM_Info_AccessQ_List.csv`
  - `FANToM_Info_AccessQ_List_included_unaware_character.csv`
  - `FANToM_Info_AccessQ_List_excluded_aware_character.csv`
  - `FANToM_Info_AccessQ_Binary.csv`
- Hi-ToM
  - `Hi_ToM_Order_0.csv`
  - `Hi_ToM_Order_0_Mentioned_Container_Order.csv`
  - `Hi_ToM_Order_1.csv`
  - `Hi_ToM_Order_1_Mentioned_Container_Order.csv`
  - `Hi_ToM_Order_2.csv`
  - `Hi_ToM_Order_2_Mentioned_Container_Order.csv`
  - `Hi_ToM_Order_3.csv`
  - `Hi_ToM_Order_3_Mentioned_Container_Order.csv`
  - `Hi_ToM_Order_4.csv`
  - `Hi_ToM_Order_4_Mentioned_Container_Order.csv`
  - `Hi_ToM_Teller_Knowledge_Container_Public_Claim.csv`
  - `Hi_ToM_Teller_Knowledge_Container_Public_Claim_Mentioned_Container_Order.csv`
  - `Hi_ToM_Teller_Knowledge_Container_Private_Communication.csv`
  - `Hi_ToM_Teller_Knowledge_Container_Private_Communication_Mentioned_Container_Order.csv`
  - `Hi_ToM_Teller_Lie_Public_Claim.csv`
  - `Hi_ToM_Teller_Lie_Private_Communication.csv`
  - `Hi_ToM_Listener_Temporal_Public_Claim.csv`
  - `Hi_ToM_Listener_Temporal_Private_Communication.csv`
  - `Hi_ToM_Listener_Belief_Public_Claim.csv`
  - `Hi_ToM_Listener_Belief_Private_Communication.csv`

The variables are explained in `variables explanation.docx`.
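
To inspect the processed results outside the notebook, the CSV files can be read with the Python standard library. The sketch below assumes each file has a header row and is UTF-8 encoded; because the exact subdirectory layout is not specified here, it searches `processed_ToM_testing_results` recursively. `load_results` is a hypothetical helper, and no column names are assumed.

```python
import csv
from pathlib import Path


def load_results(root="processed_ToM_testing_results", prefix=""):
    """Read every CSV under `root` whose file name starts with `prefix`.

    Returns a dict mapping file name -> list of row dicts.
    Assumes a header row in each file; the recursive search and the
    UTF-8 encoding (with optional BOM) are assumptions, not documented facts.
    """
    results = {}
    for path in sorted(Path(root).rglob(f"{prefix}*.csv")):
        with path.open(newline="", encoding="utf-8-sig") as handle:
            results[path.name] = list(csv.DictReader(handle))
    return results


if __name__ == "__main__":
    # Example: summarize the Hi-ToM files, if the directory is present.
    for name, rows in load_results(prefix="Hi_ToM_").items():
        print(f"{name}: {len(rows)} rows")
```

Passing `prefix="FANToM_"` selects the FANToM files instead, and the default empty prefix loads everything.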


<p align="right">(<a href="#readme-top">back to top</a>)</p>


## Complete Communication History between Human Author(s) and AI

The complete communication history between the human author(s) and the AI, including all prompts, reasoning, and responses, is recorded in:

- `prompts_and_responses.md`



<p align="right">(<a href="#readme-top">back to top</a>)</p>


## Finalized Jupyter Notebook based on Code Generated by AI

The finalized Jupyter notebook for reproducing our results, based on the AI-generated code, is:

- `reproducing_results.ipynb`

This finalized version is the product of iterative debugging and improvement carried out by the AI with some human intervention; the full history is reported in `prompts_and_responses.md`.



<p align="right">(<a href="#readme-top">back to top</a>)</p>


