# Visual Large Language Models Exhibit Human-Like Cognitive Flexibility

This repository contains the code and data for the research paper "Visual Large Language Models Exhibit Human-Like Cognitive Flexibility". The study tests set-shifting, a component of frontal lobe executive function, in vision-capable large language models (GPT-4o, Gemini-1.5 Pro, and Claude-3.5 Sonnet) and in human participants.

![WCST Performance Violin Plot](https://github.com/guangfuhao/VLLMs_Exhibit_Cognitive_Flexibility/blob/main/fig_results/wcst_performance_violin.png)

## Table of Contents
1. [Installation](#installation)
2. [Dataset](#dataset)
3. [Experiment Data](#experiment-data)
4. [Configuration](#configuration)
5. [Running Experiments](#running-experiments)
6. [Human Experiment Interface](#human-experiment-interface)
7. [Vision Accuracy Testing](#vision-accuracy-testing)
8. [Analysis](#analysis)
9. [Visualization](#visualization)

## Installation

This project uses Python 3.9.13. To install the required libraries, run:

```
pip install -r requirements.txt
```

## Dataset

Before running the experiments, download the Wisconsin Card Sorting Test (WCST) dataset, which is distributed as `WSCT.zip`:

1. Download [WSCT.zip](https://drive.google.com/file/d/1LHsG4qLvnK6aAkZ5eoiaxb7zDCGxFJea/view?usp=sharing)
2. Unzip and place the contents in the `task_datasets` folder
3. The file structure should be:
   ```
   task_datasets/WSCT/trial1/cards/0_circle_blue_1.png
   task_datasets/WSCT/trial1/cards.json
   task_datasets/WSCT/trial2/...
   ```

Alternatively, you can generate the dataset using `datasets_gen.py`.
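
As a quick sanity check after unzipping, the layout shown above can be verified with a short script. This is a minimal sketch, not part of the repository: the function name `check_dataset` is hypothetical, and it assumes only what the file structure above shows (trial folders named `trial1`, `trial2`, ..., each with a `cards.json` file and a `cards/` folder of PNG images).

```python
from pathlib import Path


def check_dataset(root="task_datasets/WSCT"):
    """Return a list of human-readable problems found in the dataset layout."""
    root = Path(root)
    if not root.is_dir():
        return [f"missing dataset folder: {root}"]

    problems = []
    # Assumption: trial folders are named trial1, trial2, ...
    trials = sorted(p for p in root.glob("trial*") if p.is_dir())
    if not trials:
        problems.append(f"no trial folders found in {root}")

    for trial in trials:
        # Each trial is expected to hold a cards.json file ...
        if not (trial / "cards.json").is_file():
            problems.append(f"{trial.name}: cards.json is missing")
        # ... and a cards/ folder with at least one PNG image.
        cards_dir = trial / "cards"
        if not cards_dir.is_dir() or not any(cards_dir.glob("*.png")):
            problems.append(f"{trial.name}: no PNG images in cards/")
    return problems


if __name__ == "__main__":
    issues = check_dataset()
    print("\n".join(issues) if issues else "Dataset layout looks complete.")
```

Run it from the repository root. An empty result means every trial folder contains both expected entries; it does not validate the contents of `cards.json`.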

## Experiment Data

The experimental data reported in the paper, for both the models and the human participants, is available for download:

1. Download [experiment_logs.zip](https://drive.google.com/file/d/1A4kk9ibNnGucu1dHzNl4a9vgsKwAbMbG/view?usp=sharing)
2. Extract the contents to the `experiment_logs` folder

## Configuration

Before running the experiments, you need to set up your API keys in `constants.py`:

```python
OPENAI_API_KEY = "your_openai_api_key"
ANTHROPIC_API_KEY = "your_anthropic_api_key"
GOOGLE_API_KEY = "your_google_api_key"
```
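
If you would rather not store real keys in a tracked file, one possible variation is to have `constants.py` read them from environment variables. This is a sketch of that alternative, not the repository's own setup: the helper `key_from_env` is hypothetical, and it assumes the rest of the code only needs the three variable names shown above.

```python
import os


def key_from_env(name, placeholder):
    """Return environment variable `name`, or `placeholder` if it is unset."""
    return os.environ.get(name, placeholder)


# Same variable names as above; the values now come from the environment
# when available and fall back to the placeholder strings otherwise.
OPENAI_API_KEY = key_from_env("OPENAI_API_KEY", "your_openai_api_key")
ANTHROPIC_API_KEY = key_from_env("ANTHROPIC_API_KEY", "your_anthropic_api_key")
GOOGLE_API_KEY = key_from_env("GOOGLE_API_KEY", "your_google_api_key")
```

With this variation, export the keys in your shell before launching the experiments, for example `export OPENAI_API_KEY=...`.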

## Running Experiments

To run experiments, run `main.py`. Choose the mode by selecting which GUI class is instantiated in `main.py`:

1. For a single session:
   ```python
   app = ExperimentGUI(root)
   ```

2. For batch automated runs:
   ```python
   app = ExperimentParallelGUI(root)
   ```

Results will be saved in the `experiment_logs` folder.
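
To find the output of the most recent runs, a small helper along the following lines may be convenient. It is a sketch under minimal assumptions: the function name `latest_logs` is hypothetical, and it relies only on the folder name `experiment_logs` stated above, not on any particular log file format or naming scheme.

```python
from pathlib import Path


def latest_logs(log_dir="experiment_logs", limit=5):
    """Return up to `limit` files under `log_dir`, newest first by modification time."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return []
    # Search subfolders too, since the internal layout of the folder is not assumed.
    files = [p for p in log_dir.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in latest_logs():
        print(path)
```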

## Human Experiment Interface

`human_experiment_web.py` provides a web interface for running the experiment with human participants. To make the locally running interface reachable by remote participants, expose it through ngrok.

## Vision Accuracy Testing

1. Run `test_vision_accuracy.py` to test model vision capabilities
2. Use `test_vision_accuracy_calculate.py` to analyze the results
3. Results are saved in the `output` folder

## Analysis

Run `analysis.py` to analyze experimental results. Analysis outputs are saved in the `analyze_results` folder.

## Visualization

All figures and charts in the paper are generated using `plot_results.ipynb`. Run this Jupyter notebook to view all graphical analysis results.

For any questions or issues, please open an issue in this repository.
