# Brain-LLM-Alignment-Attribution
Official repository of the paper "**Fine-grained Analysis of Brain-LLM Alignment through Input Attribution**".

## Abstract
Understanding the alignment between large language models (LLMs) and human brain activity can reveal computational principles underlying language processing. We introduce a fine-grained input attribution method to identify the specific words most important for brain-LLM alignment, and leverage it to study a contentious research question about brain-LLM alignment: the relationship between brain alignment and next-word prediction (NWP). Our findings reveal that brain alignment and NWP rely on largely distinct word subsets: NWP exhibits recency and primacy biases with a focus on syntax, while brain alignment prioritizes semantic and discourse-level information with a more targeted recency effect. This work advances our understanding of how LLMs relate to human language processing and highlights differences in feature reliance between brain alignment and NWP. Beyond this study, our attribution method can be broadly applied to explore the cognitive relevance of model predictions in diverse language processing tasks.

## Setup
### Environment setup
#### (Option 1) Python environment
```
python3 -m venv ./.venv
virtualenv --clear ./.venv
source ./.venv/bin/activate
./.venv/bin/python3 -m pip install -r requirements.txt
```
#### (Option 2) Docker container
```
cd Dockerfile
docker build -t brain-llm-env .
docker run -it --gpus all -v <path_to_root_dir>:/workspace brain-llm-env
```
### Llama3 setup
1) Obtain weights for the desired model by filling out the form at [this link](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).
2) Follow the given instructions to download them. This operation requires sufficient disk space. If you wish to change the default download directory (_~/.llama/_), you need to modify the following files:
    * In _<your\_env\_dir>/lib/python3.10/site-packages/llama\_stack/cli/download.py_ change
       ```
       output_dir = model_local_dir(model.descriptor())
       ```
       to
       ```
       output_dir = Path("<desired_path>")
       ```
   * In _<your\_env\_dir>/lib/python3.10/site-packages/llama\_stack/distribution/utils/config\_dirs.py_ change
       ```
       LLAMA_STACK_CONFIG_DIR = Path(
           os.getenv("LLAMA_STACK_CONFIG_DIR", os.path.expanduser("~/.llama/"))
       )
      ```
       to
      ```
       LLAMA_STACK_CONFIG_DIR = Path(
           os.getenv("LLAMA_STACK_CONFIG_DIR", os.path.expanduser("<desired_path>"))
       )
       ```
3) Convert the original checkpoints to HuggingFace checkpoints:
   ```
   python src/convert_llama_weights_to_hf.py --input_dir <path_to_llama_ckpt_dir> --model_size <model_size> --output_dir <path_to_desired_output_dir> --llama_version <llama_version>
   ```
   For example:
   ```
   python src/convert_llama_weights_to_hf.py --input_dir ./.llama_ckpts/checkpoints/Llama3.1-8B/ --model_size 8B --output_dir ./.llama3.1 --llama_version 3.1
   ```
4) In the Llama config files in the _configs_ folder, set _hugging\_face\_model\_id_ to <path_to_desired_output_dir>, the directory where you stored the converted checkpoints.
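
The _config\_dirs.py_ snippet quoted in step 2 reads `LLAMA_STACK_CONFIG_DIR` from the environment before falling back to _~/.llama/_, so exporting that variable may be enough to relocate the config directory without editing that file. The sketch below reproduces only the lookup shown in that snippet; the function name `resolve_config_dir` is hypothetical, and whether _download.py_ honors the same variable depends on your installed `llama_stack` version.

```python
import os
from pathlib import Path


def resolve_config_dir(env=None):
    """Mirror the lookup quoted in step 2: environment variable first, then ~/.llama/.

    `env` is a hypothetical parameter added here so the lookup can be
    exercised without touching the real process environment.
    """
    env = os.environ if env is None else env
    default = os.path.expanduser("~/.llama/")
    return Path(env.get("LLAMA_STACK_CONFIG_DIR", default))


if __name__ == "__main__":
    # Shows which directory would be used with the current environment.
    print(resolve_config_dir())
```

If the printed path is not the one you want, setting `LLAMA_STACK_CONFIG_DIR` in your shell before running the download is worth trying before patching the installed package.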

## Dataset
We use two publicly available fMRI datasets. In the following, we explain where to download them and how to organize the dataset folders.

### Harry Potter
The Harry Potter dataset contains fMRI data of 8 subjects reading _Harry Potter and the Sorcerer's Stone_. Access to the data can be requested at the following [link](https://www.cs.cmu.edu/~fmri/plosone/). In order for our code to work properly, you should create a _data_ folder in the root directory with the following structure:

```
data/
└── HarryPotter/
    ├── fMRI/
    │   ├── data_subject_F.npy         # fMRI data for subject F
    │   ├── data_subject_H.npy         # same structure for subjects H–N
    │   ├── data_subject_I.npy
    │   ├── data_subject_J.npy
    │   ├── data_subject_K.npy
    │   ├── data_subject_L.npy
    │   ├── data_subject_M.npy
    │   ├── data_subject_N.npy
    │   ├── hp_subj_roi_inds.npy       # ROI indices for subjects
    │   ├── runs_fmri.npy              # Run indices for fMRI
    │   ├── time_fmri.npy              # Timestamps for fMRI scans
    │   ├── time_words_fmri.npy        # Timestamps for words presented
    │   └── words_fmri.npy             # List of words shown during fMRI
    │
    ├── voxel_neighborhoods/
    │   ├── F_ars_auto2.npy            # Neighborhoods for subject F
    │   ├── H_ars_auto2.npy            # same structure for subjects H–N
    │   ├── I_ars_auto2.npy
    │   ├── J_ars_auto2.npy
    │   ├── K_ars_auto2.npy
    │   ├── L_ars_auto2.npy
    │   ├── M_ars_auto2.npy
    │   └── N_ars_auto2.npy
    │
    └── story_features.mat
```
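
Before running any script, it can help to confirm that the folder matches the tree above. The following is a minimal sketch, not part of the repository: the helper names `expected_files` and `missing_files` are hypothetical, and the file list is copied from the tree shown here.

```python
from pathlib import Path

# Subject identifiers and shared files, taken from the directory tree above.
SUBJECTS = ["F", "H", "I", "J", "K", "L", "M", "N"]
SHARED_FMRI_FILES = [
    "hp_subj_roi_inds.npy",
    "runs_fmri.npy",
    "time_fmri.npy",
    "time_words_fmri.npy",
    "words_fmri.npy",
]


def expected_files(root="data"):
    """Return every path the tree above lists under <root>/HarryPotter."""
    base = Path(root) / "HarryPotter"
    paths = [base / "fMRI" / f"data_subject_{s}.npy" for s in SUBJECTS]
    paths += [base / "fMRI" / name for name in SHARED_FMRI_FILES]
    paths += [base / "voxel_neighborhoods" / f"{s}_ars_auto2.npy" for s in SUBJECTS]
    paths.append(base / "story_features.mat")
    return paths


def missing_files(root="data"):
    """Return the expected paths that are not present on disk."""
    return [p for p in expected_files(root) if not p.is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if the layout is complete.
    for path in missing_files():
        print(f"missing: {path}")
```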

### The Moth Radio Hour
The Moth Radio Hour dataset contains fMRI data of 9 subjects listening to and reading autobiographical stories from the podcast of the same name. Data can be downloaded at the following [link](https://gin.g-node.org/denizenslab/narratives_reading_listening_fmri). You also need to download the files _grids\_huge.jbl_, _trfiles\_huge.jbl_, and _wordseqs.jbl_ from the following [link](https://utexas.app.box.com/v/EncodingModelScalingLaws/folder/230420528915), and put them into a folder called _preprocessed_ inside the dataset folder.


## Usage
All commands must be run from the root directory. To run a procedure for a subset of subjects, specify their indices in the desired config file by changing the field _experiment.subject\_idx_ (e.g., subject_idx: 'F,H').

### Brain alignment pipeline
* To run the full brain encoding pipeline, run:
  ```
  python src/brain_encoding_pipeline.py --config configs/<desired_config_file> -l <desired_context_length>
  ```
* To only extract the model's representations and compute the aggregated TR embeddings, run:
  ```
  python src/extract_model_embeddings.py --config configs/<desired_config_file> -l <desired_context_length>
  ```
* If you have already computed the aggregated embeddings and just want to train the brain encoding model, run:
  ```
  python src/predict_brain_alignment.py --config configs/<desired_config_file> -l <desired_context_length>
  ```

### Brain alignment attribution
To compute attributions for brain alignment, run:
```
python src/attribution_pipeline.py --config configs/<desired_config_file> -l <desired_context_length> -m <desired_attribution_method>
```

### Next word prediction attribution
To compute attributions for next word prediction, run:
```
python src/next_word_prediction.py --config configs/<desired_config_file> -l <desired_context_length> -m <desired_attribution_method>
```
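
To compare both attribution targets across several context lengths, the two commands above can be generated programmatically. The sketch below is illustrative only: `build_command` and `sweep` are hypothetical helpers, the script paths and the `--config`, `-l`, and `-m` flags are the ones documented in this README, and the placeholder values must be replaced with a real config file and attribution method.

```python
import shlex


def build_command(script, config, context_length, method=None):
    """Assemble one command line using the flags documented in this README."""
    cmd = ["python", script, "--config", config, "-l", str(context_length)]
    if method is not None:
        cmd += ["-m", method]
    return cmd


def sweep(config, context_lengths, method):
    """Pair every context length with both attribution scripts."""
    scripts = ["src/attribution_pipeline.py", "src/next_word_prediction.py"]
    return [
        build_command(script, config, length, method)
        for length in context_lengths
        for script in scripts
    ]


if __name__ == "__main__":
    # Placeholders below are not real file or method names; substitute your own.
    commands = sweep(
        "configs/<desired_config_file>", [640], "<desired_attribution_method>"
    )
    for cmd in commands:
        # Printed for inspection only; nothing is executed here.
        print(shlex.join(cmd))
```

The commands are only printed so they can be inspected first; passing each list to `subprocess.run(cmd, check=True)` from the root directory would execute them.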

### Plotting
* To plot brain alignment results, run:
  ```
  python src/plot_alignment.py -m <desired_model1> <desired_model2> -l <desired_context_length>
  ```
  For instance, to generate Figure 11 in Appendix D:
  ```
  python src/plot_alignment.py -m llama3.2-1B falcon3 gemma mamba zamba -l 640
  ```
* To generate all plots in the paper, except the story features analysis, run:
  ```
  python src/attribution_analysis.py
  ```
* To generate plots for the story features analysis, run:
  ```
  python src/attribution_df_analysis.py
  ```
* The previous two plotting scripts store the data needed to regenerate the plots they produce. To regenerate the plots (e.g., to make layout modifications), run:
  ```
  python src/regenerate_plots.py --cache-dir ./plots/attribution_analysis/<desired_dataset>_<desired_context_length>_<desired_attribution_method>
  ```
