# README

This is the supplementary material for submission 6356 "Attention Flows for General Transformers" of ICLR 2023.
We thank the reviewers for taking time to review this submission and for taking a look into the supplementary material.

## Installation

We forked `ml2`, an open-source library licensed under MIT, to get access to the models that perform logical reasoning (LR) tasks.
The models used in the experiments are publicly available either in the `ml2` library or in the `transformers` Python library (huggingface.co).
When reproducing the experiments, the models will automatically be downloaded. Please note that this requires a stable internet connection and free disk space.
In addition to the dependencies of `ml2`, we require `networkx` and `seaborn` to be installed.
All dependencies used in our code can be installed as follows.

```
pip install .
```
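To confirm that the two extra dependencies are visible to the Python interpreter after installation, a small check along the following lines may help. This is only a sketch: the helper name `missing_modules` is hypothetical and not part of `ml2` or of our code.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # `networkx` and `seaborn` are the extra dependencies named above.
    missing = missing_modules(["networkx", "seaborn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All extra dependencies were found.")
```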

## Code

Our implementation can be found under `/ml2/shapley`, where the main implementation is in `attention_flow.py`.

## Reproduce Experiments

In the folder `notebooks`, we provide a notebook for each experiment (and respective figure) in the paper.
Experiments can be reproduced by running the notebooks.
A visualization of the attention flow values will be printed for each head and for every decoded position.
After every decoded position, the sum of all heads is visualized as well, which is used in most of the figures.
Note that, depending on the experiment, this might take a while.
For example, running the full notebook for Figure 3(a) solves 156 maximum-flow problems and took around 22 minutes on a personal machine during testing.
Running the notebook for Figure 3(b)-1 and Figure 3(b)-2 took around 1 minute each.
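To give a feeling for what a single maxflow computation involves, the following is a minimal sketch on a toy layered graph using `networkx`. It is not the code from `attention_flow.py`: the function name `toy_attention_flow`, the graph layout, and the weights are all assumptions made for illustration, with attention weights used as edge capacities.

```python
import networkx as nx


def toy_attention_flow(attention, source, sink):
    """Compute the max flow from `source` to `sink` in a layered toy graph.

    `attention[l][i][j]` is a hypothetical attention weight from position i
    in layer l + 1 to position j in layer l. Nodes are (layer, position).
    """
    graph = nx.DiGraph()
    for layer, weights in enumerate(attention):
        for i, row in enumerate(weights):
            for j, weight in enumerate(row):
                # Each attention weight becomes the capacity of one edge.
                graph.add_edge((layer + 1, i), (layer, j), capacity=weight)
    flow_value, _ = nx.maximum_flow(graph, source, sink)
    return flow_value


if __name__ == "__main__":
    # Illustrative weights only: two attention layers over two positions.
    attention = [
        [[0.5, 0.5], [0.25, 0.75]],  # layer 1 -> layer 0
        [[0.5, 0.5], [1.0, 0.0]],    # layer 2 -> layer 1
    ]
    # Flow from position 0 in the top layer to each input position.
    for position in range(2):
        print(position, toy_attention_flow(attention, (2, 0), (0, position)))
```

The runtime of the notebooks grows with the number of such computations, since one is needed per combination of source and sink that is visualized.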