# Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?

This repository contains the code for generating the FTCT dataset, training Transformers and MLPs on FTCT tasks, testing performance under different criteria, plotting performance curves and attention heatmaps, and running linear probing.

## Environment
```
pip install datasets
pip install transformers
pip install accelerate -U
```

## Data Generation and Model Training
To train Transformers on FTCT, set the ```if_train``` parameter in ```train.sh``` to "y" and run
```
bash train.sh
```
```graph_len``` controls the depth of the causal graph; ```graph_width``` controls the maximum number of vertices at the same level of the causal graph; ```merge_pos``` determines the positions of the vertices with degree greater than 1; ```max_child_length``` determines the maximum child chain length in the training data. Each ```graph_type``` represents a randomly generated causal graph.

```num_icl_train_traces``` determines the number of training sentences that are pure noise without vertices from the child chain. ```num_mk_train_traces``` determines the number of training sentences that contain vertices from the child chain. 

```n_layers``` determines the number of Transformer layers; ```n_heads``` determines the number of attention heads.

To train MLPs on FTCT, set the ```if_train``` parameter in ```train_mlp.sh``` to "y" and run

```
bash train_mlp.sh
```
The parameters are mostly the same as those for training Transformers, except that ```n_layers``` here is the depth of the MLP and ```window_size``` is the size of the sliding window that keeps the input size within a fixed bound.

## Model Testing
Set the ```if_test``` parameter in ```train.sh``` to "y" and run ```bash train.sh``` to test the trained Transformers. Set the ```if_test``` parameter in ```train_mlp.sh``` to "y" and run ```bash train_mlp.sh``` to test the trained MLPs. 
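
If you switch between training and testing often, a small helper can flip the flags instead of editing the scripts by hand. The sketch below is only an illustration: ```set_flag``` is a hypothetical function, not part of this repository, and it assumes that the scripts store each flag as a plain shell assignment such as ```if_train="y"``` at the start of a line. Check how ```train.sh``` actually defines its flags before using it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Assumption: flags are stored as NAME="value" at the start of a line.
set_flag() {
  local file="$1" name="$2" value="$3"
  # Rewrite the whole assignment line; a backup is kept as <file>.bak.
  sed -i.bak -E "s|^(${name}=).*|\1\"${value}\"|" "$file"
}

# Example usage (uncomment after confirming the flag format in train.sh):
# set_flag train.sh if_train y && set_flag train.sh if_test n && bash train.sh
# set_flag train.sh if_train n && set_flag train.sh if_test y && bash train.sh
```

Lines that do not match the assumed pattern are left unchanged, so a run with an unexpected flag format fails quietly; compare the script with its ```.bak``` copy if a flag does not seem to take effect.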

## Performance Plotting
After training Transformers on FTCT, run ```bash draw.sh``` to plot the performance curves under different criteria. Setting the ```mode``` parameter to "main" plots testing performance against the number of shots; setting it to "ratio" plots testing performance against the relative knowledge ratio.

## Heatmap Plotting
Set the ```if_plot``` parameter in ```train.sh``` to "y" and run ```bash train.sh``` to plot a heatmap of the Transformer's attention.

## Linear Probing
Set the ```if_probe``` parameter in ```train.sh``` to "y" and run ```bash train.sh``` to probe the Transformer's attention assignment.