# You Do Not Fully Utilize Transformer's Representation Capacity

Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity".

## About

In contrast to RNNs, which compress previous tokens into a single hidden state, standard Transformers can attend to all previous tokens directly, but only through representations from the immediately preceding layer. We demonstrate that this causes representation collapse and leads to suboptimal performance. Our solution, Layer-Integrated Memory (LIMe), preserves the model's memory footprint while expanding its representational capacity through controlled access to hidden states from earlier layers. Experiments across various architectures and tasks show consistent improvements, and our analysis offers insights into how information is aggregated in deep networks.
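To make the idea of "controlled access to hidden states from earlier layers" concrete, here is a minimal NumPy sketch under our own assumptions. It is not the repository's implementation. In this sketch, each attention head builds its keys and values from a softmax-weighted mix of cached hidden states from earlier layers, while queries come from the current layer's input. The function name `lime_attention`, the `router_logits` parameter, and all tensor shapes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def lime_attention(query_states, layer_states, router_logits, w_q, w_k, w_v):
    """Multi-head causal attention whose keys/values come from a learned
    mix of earlier-layer hidden states (illustrative sketch only).

    query_states:  (T, D)     input hidden states of the current layer
    layer_states:  (L, T, D)  cached hidden states of layers 0..L-1
    router_logits: (H, L)     hypothetical per-head routing logits
    w_q, w_k, w_v: (H, D, Dh) per-head projection matrices
    returns:       (H, T, Dh) per-head attention outputs
    """
    T = query_states.shape[0]
    # One convex combination over layers per head.
    weights = softmax(router_logits, axis=-1)                # (H, L)
    mixed = np.einsum("hl,ltd->htd", weights, layer_states)  # (H, T, D)

    q = np.einsum("td,hde->hte", query_states, w_q)          # (H, T, Dh)
    k = np.einsum("htd,hde->hte", mixed, w_k)                # (H, T, Dh)
    v = np.einsum("htd,hde->hte", mixed, w_v)                # (H, T, Dh)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (H, T, T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)        # True above the diagonal
    scores = np.where(future, -np.inf, scores)                # causal mask
    return softmax(scores, axis=-1) @ v                       # (H, T, Dh)
```

If the router puts all of its weight on the last cached layer, the sketch reduces to standard attention over the previous layer's representations. Learning the routing weights is what lets a head reach further back.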

## Installation

```bash
pip install -r requirements.txt
pip install -e .
```


## Dataset Preparation

All config classes are located in `config.py`. Before starting, set `data_path` in `DataConfig` to the path of the downloaded dataset.

Download and preprocess the deduplicated FineWeb-Edu dataset:

```bash
python src/datasets/prepare_fineweb.py
```


## Training

Use the following commands to start training:

```bash
export WANDB_API_KEY="YOUR_API_KEY"
export WANDB_ENTITY="YOUR_WANDB_ENTITY"
export WANDB_BASE_URL="https://api.wandb.ai"

accelerate launch --mixed_precision "bf16" --multi_gpu train.py \
    --config_path /app/configs/config_base.yaml --wandb_config.project "lime"
```

To train the deep model, pass `--config_path configs/config_deep.yaml` instead. You can also override any attribute of the config classes from the command line. See `config.py` for the available options.
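The training command above passes `--wandb_config.project "lime"`, which suggests that dotted flags address attributes of nested config classes. The following standard-library sketch shows that mapping, assuming `config.py` uses nested dataclasses. `TrainConfig`, `WandbConfig`, `lr`, and `apply_overrides` are hypothetical names. The repository's actual argument parser may behave differently.

```python
from dataclasses import dataclass, field


@dataclass
class WandbConfig:  # hypothetical stand-in for a class in config.py
    project: str = "default"
    entity: str = ""


@dataclass
class TrainConfig:  # hypothetical top-level config
    config_path: str = ""
    lr: float = 3e-4
    wandb_config: WandbConfig = field(default_factory=WandbConfig)


def apply_overrides(cfg, argv):
    """Apply `--a.b value` pairs to a nested dataclass config in place."""
    it = iter(argv)
    for flag in it:
        value = next(it, None)
        if not flag.startswith("--") or value is None:
            raise ValueError(f"expected '--name value', got {flag!r}")
        *parents, leaf = flag[2:].split(".")
        target = cfg
        for name in parents:  # walk into nested configs, e.g. wandb_config
            target = getattr(target, name)
        # Cast the string to the type of the existing default.
        # Note: bool fields would need dedicated parsing, since bool("False") is True.
        current = getattr(target, leaf)
        setattr(target, leaf, type(current)(value))
    return cfg


cfg = apply_overrides(TrainConfig(), ["--wandb_config.project", "lime", "--lr", "1e-3"])
```

Unknown attribute names raise `AttributeError` here. That mirrors the general expectation that only attributes defined in the config classes can be overridden.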

## Analysis

All the source code for analysis is stored in `src/analysis/`.

- `representations.py`: Contains the code for extracting hidden states and values from the model.
- `classification.py`: Includes the implementation for evaluating classification metrics.
- `entropy.py`: Contains the code for performing entropy evaluation.
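As an illustration of what an entropy evaluation of hidden states can look like, here is a NumPy sketch of a matrix-based (Rényi) entropy computed from the eigenvalue spectrum of a Gram matrix. This is a common way to measure how many directions a set of representations spans. It is an assumption on our part: we have not confirmed that `entropy.py` uses this definition, and `matrix_entropy` is a hypothetical name.

```python
import numpy as np


def matrix_entropy(reps: np.ndarray, alpha: float = 1.0, eps: float = 1e-12) -> float:
    """Matrix-based Renyi entropy of a set of representations (illustrative).

    reps: (N, D) array, one hidden state per row.
    Low values mean the representations occupy few directions (collapse).
    """
    z = reps - reps.mean(axis=0, keepdims=True)  # center the representations
    gram = z @ z.T                               # (N, N) Gram matrix
    lam = np.clip(np.linalg.eigvalsh(gram), 0.0, None)  # drop negative round-off
    total = lam.sum()
    if total <= eps:                             # all rows identical: no spread
        return 0.0
    lam = lam / total                            # normalize to a distribution
    lam = lam[lam > eps]
    if np.isclose(alpha, 1.0):
        return float(-(lam * np.log(lam)).sum())  # Shannon limit (alpha -> 1)
    return float(np.log((lam ** alpha).sum()) / (1.0 - alpha))
```

In practice, `reps` would hold one layer's hidden states, such as those extracted by `representations.py`. Comparing the value across layers shows where representations become concentrated in fewer directions.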