# Understanding Impact of Human Feedback via Influence Functions

## Installation

To get started, follow the steps below to install the required packages and dependencies.

1. **Create a Conda environment** with Python 3.10 and the essential dependencies. Run only one of the two `conda create` commands, depending on whether a GPU is available:
    ```bash
    # With a GPU (CUDA 12.1):
    conda create -n if_rlhf python=3.10 absl-py pyparsing pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

    # Without a GPU:
    conda create -n if_rlhf python=3.10

    conda activate if_rlhf
    ```

2. **Verify GPU availability**:
    ```python
    import torch
    print(torch.cuda.is_available())  # Prints True if a GPU is available
    ```

3. **Install remaining package dependencies**:
    ```bash
    python -m pip install -e .
    pip install -r requirements.txt
    ```

4. **Install FlashAttention**:
    ```bash
    MAX_JOBS=4 pip install flash-attn --no-build-isolation
    ```

5. **Optional Support**:
    - For DeepSpeed:
      ```bash
      conda install -c conda-forge mpi4py mpich
      ```
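
After installation, a quick sanity check can confirm that the packages named in the steps above are importable. The following is a minimal sketch, not part of the repository: the helper names `check_packages` and `describe_torch_device` are hypothetical, and the import names (`flash_attn` for flash-attn, `absl` for absl-py) are assumed to be the usual ones for those packages.

```python
import importlib.util


def check_packages(names):
    """Return {name: bool} telling whether each top-level package is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def describe_torch_device():
    """Describe the device PyTorch would use, without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available; running on CPU"


if __name__ == "__main__":
    # Assumed import names; mpi4py is only needed for the optional DeepSpeed support.
    status = check_packages(["torch", "absl", "pyparsing", "flash_attn", "mpi4py"])
    for name, ok in status.items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
    print(describe_torch_device())
```

A `MISSING` entry for `mpi4py` is expected if you skipped the optional DeepSpeed step.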

## Dataset Download

Before running the experiments, download the dataset from [Dataset download link](https://drive.google.com/file/d/1kTa7qG_PtCHg0k35Wgm8t1xZr7Fb1oRi/view?usp=sharing). Once downloaded, place it in a folder named `dataset` in the repository root. The dataset is structured as follows:

```text
dataset/
    ├── baselines/ ...
    ├── model/ ...
    ├── train/ ...
    └── val/ ...
```
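
To confirm the layout above before opening a notebook, a small check like the following may help. It is a sketch under the assumption that it runs from the repository root; the function name `missing_dataset_dirs` is hypothetical, and only the four top-level folders listed above are checked, since their contents are not specified here.

```python
from pathlib import Path

# Top-level folders expected under dataset/, taken from the tree above.
EXPECTED_SUBDIRS = ("baselines", "model", "train", "val")


def missing_dataset_dirs(root="dataset"):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing under dataset/:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```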

## Experiment Scripts

This repository provides Jupyter notebooks for running experiments with pre-made Influence Datasets. The notebooks cover three experimental workflows:

1. **Measure Length Bias** – This experiment evaluates the model's length bias.

2. **Measure Sycophancy Bias** – This experiment examines the model's sycophancy bias.

3. **Improve Labeler Strategy** – This experiment focuses on improving a labeler strategy by updating one of Bob's weights.

Each notebook walks you through the experimental setup, utilizing the pre-generated Influence Dataset to streamline the process of computing and analyzing the model's behavior in these specific scenarios.

### Running Experiments

- **Measure Length Bias**: Open and execute the `measure_length_bias.ipynb` notebook.

- **Measure Sycophancy Bias**: Open and execute the `measure_sycophancy_bias.ipynb` notebook.

- **Improve Labeler Strategy**: Open and execute the `improve_labeler_strategy.ipynb` notebook.
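
If you prefer to run the notebooks without opening them interactively, one option is `jupyter nbconvert --execute`. The sketch below is an assumption-laden example rather than a repository script: it assumes Jupyter and nbconvert are installed, that the notebooks sit in the directory named by the hypothetical `NOTEBOOK_DIR`, and that executed copies should go to a hypothetical `executed/` folder.

```python
import subprocess
from pathlib import Path

NOTEBOOK_DIR = "."  # Hypothetical: adjust to where the notebooks actually live.
NOTEBOOKS = [
    "measure_length_bias.ipynb",
    "measure_sycophancy_bias.ipynb",
    "improve_labeler_strategy.ipynb",
]


def nbconvert_command(notebook, output_dir="executed"):
    """Build the argv that executes a notebook and writes the executed copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", str(notebook),
        "--output-dir", str(output_dir),
    ]


if __name__ == "__main__":
    for name in NOTEBOOKS:
        path = Path(NOTEBOOK_DIR) / name
        if not path.is_file():
            print(f"skipping {path}: not found")
            continue
        # check=True stops at the first notebook that fails to execute.
        subprocess.run(nbconvert_command(path), check=True)
```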

### Pre-Requisites for Running Notebooks

Before running any of the experiments, ensure that you have set up the environment as described in the **Installation** and **Dataset Download** sections of this README. Make sure the pre-trained reward models and necessary Influence Datasets are available in the correct paths, as specified in each notebook.

By following the provided notebooks, you can explore the impact of human feedback on model behavior and use influence functions to fine-tune reward modeling strategies effectively.

## Influence Computation Example

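The following example demonstrates how to compute influence data. Influence computation involves five steps; Steps 1 and 2 do not depend on each other and can run in parallel. The empty `""` arguments in the commands are placeholders to fill in with your own paths and names.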

1. **Compute `val_grad_avg`** (Step 1):
    ```bash
    CUDA_VISIBLE_DEVICES=0 python ./src/experiment/influence/compute_val_grad_avg.py \
    --model_path "" \
    --tokenizer_path "" \
    --eval_dataset_dir "" \
    --val_names ""
    ```

2. **Compute `lambda`** (Step 2):
    ```bash
    CUDA_VISIBLE_DEVICES=0 python ./src/experiment/influence/compute_lambda.py \
    --model_path "" \
    --tokenizer_path "" \
    --train_data_path ""
    ```

3. **Compute `r_l`** (Step 3):
    ```bash
    CUDA_VISIBLE_DEVICES=0 python ./src/experiment/influence/compute_r_l.py \
    --model_path "" \
    --tokenizer_path "" \
    --train_data_path "" \
    --val_names ""
    ```

4. **Compute `Influence`** (Step 4):
    ```bash
    CUDA_VISIBLE_DEVICES=0 python ./src/experiment/influence/compute_influence.py \
    --model_path "" \
    --tokenizer_path "" \
    --train_data_path "" \
    --val_names ""
    ```

5. **Compute `DataInf`** (Step 5):
    ```bash
    CUDA_VISIBLE_DEVICES=0 python ./src/experiment/influence/cache_gradients.py \
    --model_path "." \
    --train_data_path "" \
    --save_name "rapid_grad_train.pt" \
    --seed 43

    CUDA_VISIBLE_DEVICES=0 python ./src/experiment/influence/cache_gradients.py \
    --model_path "" \
    --train_data_path "" \
    --save_name "rapid_grad_val.pt" \
    --seed 43
    ```
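
Steps 1 to 4 above can be chained from a single driver script. The following is a minimal sketch, not a script shipped with the repository. It uses only the script names and flags shown above; the helper names and GPU ids are hypothetical. It also assumes that Steps 3 and 4 must run in order after Steps 1 and 2 finish, which this README implies but does not state. Step 5 is left out.

```python
import os
import subprocess
import sys

SCRIPT_DIR = "./src/experiment/influence"


def build_command(script, **flags):
    """Build an argv list: [python, script, --flag, value, ...]."""
    cmd = [sys.executable, f"{SCRIPT_DIR}/{script}"]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_parallel(jobs):
    """Start each (command, gpu_id) pair at once; raise if any exits non-zero."""
    procs = [
        subprocess.Popen(cmd, env=dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu)))
        for cmd, gpu in jobs
    ]
    codes = [p.wait() for p in procs]
    if any(codes):
        raise RuntimeError(f"a parallel step failed, exit codes: {codes}")


def run_steps_1_to_4(model, tokenizer, train_data, eval_dir, val_names):
    common = {"model_path": model, "tokenizer_path": tokenizer}
    # Steps 1 and 2 are independent, so they run side by side on GPUs 0 and 1
    # (hypothetical ids; use the same id twice if only one GPU is available).
    run_parallel([
        (build_command("compute_val_grad_avg.py", **common,
                       eval_dataset_dir=eval_dir, val_names=val_names), 0),
        (build_command("compute_lambda.py", **common,
                       train_data_path=train_data), 1),
    ])
    # Steps 3 and 4 run one after the other (assumed to need earlier outputs).
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
    for script in ("compute_r_l.py", "compute_influence.py"):
        cmd = build_command(script, **common,
                            train_data_path=train_data, val_names=val_names)
        subprocess.run(cmd, check=True, env=env)
```

Call `run_steps_1_to_4(...)` with your own model, tokenizer, and data paths. Building each command as a list instead of a shell string avoids quoting problems when paths contain spaces.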
