## Dataset
Our dataset is generated with the code from [The Power of Noise: Redefining Retrieval for RAG Systems](https://github.com/florin-git/The-Power-of-Noise).
We put that code in `generate_dataset`.

The Vicuna results can be reproduced with:
```
python DPrompt_main.py --base_model_name lmsys/vicuna-7b-v1.5 --lr 1e-6 --num_document_token 1 --dataset_name gold_only_reverse --num_virtual_token 1 --device 2;
```


## Environment
- transformers==4.44.2
- trl==0.10.1
- peft==0.12.0
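
The pinned versions above can be checked against the active environment before training. The following is a minimal sketch using only the standard library; the names `PINNED` and `find_mismatches` are illustrative and are not part of this repository.

```python
from importlib import metadata

# Versions pinned in this README.
PINNED = {
    "transformers": "4.44.2",
    "trl": "0.10.1",
    "peft": "0.12.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    `get_version` is injectable so the check can be tested without installs.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
    if not problems:
        print("All pinned packages match.")
```

An exact match matters here because the manual patch described below refers to a specific line number in `transformers/trainer_pt_utils.py`, which may shift between releases.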



We disable label smoothing because it causes a bug. Concretely, in the label smoother, starting at line 577 of `transformers/trainer_pt_utils.py`, we comment out the smoothed-loss terms so that only the NLL term remains:


    nll_loss.masked_fill_(padding_mask, 0.0)
    # smoothed_loss.masked_fill_(padding_mask, 0.0)

    # Take the mean over the label dimensions, then divide by the number of active elements (i.e. not-padded):
    num_active_elements = padding_mask.numel() - padding_mask.long().sum()
    nll_loss = nll_loss.sum() / num_active_elements
    # smoothed_loss = smoothed_loss.sum() / (num_active_elements * log_probs.shape[-1])
    return (1 - self.epsilon) * nll_loss  # + self.epsilon * smoothed_loss
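
As a sketch of what the patched lines compute (the notation is ours, not the library's): let $A$ be the set of label positions that are not padding, $N = |A|$ the value of `num_active_elements`, $y_i$ the label at position $i$, and $\epsilon$ the smoother's `epsilon`. The returned loss is then

$$
\mathcal{L} = (1 - \epsilon)\,\frac{1}{N} \sum_{i \in A} \bigl(-\log p_\theta(y_i)\bigr)
$$

Note that the $(1 - \epsilon)$ factor is still applied, so the result equals the plain mean negative log-likelihood only when $\epsilon = 0$; for any other value the loss is the mean NLL scaled by a constant.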