Metadata-Version: 2.1
Name: transformers
Version: 4.4.2
Summary: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
Home-page: https://github.com/huggingface/transformers
Author: Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Sam Shleifer, Patrick von Platen, Sylvain Gugger, Google AI Language Team Authors, Open AI team Authors, Facebook AI Authors, Carnegie Mellon University Authors
Author-email: thomas@huggingface.co
License: Apache
Keywords: NLP deep learning transformer pytorch tensorflow BERT GPT GPT-2 google openai CMU
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
Requires-Dist: dataclasses; python_version < "3.7"
Requires-Dist: importlib_metadata; python_version < "3.8"
Requires-Dist: filelock
Requires-Dist: numpy>=1.17
Requires-Dist: packaging
Requires-Dist: regex!=2019.12.17
Requires-Dist: requests
Requires-Dist: sacremoses
Requires-Dist: tokenizers<0.11,>=0.10.1
Requires-Dist: tqdm>=4.27
Provides-Extra: ja
Requires-Dist: fugashi>=1.0; extra == "ja"
Requires-Dist: ipadic<2.0,>=1.0.0; extra == "ja"
Requires-Dist: unidic_lite>=1.0.7; extra == "ja"
Requires-Dist: unidic>=1.0.2; extra == "ja"
Provides-Extra: sklearn
Requires-Dist: scikit-learn; extra == "sklearn"
Provides-Extra: tf
Requires-Dist: tensorflow>=2.3; extra == "tf"
Requires-Dist: onnxconverter-common; extra == "tf"
Requires-Dist: keras2onnx; extra == "tf"
Provides-Extra: tf-cpu
Requires-Dist: tensorflow-cpu>=2.3; extra == "tf-cpu"
Requires-Dist: onnxconverter-common; extra == "tf-cpu"
Requires-Dist: keras2onnx; extra == "tf-cpu"
Provides-Extra: torch
Requires-Dist: torch>=1.0; extra == "torch"
Provides-Extra: retrieval
Requires-Dist: faiss-cpu; extra == "retrieval"
Requires-Dist: datasets; extra == "retrieval"
Provides-Extra: flax
Requires-Dist: jax>=0.2.8; extra == "flax"
Requires-Dist: jaxlib>=0.1.59; extra == "flax"
Requires-Dist: flax>=0.3.2; extra == "flax"
Provides-Extra: tokenizers
Requires-Dist: tokenizers<0.11,>=0.10.1; extra == "tokenizers"
Provides-Extra: onnxruntime
Requires-Dist: onnxruntime>=1.4.0; extra == "onnxruntime"
Requires-Dist: onnxruntime-tools>=1.4.2; extra == "onnxruntime"
Provides-Extra: onnx
Requires-Dist: onnxconverter-common; extra == "onnx"
Requires-Dist: keras2onnx; extra == "onnx"
Requires-Dist: onnxruntime>=1.4.0; extra == "onnx"
Requires-Dist: onnxruntime-tools>=1.4.2; extra == "onnx"
Provides-Extra: modelcreation
Requires-Dist: cookiecutter==1.7.2; extra == "modelcreation"
Provides-Extra: serving
Requires-Dist: pydantic; extra == "serving"
Requires-Dist: uvicorn; extra == "serving"
Requires-Dist: fastapi; extra == "serving"
Requires-Dist: starlette; extra == "serving"
Provides-Extra: speech
Requires-Dist: soundfile; extra == "speech"
Requires-Dist: torchaudio; extra == "speech"
Provides-Extra: sentencepiece
Requires-Dist: sentencepiece==0.1.91; extra == "sentencepiece"
Requires-Dist: protobuf; extra == "sentencepiece"
Provides-Extra: testing
Requires-Dist: pytest; extra == "testing"
Requires-Dist: pytest-xdist; extra == "testing"
Requires-Dist: timeout-decorator; extra == "testing"
Requires-Dist: parameterized; extra == "testing"
Requires-Dist: psutil; extra == "testing"
Requires-Dist: datasets; extra == "testing"
Requires-Dist: pytest-sugar; extra == "testing"
Requires-Dist: faiss-cpu; extra == "testing"
Requires-Dist: datasets; extra == "testing"
Requires-Dist: cookiecutter==1.7.2; extra == "testing"
Provides-Extra: docs
Requires-Dist: recommonmark; extra == "docs"
Requires-Dist: sphinx==3.2.1; extra == "docs"
Requires-Dist: sphinx-markdown-tables; extra == "docs"
Requires-Dist: sphinx-rtd-theme==0.4.3; extra == "docs"
Requires-Dist: sphinx-copybutton; extra == "docs"
Provides-Extra: quality
Requires-Dist: black>=20.8b1; extra == "quality"
Requires-Dist: isort>=5.5.4; extra == "quality"
Requires-Dist: flake8>=3.8.3; extra == "quality"
Provides-Extra: all
Requires-Dist: tensorflow>=2.3; extra == "all"
Requires-Dist: onnxconverter-common; extra == "all"
Requires-Dist: keras2onnx; extra == "all"
Requires-Dist: torch>=1.0; extra == "all"
Requires-Dist: jax>=0.2.8; extra == "all"
Requires-Dist: jaxlib>=0.1.59; extra == "all"
Requires-Dist: flax>=0.3.2; extra == "all"
Requires-Dist: sentencepiece==0.1.91; extra == "all"
Requires-Dist: protobuf; extra == "all"
Requires-Dist: tokenizers<0.11,>=0.10.1; extra == "all"
Provides-Extra: dev
Requires-Dist: tensorflow>=2.3; extra == "dev"
Requires-Dist: onnxconverter-common; extra == "dev"
Requires-Dist: keras2onnx; extra == "dev"
Requires-Dist: torch>=1.0; extra == "dev"
Requires-Dist: jax>=0.2.8; extra == "dev"
Requires-Dist: jaxlib>=0.1.59; extra == "dev"
Requires-Dist: flax>=0.3.2; extra == "dev"
Requires-Dist: sentencepiece==0.1.91; extra == "dev"
Requires-Dist: protobuf; extra == "dev"
Requires-Dist: tokenizers<0.11,>=0.10.1; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-xdist; extra == "dev"
Requires-Dist: timeout-decorator; extra == "dev"
Requires-Dist: parameterized; extra == "dev"
Requires-Dist: psutil; extra == "dev"
Requires-Dist: datasets; extra == "dev"
Requires-Dist: pytest-sugar; extra == "dev"
Requires-Dist: faiss-cpu; extra == "dev"
Requires-Dist: datasets; extra == "dev"
Requires-Dist: cookiecutter==1.7.2; extra == "dev"
Requires-Dist: black>=20.8b1; extra == "dev"
Requires-Dist: isort>=5.5.4; extra == "dev"
Requires-Dist: flake8>=3.8.3; extra == "dev"
Requires-Dist: fugashi>=1.0; extra == "dev"
Requires-Dist: ipadic<2.0,>=1.0.0; extra == "dev"
Requires-Dist: unidic_lite>=1.0.7; extra == "dev"
Requires-Dist: unidic>=1.0.2; extra == "dev"
Requires-Dist: recommonmark; extra == "dev"
Requires-Dist: sphinx==3.2.1; extra == "dev"
Requires-Dist: sphinx-markdown-tables; extra == "dev"
Requires-Dist: sphinx-rtd-theme==0.4.3; extra == "dev"
Requires-Dist: sphinx-copybutton; extra == "dev"
Requires-Dist: scikit-learn; extra == "dev"
Requires-Dist: cookiecutter==1.7.2; extra == "dev"
Provides-Extra: torchhub
Requires-Dist: filelock; extra == "torchhub"
Requires-Dist: importlib_metadata; extra == "torchhub"
Requires-Dist: numpy>=1.17; extra == "torchhub"
Requires-Dist: packaging; extra == "torchhub"
Requires-Dist: protobuf; extra == "torchhub"
Requires-Dist: regex!=2019.12.17; extra == "torchhub"
Requires-Dist: requests; extra == "torchhub"
Requires-Dist: sacremoses; extra == "torchhub"
Requires-Dist: sentencepiece==0.1.91; extra == "torchhub"
Requires-Dist: torch>=1.0; extra == "torchhub"
Requires-Dist: tokenizers<0.11,>=0.10.1; extra == "torchhub"
Requires-Dist: tqdm>=4.27; extra == "torchhub"

# Adapting RoBERTa and DeBERTa V2 using LoRA

This folder contains the implementation of LoRA in RoBERTa and DeBERTa V2 using the Python package `lora`. LoRA is described in the following pre-print:

**LoRA: Low-Rank Adaptation of Large Language Models** <br>
*Edward J. Hu\*, Yelong Shen\*, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen* <br>
Paper: https://arxiv.org/abs/2106.09685 <br>

## Adapting to the GLUE Benchmark
Our experiments on the GLUE benchmark were run on 4 NVIDIA Tesla V100 GPUs in a DGX-1. Results may vary with GPU model, driver, CUDA SDK version, floating-point precision, and random seed.
We report the dev set results below, taking the median over 5 runs:

<p>
<img src="figures/LoRA_NLU.PNG" width="800" >
</p>

Here are the GLUE benchmark test set results for DeBERTa XXL 1.5B (no ensemble):

<p>
<img src="figures/deberta_lora_glue.jpg" width="800" >
</p>

## Download LoRA checkpoints

|   | Dataset  | RoBERTa base 125M <br> LoRA - 0.3 M  | RoBERTa large 355M <br> LoRA - 0.8 M  | DeBERTa XXL 1.5B <br> LoRA - 4.7 M |
|---|----------|--------------------|----------------------|------------------|
|   | MNLI     |[3.4 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-base/roberta_base_lora_mnli.bin) |[7.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-large/roberta_large_lora_mnli.bin) |[27.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/DeBERTa/deberta_v2_xxlarge_lora_mnli.bin) |
|   | SST2     |[3.4 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-base/roberta_base_lora_sst2.bin)  |[7.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-large/roberta_large_lora_sst2.bin)  |[27.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/DeBERTa/deberta_v2_xxlarge_lora_mnli.bin)  |
|   | MRPC     |[3.4 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-base/roberta_base_lora_mrpc.bin)  |[7.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-large/roberta_large_lora_mrpc.bin)  |[27.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/DeBERTa/deberta_v2_xxlarge_lora_mnli.bin)  |
|   | CoLA     |[3.4 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-base/roberta_base_lora_cola.bin)  |[7.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-large/roberta_large_lora_cola.bin)  |[27.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/DeBERTa/deberta_v2_xxlarge_lora_mnli.bin)  |
|   | QNLI     |[3.4 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-base/roberta_base_lora_qnli.bin)  |[7.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-large/roberta_large_lora_qnli.bin)  |[27.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/DeBERTa/deberta_v2_xxlarge_lora_mnli.bin)  |
|   | QQP      |[3.4 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-base/roberta_base_lora_qqp.bin)  |[7.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-large/roberta_large_lora_qqp.bin)  |[27.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/DeBERTa/deberta_v2_xxlarge_lora_mnli.bin)  |
|   | RTE      |[3.4 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-base/roberta_base_lora_rte.bin)  |[7.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-large/roberta_large_lora_rte.bin)  |[27.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/DeBERTa/deberta_v2_xxlarge_lora_mnli.bin)  |
|   | STSB     |[3.4 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-base/roberta_base_lora_stsb.bin)  |[7.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/RoBERTa-large/roberta_large_lora_stsb.bin)  |[27.1 MB](https://github.com/msft-edward/LoRA_private/releases/download/DeBERTa/deberta_v2_xxlarge_lora_mnli.bin)  |
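
The LoRA parameter counts in the table header (0.3 M, 0.8 M, 4.7 M) can be roughly reproduced with a back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions that are not given in this README: LoRA is applied to two square attention projections per layer (for example query and value), with hidden sizes 768/1024/1536, 12/24/48 layers, and rank 8 for the RoBERTa models; rank 16 for DeBERTa XXL matches the `--lora_r 16` used in the evaluation command. The helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(hidden_size, num_layers, rank, adapted_per_layer=2):
    """Count trainable LoRA parameters for square hidden x hidden projections."""
    # Each adapted matrix gets two low-rank factors:
    # A with shape (rank, hidden) and B with shape (hidden, rank).
    per_matrix = rank * (hidden_size + hidden_size)
    return num_layers * adapted_per_layer * per_matrix


# Assumed model configurations (not stated in this README).
configs = {
    "RoBERTa base": dict(hidden_size=768, num_layers=12, rank=8),
    "RoBERTa large": dict(hidden_size=1024, num_layers=24, rank=8),
    "DeBERTa XXL": dict(hidden_size=1536, num_layers=48, rank=16),
}

counts = {name: lora_param_count(**cfg) for name, cfg in configs.items()}
for name, n in counts.items():
    print(f"{name}: {n:,} LoRA parameters (~{n / 1e6:.1f} M)")
```

Under these assumptions the counts come out at about 0.3 M, 0.8 M, and 4.7 M. The checkpoint files are larger than these counts alone would suggest, presumably because they also store other trained weights such as the classification head.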

## Steps to reproduce our results
### Create and activate conda env
```console
conda env create -f environment.yml
```
### Install the pre-requisites
lora:
```console
pip install -e ..
```
NLU:
```console
pip install -e .
```
### Start the experiments
```console
deberta_v2_xxlarge_mnli.sh
deberta_v2_xxlarge_sst2.sh
deberta_v2_xxlarge_mrpc.sh
deberta_v2_xxlarge_cola.sh
deberta_v2_xxlarge_qnli.sh
deberta_v2_xxlarge_qqp.sh
deberta_v2_xxlarge_rte.sh
deberta_v2_xxlarge_stsb.sh
```
For MRPC, RTE, and STSB, you need to download and start from the LoRA-adapted MNLI checkpoint and change the path accordingly in the shell script.

Attention: here, xxlarge-mnli refers to the LoRA-adapted model from our first MNLI experiments, not the checkpoint at https://huggingface.co/microsoft/deberta-v2-xxlarge-mnli.

We also provide the shell scripts for roberta-base and roberta-large (`{roberta_large|roberta_base}_{task name}.sh`).
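
To make the naming pattern concrete, the following sketch expands it into individual script names. It assumes that the RoBERTa scripts cover the same eight task names as the DeBERTa scripts listed above; the helper `script_names` is hypothetical, and the actual files in the folder should be checked before relying on this list.

```python
# Model prefixes taken from the naming pattern in the text.
MODELS = ["roberta_base", "roberta_large"]
# Task names assumed to match the DeBERTa scripts listed above.
TASKS = ["mnli", "sst2", "mrpc", "cola", "qnli", "qqp", "rte", "stsb"]


def script_names(models=MODELS, tasks=TASKS):
    """Expand the {model}_{task name}.sh pattern into concrete file names."""
    return [f"{model}_{task}.sh" for model in models for task in tasks]


names = script_names()
for name in names:
    print(name)
```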

### Evaluate the checkpoints
```console
python -m torch.distributed.launch --nproc_per_node=1 examples/text-classification/run_glue.py \
--model_name_or_path microsoft/deberta-v2-xxlarge \
--lora_path ./deberta_v2_xxlarge_lora_mnli.bin \
--task_name mnli \
--do_eval \
--output_dir ./output \
--apply_lora \
--lora_r 16 \
--lora_alpha 32
```
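
As background for the `--lora_r` and `--lora_alpha` flags: in the LoRA paper, a frozen weight matrix `W` is adapted as `W + (alpha / r) * B A`, where `B` and `A` are low-rank factors of rank `r` and `B` starts at zero. The pure-Python sketch below illustrates that formula on a tiny example. It is not the code used by `run_glue.py`; the function names are hypothetical, and it is an assumption that the flags map onto `r` and `alpha` in exactly this way.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_scale(r, alpha):
    """Scaling factor applied to the low-rank update, as in the LoRA paper."""
    return alpha / r


def merged_weight(w, lora_a, lora_b, r, alpha):
    """Return W + (alpha / r) * B @ A for W (d_out x d_in), A (r x d_in), B (d_out x r)."""
    scale = lora_scale(r, alpha)
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Tiny illustrative example with rank 1 (values are made up).
w = [[1, 0], [0, 1]]
lora_a = [[1.0, 2.0]]        # shape (r, d_in) = (1, 2)
lora_b = [[0.5], [0.0]]      # shape (d_out, r) = (2, 1)
merged = merged_weight(w, lora_a, lora_b, r=1, alpha=2)

# With B initialised to zero, the adapted weight equals the frozen weight.
unchanged = merged_weight(w, lora_a, [[0.0], [0.0]], r=1, alpha=2)
```

With `r = 16` and `alpha = 32`, as in the command above, `lora_scale` gives a factor of 2.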

### Enable Cutoff/R-drop for data augmentation
```console
mnli.cutoff.sh
mnli.rdrop.sh
```

## Citation
```
@misc{hu2021lora,
    title={LoRA: Low-Rank Adaptation of Large Language Models},
    author={Hu, Edward and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
    year={2021},
    eprint={2106.09685},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
