## Stability and Generalization of Asynchronous SGD: Sharper Bounds Beyond Lipschitz and Smoothness


This repository is the official implementation of the paper **Stability and Generalization of Asynchronous SGD: Sharper Bounds Beyond Lipschitz and Smoothness**. This part covers the BERT model.


### Requirements

A GPU environment is required. See `requirements.txt` for details.

### Getting the data

The preparation of the pre-training dataset is described in the `bertPrep.py` script in the `data/` folder. The tools used for preparing the BookCorpus and Wikipedia datasets can also be applied to an arbitrary corpus; see the `create_datasets_from_start.sh` script in the `data/` directory.

The following dataset is used:

-   [SST-2](https://nlp.stanford.edu/sentiment/index.html): for sentiment analysis.
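
As a quick sanity check on the downloaded data, the sketch below reads an SST-2 split in the GLUE TSV layout (a header row `sentence<TAB>label`, then one example per line). The function name `read_sst2_tsv` and the file location are assumptions made for illustration; they are not taken from this repository's code.

```python
import csv
from pathlib import Path


def read_sst2_tsv(path):
    """Read a GLUE-style SST-2 file into a list of (sentence, label) pairs.

    Assumes a tab-separated file whose header row is `sentence<TAB>label`.
    This helper is hypothetical and not part of the repository.
    """
    examples = []
    with Path(path).open(newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            examples.append((row["sentence"], int(row["label"])))
    return examples
```

For example, `read_sst2_tsv("DATA_DIR/train.tsv")` would return the training pairs, assuming the split is stored under that name inside the directory passed as `--data_dir`.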


### Example Usage

```bash
# train
python asgd_bert.py --model 'BERT' --cuda-ps --batch-size 16 \
                    --dataset 'GLUE' --data_dir 'DATA_DIR' \
                    --delay 16 --delay-type 'random' \
                    --num-workers 8 \
                    --lr 0.01 --num-epochs 200 --seed 42 \
                    --do_lower_case --vocab_file=./vocab/vocab \
                    --config_file=bert_config.json \
                    --bert_model=bert-base-uncased
```

### Usage

```text
usage: asgd_bert.py [-h] 
    [--dataset data] [--model model] [--num-epochs N]
    [--seed SEED] [--delay DELAY]
    [--delay-type {fixed,random}] [--num-workers W]
    [--batch-size b] [--lr LR] [--logdir LOGDIR]
    [--cuda-device c] [--print-freq p] [--eval-freq EVAL_FREQ]
    [--cuda-ps] --data_dir DATA_DIR --bert_model BERT_MODEL
    [--max_seq_length MAX_SEQ_LENGTH] [--do_lower_case]
    --vocab_file VOCAB_FILE --config_file CONFIG_FILE
```
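
The synopsis above does not spell out what `--delay` and `--delay-type` do. One plausible reading, stated here as an assumption, is that each update applies a gradient computed at an iterate up to `delay` steps old, with the staleness either constant (`fixed`) or sampled per step (`random`). The toy sketch below illustrates that reading on a scalar problem; `delayed_sgd` and its arguments are hypothetical names and do not reproduce the logic of `asgd_bert.py`.

```python
import random
from collections import deque


def delayed_sgd(grad_fn, w0, lr, num_steps, delay, delay_type="fixed", seed=0):
    """Run SGD where each update uses a gradient computed at a stale iterate.

    delay_type="fixed":  the gradient is always `delay` steps old.
    delay_type="random": the staleness is drawn uniformly from 0..delay.
    Illustrative assumption about the flags, not the repository's code.
    """
    rng = random.Random(seed)
    # Keep the last delay+1 iterates; history[-1] is the current one.
    history = deque([w0], maxlen=delay + 1)
    w = w0
    for _ in range(num_steps):
        tau = delay if delay_type == "fixed" else rng.randint(0, delay)
        tau = min(tau, len(history) - 1)  # early steps have less history
        stale_w = history[-1 - tau]       # iterate from tau steps ago
        w = w - lr * grad_fn(stale_w)     # stale gradient, current iterate
        history.append(w)
    return w


if __name__ == "__main__":
    # Minimize 0.5 * w**2, whose gradient is w, under both delay types.
    for kind in ("fixed", "random"):
        print(kind, delayed_sgd(lambda w: w, 1.0, 0.05, 500, 4, kind))
```

With `delay=0` the loop reduces to plain SGD, which makes it easy to compare runs with and without staleness.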

#### Note

* A demo bash script, `bashrun.sh`, is provided.

