# SoftSignSGD (S3): Enhancing Practical DNN Training with Soft-Sign Descent and Loss Stability


## Installing SoftSignSGD

```shell
python setup.py install
```

## Usage of SoftSignSGD in Megatron-LM

### Two steps to use SoftSignSGD

**Step 1.** Import `SoftSignSGD` in `Megatron-LM/megatron/optimizer/__init__.py` and add a branch for it where the optimizer is selected.

```python
# At the top of the file, next to the other optimizer imports:
from softsignsgd import SoftSignSGD

# Inside the existing if/elif chain that selects on args.optimizer:
elif args.optimizer == 'softsignsgd':
    optimizer = SoftSignSGD(param_groups,
                            lr=args.lr,
                            beta=args.softsignsgd_beta,
                            eps=args.softsignsgd_eps,
                            weight_decay=args.weight_decay,
                            power=args.softsignsgd_power,
                            foreach=args.softsignsgd_foreach,
                            fused=args.softsignsgd_fused)
```

**Step 2.** Add the following arguments to `Megatron-LM/megatron/arguments.py`.

```python
group.add_argument('--softsignsgd-beta', type=float, default=0.9,
                   help='First coefficient for computing running averages '
                   'of gradient and its square')
# Note the trailing space before the line break in the help string below.
group.add_argument('--softsignsgd-eps', type=float, default=1e-08,
                   help='Term added to the denominator to improve '
                   'numerical stability')
group.add_argument('--softsignsgd-power', type=float, default=1.0,
                   help='Value passed as the power argument of SoftSignSGD')
group.add_argument('--softsignsgd-foreach', action='store_true',
                   help='Use the multi-tensor (foreach) implementation '
                   'of SoftSignSGD')
group.add_argument('--softsignsgd-fused', action='store_true',
                   help='Use the fused CUDA implementation of SoftSignSGD')
```
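To check how these flags map onto the `args` attributes used in Step 1, the following is a minimal standalone sketch built on Python's `argparse`. It is not Megatron-LM's actual parser: `build_parser` is a hypothetical helper, and the `--optimizer` argument with its `choices` list is an assumption about how the optimizer is selected. If your Megatron-LM version restricts `--optimizer` with `choices`, `'softsignsgd'` would likely need to be added there as well.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for Megatron-LM's argument parser.
    parser = argparse.ArgumentParser(description='SoftSignSGD flag sketch')
    group = parser.add_argument_group(title='optimizer')
    # Assumption: the optimizer selector must accept 'softsignsgd'.
    group.add_argument('--optimizer', type=str, default='adam',
                       choices=['adam', 'sgd', 'softsignsgd'])
    group.add_argument('--softsignsgd-beta', type=float, default=0.9)
    group.add_argument('--softsignsgd-eps', type=float, default=1e-08)
    group.add_argument('--softsignsgd-power', type=float, default=1.0)
    group.add_argument('--softsignsgd-foreach', action='store_true')
    group.add_argument('--softsignsgd-fused', action='store_true')
    return parser


# Parse a sample command line instead of sys.argv.
args = build_parser().parse_args(
    ['--optimizer', 'softsignsgd',
     '--softsignsgd-beta', '0.95',
     '--softsignsgd-fused'])

# argparse turns dashes in flag names into underscores in attribute names,
# so --softsignsgd-beta becomes args.softsignsgd_beta.
print(args.optimizer, args.softsignsgd_beta,
      args.softsignsgd_fused, args.softsignsgd_foreach)
```

Flags declared with `action='store_true'` default to `False`, so `--softsignsgd-foreach` and `--softsignsgd-fused` only take effect when passed explicitly on the command line.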

