# Automatic speech recognition experiments
This guide describes how to reproduce the automatic speech recognition training runs with **relaxed attention** using the Espresso toolkit.

[espresso toolkit](https://github.com/freewym/espresso) [[paper]](https://arxiv.org/abs/1909.08723)


## Installation

First, create an anaconda virtual environment and activate it:
```
conda create -n espresso_env python=3.6 -y
conda activate espresso_env
```

Then, install fairseq and the other packages:
```
cd espresso
pip install --editable .
pip install sentencepiece
```
A compiled Kaldi installation is also required. Pass its location, `<path/to/a/compiled/kaldi/directory>`, when building the Espresso tools:
```
cd espresso/tools; make KALDI=<path/to/a/compiled/kaldi/directory>
```

Also, install NVIDIA's apex extension for faster mixed-precision training:
``` bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
cd ..
```
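
After the installation steps, a quick import check can confirm that the packages are visible in the active environment. The snippet below is only a sketch: `check_py_module` is a hypothetical helper (not part of Espresso), and it assumes that `python3` resolves to the interpreter of `espresso_env`.

```shell
# Hypothetical helper: report whether a Python module can be imported
# by the python3 interpreter found on PATH (assumed to be espresso_env's).
check_py_module() {
  if python3 -c "import $1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Modules installed in the steps above.
for module in fairseq sentencepiece apex; do
  check_py_module "$module"
done
```

Any module reported as `missing` points to the installation step that should be repeated.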



## Data preparation & Training

Follow the steps in [`espresso/examples/asr_librispeech/run.sh`](espresso/examples/asr_librispeech/run.sh) to download and pre-process the data necessary for fine-tuning. The same script also trains and evaluates the models.


Relevant parameters for relaxed attention are:
```
# relaxed attention related
relaxAttn=0.2
relaxSelfAttn=0.01
relaxation_matched_inference=false
attention_sigmoid_smoothing=false
```
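
To try a different setting without editing the script by hand, the assignments can be rewritten with `sed`. This is a minimal sketch under the assumption that the parameters appear in `run.sh` as plain `name=value` lines like the ones listed above. `set_param` is a hypothetical helper, and the value `0.1` is an arbitrary illustration, not a recommended setting. The demonstration runs on a scratch file so that the original script stays untouched.

```shell
# Hypothetical helper: replace the line `name=...` with `name=value` in a file.
# A backup of the file is kept with the suffix .bak.
set_param() {
  # $1: file to edit, $2: variable name, $3: new value
  sed -i.bak "s|^$2=.*|$2=$3|" "$1"
}

# Scratch file holding two of the defaults listed above.
demo=$(mktemp)
printf 'relaxAttn=0.2\nrelaxSelfAttn=0.01\n' > "$demo"

# Change only relaxAttn (0.1 is an arbitrary example value).
set_param "$demo" relaxAttn 0.1
cat "$demo"
```

If the assumption holds, the same call can be pointed at a copy of `run.sh` instead of the scratch file.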

