## Installation

First, create a conda virtual environment and activate it:
```
conda create -n avhubert python=3.8 -y
conda activate avhubert
```

Then, install fairseq and the other packages:
```
cd avhubert
pip install -r requirements.txt
cd ../fairseq
pip install --editable ./
```

Also, install NVIDIA's apex extension for faster mixed-precision training:
``` bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
```



### Data preparation

Follow the steps in [`avhubert/preparation`](avhubert/preparation/) to pre-process the data necessary for fine-tuning:
- LRS3 and VoxCeleb2 datasets (the latter is only needed for self-training)



### Fine-tune an AV-HuBERT model with Seq2Seq
Suppose `{train,valid}.tsv` are saved at `/path/to/data`, `{train,valid}.wrd`
are saved at `/path/to/labels`, and the configuration file is saved at `/path/to/conf/conf-name` (e.g. `conf/finetune/large_vox_433h_relax.yaml`).
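
Before launching a long run, it can help to confirm that the files named above are actually in place. The following is a minimal sketch, not part of the repository: `check_finetune_inputs` is a hypothetical helper that only checks for `{train,valid}.tsv` in the data directory and `{train,valid}.wrd` in the label directory.

```sh
# Hypothetical pre-flight helper (not shipped with this repository).
# Returns 0 if all four expected files exist, 1 otherwise.
check_finetune_inputs() {
  data_dir=$1
  label_dir=$2
  missing=0
  for split in train valid; do
    for f in "$data_dir/$split.tsv" "$label_dir/$split.wrd"; do
      if [ ! -f "$f" ]; then
        # Report each missing file on stderr and remember the failure.
        echo "missing: $f" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, `check_finetune_inputs /path/to/data /path/to/labels && echo "inputs look complete"` prints the message only when nothing is missing.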

To fine-tune a pre-trained AV-HuBERT model at `/path/to/checkpoint`, run:
```sh
$ cd fairseq/examples/lipreading
$ fairseq-hydra-train --config-dir conf/finetune --config-name conf-name \
  task.data=/path/to/data task.label_dir=/path/to/labels \
  task.tokenizer_bpe_model=/path/to/tokenizer \
  model.w2v_path=/path/to/checkpoint \
  model.relaxation_matched_inference=false \
  model.relaxed_self_attention_weight=0.00 \
  model.relaxed_attention_weight=0.00 \
  model.attention_sigmoid_smoothing=false \
  hydra.run.dir=/path/to/experiment/finetune/ common.user_dir=`pwd`
```
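
Long override lists are easy to break: a backslash followed by a space or tab no longer continues the line, and the shell then tries to run the remaining overrides as a separate command. One way to sidestep this is sketched below. It collects the same overrides one per line and prints the resulting command as a dry run. The variable names (`DATA_DIR`, `LABEL_DIR`, and so on) are illustrative placeholders for the paths used in this section, not settings that fairseq reads.

```sh
# Placeholder paths from this section; the variable names are illustrative only.
DATA_DIR=/path/to/data
LABEL_DIR=/path/to/labels
TOKENIZER=/path/to/tokenizer
CHECKPOINT=/path/to/checkpoint
RUN_DIR=/path/to/experiment/finetune/

# Collect the overrides in the positional parameters, one per line
# (POSIX sh has no arrays), so no line-continuation backslashes are needed.
set --
set -- "$@" "task.data=$DATA_DIR"
set -- "$@" "task.label_dir=$LABEL_DIR"
set -- "$@" "task.tokenizer_bpe_model=$TOKENIZER"
set -- "$@" "model.w2v_path=$CHECKPOINT"
set -- "$@" "model.relaxation_matched_inference=false"
set -- "$@" "model.relaxed_self_attention_weight=0.00"
set -- "$@" "model.relaxed_attention_weight=0.00"
set -- "$@" "model.attention_sigmoid_smoothing=false"
set -- "$@" "hydra.run.dir=$RUN_DIR"
set -- "$@" "common.user_dir=$(pwd)"

# Dry run: print the assembled command instead of executing it.
echo fairseq-hydra-train --config-dir conf/finetune --config-name conf-name "$@"
```

Once the printed command looks right, dropping the leading `echo` and running the script from `fairseq/examples/lipreading` would start the actual fine-tuning.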


To fine-tune the AV-HuBERT-based model with relaxed attention, please first follow the

