# Lip-reading experiments

This guide describes how to reproduce the training of automatic lip-reading models with **relaxed attention**, using pre-trained AV-HuBERT models from [Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction](https://arxiv.org/abs/2201.02184).


## Installation

First, create a conda virtual environment and activate it:

```
conda create -n avhubert python=3.8 -y
conda activate avhubert
```

Then, install fairseq and the other packages:

```
cd av_hubert
pip install -r requirements.txt
cd ../fairseq
pip install --editable ./
cd ..
```

Also install NVIDIA's apex extension for faster mixed-precision training:

``` bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
cd ..
```
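
To confirm that the steps above took effect, a quick import check can help. The following is a minimal sketch, assuming `python3` resolves to the interpreter of the activated `avhubert` environment; `check_import` is a hypothetical helper, not part of fairseq or apex:

```sh
# Hypothetical helper: succeed only if the given Python module can be located.
check_import() {
  python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Report the status of the two packages installed above.
for module in fairseq apex; do
  if check_import "$module"; then
    echo "ok: $module"
  else
    echo "MISSING: $module"
  fi
done
```

A `MISSING` line usually means the corresponding `pip install` step failed or ran in a different environment.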



## Data preparation

Follow the steps in [`av_hubert/avhubert/preparation`](av_hubert/avhubert/preparation/) to pre-process the data necessary for fine-tuning:

- LRS3 and VoxCeleb2 datasets (the latter is only needed for self-training)



## Fine-tune an AV-HuBERT model with a transformer decoder

Suppose that, after preparation, `{train,valid}.tsv` are saved at `/path/to/data`, `{train,valid}.wrd`
are saved at `/path/to/labels`, and the configuration file is saved at `/path/to/conf/conf-name`.
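
Before launching a run, it can be worth verifying that these files are actually in place. The sketch below assumes only the file names listed above; `check_prepared` is a hypothetical helper, and the paths are the same placeholders used in this guide:

```sh
# Hypothetical helper: report expected manifest/label files that are missing or empty.
# Arguments: <data_dir> <label_dir>; returns 1 if any file is absent, 0 otherwise.
check_prepared() {
  data_dir=$1
  label_dir=$2
  missing=0
  for split in train valid; do
    for f in "$data_dir/$split.tsv" "$label_dir/$split.wrd"; do
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}

# Example call with the placeholder paths from this guide.
check_prepared /path/to/data /path/to/labels || echo "data preparation looks incomplete"
```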

To fine-tune a pre-trained AV-HuBERT model located at `/path/to/checkpoint` (checkpoints are available [here](http://facebookresearch.github.io/av_hubert)), run:

```sh
cd fairseq/examples/lipreading
fairseq-hydra-train --config-dir conf/finetune/ --config-name conf-name \
  task.data=/path/to/data task.label_dir=/path/to/labels \
  task.tokenizer_bpe_model=/path/to/tokenizer \
  model.w2v_path=/path/to/checkpoint \
  model.relaxation_matched_inference=false \
  model.relaxed_self_attention_weight=0.00 \
  model.relaxed_attention_weight=0.00 \
  model.attention_sigmoid_smoothing=false \
  hydra.run.dir=/path/to/experiment/finetune/ common.user_dir=`pwd`
```
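
The command above sets both relaxation weights to `0.00`. To compare several settings, the two weight overrides can be generated per run. The following is a dry-run sketch that only prints commands: `relaxed_overrides` is a hypothetical helper, the weight values and the run-directory naming are illustrative placeholders rather than recommended settings, and the remaining overrides from the command above are omitted for brevity:

```sh
# Hypothetical helper: print the two weight overrides used in the command above.
# Arguments: <relaxed_attention_weight> <relaxed_self_attention_weight>
relaxed_overrides() {
  printf 'model.relaxed_attention_weight=%s model.relaxed_self_attention_weight=%s' "$1" "$2"
}

# Dry run: print one command per weight without starting any training.
# The weights below are placeholders chosen for illustration only.
for w in 0.00 0.10 0.25; do
  # The substitution is left unquoted on purpose so that it splits into two arguments.
  echo fairseq-hydra-train --config-dir conf/finetune/ --config-name conf-name \
    $(relaxed_overrides "$w" 0.00) \
    hydra.run.dir="/path/to/experiment/finetune_w$w/"
done
```

Giving each run its own `hydra.run.dir` keeps the checkpoints and logs of different settings from overwriting each other.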



