# Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

This supplemental material contains training and inference code for our work.

## Data Preparation

Our code requires two Kaldi-format data files: `wav.scp` and `text`.

- `wav.scp` lists the audio files; each line contains a sample ID followed by the absolute path to its audio file:
    
  ```
  utt_1  /your-data-path/1.wav
  utt_2  /your-data-path/2.wav
  ```

- `text` lists the ground-truth transcriptions; each line contains a sample ID followed by its transcription:
    
  ```
  utt_1  i feel good
  utt_2  he is coming back
  ```
  
**NOTE:** the two files above must be paired: every sample ID in `wav.scp` should have a line with the same ID in `text`, and vice versa.
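The pairing requirement can be checked before training with a short script. This is a minimal sketch, assuming each line holds a sample ID, then whitespace, then a value (an audio path or a transcription), as in the examples above. `read_kaldi_map` and `check_paired` are hypothetical helpers written for illustration; they are not part of this codebase.

```python
from pathlib import Path


def read_kaldi_map(path):
    """Read a Kaldi-style file where each line is '<sample ID> <value>'.

    The value is everything after the first run of whitespace, so
    multi-word transcriptions in `text` are kept intact.
    """
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(f"{path}:{line_no}: expected '<id> <value>'")
            utt_id, value = parts
            if utt_id in entries:
                raise ValueError(f"{path}:{line_no}: duplicate ID {utt_id!r}")
            entries[utt_id] = value
    return entries


def check_paired(data_dir):
    """Return {id: (audio_path, transcription)} if wav.scp and text share the same IDs."""
    data_dir = Path(data_dir)
    wavs = read_kaldi_map(data_dir / "wav.scp")
    texts = read_kaldi_map(data_dir / "text")
    missing_text = sorted(wavs.keys() - texts.keys())  # audio with no transcription
    missing_wav = sorted(texts.keys() - wavs.keys())   # transcription with no audio
    if missing_text or missing_wav:
        raise ValueError(
            f"unpaired IDs: no transcription for {missing_text}, no audio for {missing_wav}"
        )
    return {utt_id: (wavs[utt_id], texts[utt_id]) for utt_id in wavs}
```

For example, `check_paired("/your-data-path")` raises a `ValueError` listing any unpaired IDs, and otherwise returns each sample's audio path together with its transcription.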


## Training
Please refer to our training script `train_star.sh` and specify the following settings:
- `dataset`: name of the training dataset;
- `model_size`: Whisper model size;
- `train_data`: training data directory containing the files `wav.scp` and `text`;
- `dev_data`: development data directory containing the files `wav.scp` and `text`.

Then run `bash train_star.sh` to start training. The model weights will be saved to `runs/{dataset}_{model_size}`.
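As a hypothetical illustration, the four training settings and the output location described above can be mirrored in a small Python structure, for example to check the data directories before launching a long run. The actual configuration lives in `train_star.sh` and may differ; `TrainSettings`, `missing_files`, and `output_dir` are names invented for this sketch, and the example values are placeholders.

```python
from dataclasses import dataclass
from pathlib import Path

# Files each data directory must contain (see "Data Preparation").
REQUIRED_FILES = ("wav.scp", "text")


@dataclass
class TrainSettings:
    dataset: str      # name of the training dataset (placeholder value in usage)
    model_size: str   # Whisper model size; accepted values depend on train_star.sh
    train_data: Path  # directory with wav.scp and text
    dev_data: Path    # directory with wav.scp and text

    def missing_files(self):
        """List required data files that do not exist yet."""
        return [
            str(Path(d) / name)
            for d in (self.train_data, self.dev_data)
            for name in REQUIRED_FILES
            if not (Path(d) / name).is_file()
        ]

    def output_dir(self, root="runs"):
        """Where the weights are saved: runs/{dataset}_{model_size}."""
        return Path(root) / f"{self.dataset}_{self.model_size}"
```

For instance, `TrainSettings("my_dataset", "large-v3", Path("data/train"), Path("data/dev")).output_dir()` gives `runs/my_dataset_large-v3`, and an empty `missing_files()` list indicates both directories are ready.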


## Inference
Please refer to our inference script `test_star.sh` and specify the following settings:
- `dataset`: name of the training dataset;
- `model_size`: Whisper model size;
- `checkpoint`: path to the trained model checkpoint (`.pth` file);
- `test_data`: test data directory containing the files `wav.scp` and `text`.

Then run `bash test_star.sh` for inference. The word error rate (WER) results will be printed in the log.
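For reference, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below implements that textbook definition with a word-level edit distance; it is not taken from `test_star.sh`, whose text normalization (casing, punctuation) is not described here, so its numbers may differ. `word_errors` and `corpus_wer` are hypothetical names, and both dictionaries are assumed to be keyed by the sample IDs used in `text`.

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions turning ref words into hyp words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the ref words processed so far and the first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs, hyps):
    """Corpus-level WER: total word errors divided by total reference words."""
    total_errors = sum(word_errors(refs[k], hyps[k]) for k in refs)
    total_words = sum(len(refs[k].split()) for k in refs)
    return total_errors / total_words
```

Summing errors over the whole test set before dividing, rather than averaging per-utterance WERs, keeps long utterances from being under-weighted.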


## License
Our supplemental material is submitted for review under the CC BY 4.0 license.
