
## 1. Training the encoder
  - Create a virtual environment and activate it:
  ```
  conda create -n spm python=3.10.10
  conda activate spm
  ```
  - In the virtual environment, install the required libraries (torch, pytorch-lightning, transformers, etc.).
  - In `src/encoder/config/config.yaml`, set the paths and other settings, such as hyperparameters.
  - In `src/encoder/config/config.yaml`, set `loss_params.name` to `triplet` to use the contrastive learning objective, or to `nll` to use the negative log-likelihood objective.
  - Train the encoder with the command below:
  ```
  python src/encoder/train_encoder.py
  ```
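
For intuition about the `triplet` option, the sketch below shows the generic shape of a standard triplet margin loss in plain Python. It assumes Euclidean distance and a margin of 1.0; the helper names are hypothetical, and this README does not describe how `train_encoder.py` forms triplets or computes its loss, so treat this as an illustration of the idea rather than the repository's implementation.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed default, not taken from config.yaml
) -> float:
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it pushes them apart.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    a = [0.0, 0.0]
    p = [0.0, 1.0]  # distance 1 from the anchor
    n = [3.0, 4.0]  # distance 5 from the anchor
    print(triplet_margin_loss(a, p, n))  # 0.0: the negative is already far enough
    print(triplet_margin_loss(a, n, p))  # 5.0: 5 - 1 + 1
```

The loss only penalizes triplets that violate the margin, which is what makes it a contrastive objective: it shapes relative distances rather than fitting a likelihood, in contrast to the `nll` option.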

## 2. Compute BBScore
  - Run `get_latents.sh` to generate latents for the input text. Note: before running it, edit the file to set the paths to the encoder, the training corpus, and the input text, as well as the output directory.
  - Once the latents are computed, run `src/scores/metrics.py`, specifying the latent directory and the output directory, to get the results.
  - Set the `--type` option to `bbscore` to compute BBScore, or to `spm` to compute the Stochastic Process Metric.
  - The output is a list of length N, where N is the number of lines (documents) in the input text file.
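
Because the output has one score per input line, a quick sanity check can be sketched as below. It assumes the scores have already been loaded into a Python list (the on-disk format written by `metrics.py` is not described here), and `count_documents` and `check_scores` are hypothetical helpers, not part of the repository.

```python
import os
import tempfile
from typing import Sequence


def count_documents(input_path: str) -> int:
    """Count lines in the input text file; each line is treated as one document."""
    with open(input_path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def check_scores(scores: Sequence[float], input_path: str) -> None:
    """Raise ValueError if there is not exactly one score per input line."""
    n_docs = count_documents(input_path)
    if len(scores) != n_docs:
        raise ValueError(f"expected {n_docs} scores (one per line), got {len(scores)}")


if __name__ == "__main__":
    # Hypothetical two-document input file and placeholder scores.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("first document\nsecond document\n")
        path = tmp.name
    try:
        check_scores([0.1, 0.2], path)  # passes: 2 lines, 2 scores
        print(count_documents(path))  # 2
    finally:
        os.remove(path)
```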
