# RETVec Benchmarks


## Overview
This folder contains the benchmarking code for RETVec.

### Training

Benchmark models are trained with the `train_models.py` script. For example, to train an LSTM model on the AG News dataset with RETVec as the vectorizer, run:

```bash
python train_models.py --model_name RNN-LSTM-4-256 --dataset_name ag_news --vec_name retvec-model-256-v0.1.0
```
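
To train several combinations in one go, a small driver can build the same command for each model, dataset, and vectorizer name. The sketch below is not part of this folder: the helper names `build_train_command` and `build_sweep` are hypothetical, and it only uses the three flags shown in the command above. It prints the commands by default and launches them only when `dry_run` is set to `False`.

```python
import itertools
import subprocess
import sys


def build_train_command(model_name, dataset_name, vec_name):
    """Return the argv list for a single train_models.py run (hypothetical helper)."""
    return [
        sys.executable,
        "train_models.py",
        "--model_name", model_name,
        "--dataset_name", dataset_name,
        "--vec_name", vec_name,
    ]


def build_sweep(model_names, dataset_names, vec_names):
    """Return one command per (model, dataset, vectorizer) combination."""
    return [
        build_train_command(model, dataset, vec)
        for model, dataset, vec in itertools.product(model_names, dataset_names, vec_names)
    ]


def run_sweep(commands, dry_run=True):
    """Print each command; run it as a subprocess only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Names taken from the example command above.
    sweep = build_sweep(
        model_names=["RNN-LSTM-4-256"],
        dataset_names=["ag_news"],
        vec_names=["retvec-model-256-v0.1.0"],
    )
    run_sweep(sweep, dry_run=True)
```

Run the driver from this folder so that the relative path to `train_models.py` resolves.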

Configurations for the classification models (RNN, CNN, and BERT) are located in `configs/models.json`. Dataset configs are located in `configs/datasets.json`, and vectorizer configs are located in `configs/vectorizers.json`.
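
To see which names can be passed on the command line, you can list the entries in each config file. The following is a minimal sketch that assumes each file is a JSON object whose top-level keys are the config names; that layout is an assumption, not something this README confirms, and `list_config_names` is a hypothetical helper.

```python
import json
from pathlib import Path


def list_config_names(path):
    """Return the sorted top-level keys of a JSON config file.

    Assumes the file holds a JSON object keyed by config name.
    """
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return sorted(config)


if __name__ == "__main__":
    # Paths as given in this README; missing files are skipped.
    for name in ("models", "datasets", "vectorizers"):
        config_path = Path("configs") / f"{name}.json"
        if config_path.exists():
            print(name, list_config_names(config_path))
        else:
            print(name, "not found at", config_path)
```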

### Resilience Evaluation

To evaluate a classification model's resilience to typos, use the `evaluate_models.py` script. Example usage:

```bash
python evaluate_models.py --config config/evaluate.json
```

An example evaluation config is provided in `config/evaluate.json`; it evaluates models on random mixed typos. The script can evaluate either a single model or a group of models.
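
For intuition about what "random mixed typos" can look like, the sketch below corrupts a string with a random mix of insertions, deletions, substitutions, and transpositions. It is illustrative only: it is not the typo generator used by `evaluate_models.py`, and the function name `add_random_typos` and its `typo_rate` and `seed` parameters are assumptions made for this example.

```python
import random
import string


def add_random_typos(text, typo_rate, seed=None):
    """Apply one random edit to each character with probability typo_rate.

    Illustrative sketch only; the benchmark's own typo generation may differ.
    """
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if rng.random() < typo_rate:
            op = rng.choice(("insert", "delete", "substitute", "transpose"))
            if op == "insert":
                # Keep the character and add a random letter after it.
                out.append(ch)
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "delete":
                # Drop the character.
                pass
            elif op == "substitute":
                # Replace the character with a random letter.
                out.append(rng.choice(string.ascii_lowercase))
            elif i + 1 < len(text):
                # Transpose: swap this character with the next one.
                out.append(text[i + 1])
                out.append(ch)
                i += 1
            else:
                # Transpose at the last character has no neighbor; keep it.
                out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(add_random_typos("the quick brown fox", typo_rate=0.2, seed=0))
```

Passing a fixed `seed` makes the corruption reproducible, which helps when comparing models on identical noisy inputs.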