# NPGenBenchmark

## Overview
NPGenBenchmark is a benchmark designed to evaluate the performance of molecular generative models. This directory contains scripts and resources for running benchmark experiments and analyzing results.

For convenience, we repackaged the original [NP-Classifier](https://github.com/mwang87/NP-Classifier) code to use PyTorch instead of Keras.

## Installation
>[!NOTE]
>The benchmark has been tested on Ubuntu 22.04.2 and CentOS Linux release 7.9.2009.

>[!WARNING]
>The RDKit version **should** be `2020.03.2`; other versions may produce erroneous results.
>Because this RDKit version is old, the benchmark cannot run normally on macOS with ARM architecture.

To use the benchmarking tools, ensure you have the NPGenBenchmark project installed:
```bash
cd NPGenBenchmark
conda env create -f environment.yml
```

Alternatively, install the dependencies manually:
```bash
conda create -n npgenbenchmark python=3.8
conda activate npgenbenchmark
conda install -c conda-forge numpy=1.21 rdkit=2020.03.2 icu=68.2
pip install pandas huggingface_hub scikit-learn scipy tqdm torch
```
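
Because results depend on the RDKit version, it can be worth confirming which version is installed before running anything. The following is a minimal sketch, not part of NPGenBenchmark: the helper names `installed_rdkit_version` and `rdkit_version_matches` are hypothetical, and the check assumes RDKit exposes its version as `rdkit.__version__`.

```python
from typing import Optional

# Version named in the warning above.
EXPECTED_RDKIT_VERSION = "2020.03.2"


def installed_rdkit_version() -> Optional[str]:
    """Return the installed RDKit version string, or None if RDKit is missing."""
    try:
        import rdkit
    except ImportError:
        return None
    # Assumption: the version is exposed as rdkit.__version__.
    return getattr(rdkit, "__version__", None)


def rdkit_version_matches(
    found: Optional[str], expected: str = EXPECTED_RDKIT_VERSION
) -> bool:
    """Exact string comparison; a missing version never matches."""
    return found is not None and found.strip() == expected


found = installed_rdkit_version()
if rdkit_version_matches(found):
    print(f"RDKit {found}: OK")
else:
    print(f"RDKit {found!r} found, expected {EXPECTED_RDKIT_VERSION}")
```

Run this inside the activated conda environment; a mismatch suggests the environment was not created from the pinned versions above.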

## Usage
Run the benchmark from Python by passing a list of SMILES strings to `run_benchmark`:
```python
import torch
from npgenbenchmark import NPGenBenchmark

# SMILES strings to benchmark (usually generated molecules).
# The benchmark also computes validity, so invalid strings can be included.
smiles_list = ["CCCC", "c1cccc1", "X"]

benchmark = NPGenBenchmark(
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    n_jobs=4,  # Number of DataLoader workers (num_workers); adjust to your CPU cores
    batch_size=256,  # Example batch size
    verbose=True,
    n_eval_data=30000,
)

# Run the benchmark
results = benchmark.run_benchmark(smiles_list)
print(results)
```
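
The validity calculation mentioned in the comments above can be illustrated with a common definition: the fraction of input strings that RDKit parses into a molecule. This is only a sketch under that assumption; the benchmark's own definition may differ, and the `validity` helper and its `parse` parameter are hypothetical names, not part of the NPGenBenchmark API.

```python
from typing import Callable, Iterable, Optional


def _rdkit_parse(smiles: str) -> Optional[object]:
    """Parse a SMILES string with RDKit; returns None when parsing fails."""
    # Imported lazily so that `validity` can be used with another parser
    # even when RDKit is not installed.
    from rdkit import Chem

    return Chem.MolFromSmiles(smiles)


def validity(
    smiles: Iterable[str],
    parse: Callable[[str], Optional[object]] = _rdkit_parse,
) -> float:
    """Fraction of strings for which `parse` returns something other than None.

    An empty input yields 0.0 (an arbitrary convention for this sketch).
    """
    items = list(smiles)
    if not items:
        return 0.0
    n_valid = sum(parse(s) is not None for s in items)
    return n_valid / len(items)


# Usage with RDKit installed (hypothetical helper, not the benchmark's API):
# validity(["CCCC", "c1cccc1", "X"])
```

The `parse` argument exists only so the helper can be exercised without RDKit, for example with a stub parser in a unit test.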

## Contributing
Contributions to improve the benchmark suite are welcome. Please submit issues or pull requests on the GitHub repository.

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Download Data
See [data/README.md](./data/README.md) for details on downloading the data.