# Bio LLaVA
Extended from [LLaVA](https://github.com/haotian-liu/LLaVA).

### Training Scripts
`scripts/bio_llava/finetune_vicuna_bio_llava_task_lora.sh`

### Evaluation Scripts
`scripts/bio_llava/eval_vicuna_bio_llava_task_lora.sh`

### Data preprocessing
Please follow the instructions in [CLEAN](https://github.com/tttianhao/CLEAN?tab=readme-ov-file) to obtain the training dataset and the three test datasets (Price, Halogenase, New).

The Multi dataset is included in this supplementary material.

- `llava/utils/csv2fasta.py` converts the CSV data to a FASTA file.
- `llava/utils/ec_number_2_reid.py` builds the mapping between EC numbers and catalytic reactions.
- `llava/utils/protein_2_smiles.py` builds the mapping between proteins and molecules.
- `llava/utils/smiles_features_generate.py` generates the molecule features.
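
As an illustration of the CSV-to-FASTA step, here is a minimal sketch of what such a conversion can look like. It is not the code in `llava/utils/csv2fasta.py`: the function name `csv_to_fasta`, the tab delimiter, and the column names `Entry` and `Sequence` are assumptions made for this example, and the sample record is made up. Adjust them to match the actual files.

```python
import csv
import io


def csv_to_fasta(csv_text, id_col="Entry", seq_col="Sequence",
                 delimiter="\t", width=60):
    """Convert delimited text with ID and sequence columns to FASTA text.

    Column names and delimiter are assumptions; change them to fit your data.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    records = []
    for row in reader:
        seq = row[seq_col].strip()
        # Wrap the sequence at a fixed width, as is conventional for FASTA.
        chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append("\n".join([">" + row[id_col].strip()] + chunks))
    # Terminate every record with a newline.
    return "".join(record + "\n" for record in records)


# Made-up sample row, for illustration only.
example = "Entry\tEC number\tSequence\nseq1\t1.1.1.1\tMKTAYIAKQR\n"
print(csv_to_fasta(example), end="")
```

To convert a file, read its contents into a string and write the returned text to a `.fasta` file.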