1/28/26


TO REPEAT ALL TRAINING:
========================

1.) acquire data, clean:
-------------------------
download from [Pfam v36.0](https://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam36.0/)

postprocess using the scripts in ./preprocess_data, with the following values:
  > num_splits: 10
  > rand_key: 6
  > topk1_valid: 3
  > topk1_valid: 8
  > alphabet_size: 20
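
for orientation only: a minimal sketch of how a seeded assignment into num_splits groups could work. This is an assumption, not the logic in ./preprocess_data; the function name assign_splits, the use of Python's random module, and treating rand_key as a plain integer seed are all hypothetical.

```python
import random


def assign_splits(item_ids, num_splits=10, seed=6):
    """Map each ID to a split index in [0, num_splits) via a seeded shuffle.

    Hypothetical helper: the real preprocessing scripts may split differently.
    """
    ordered = sorted(item_ids)   # sort first so the input order does not matter
    rng = random.Random(seed)    # assumed stand-in for rand_key
    rng.shuffle(ordered)
    # deal the shuffled IDs round-robin so split sizes differ by at most one
    return {item_id: i % num_splits for i, item_id in enumerate(ordered)}


if __name__ == "__main__":
    example_ids = [f"item_{i}" for i in range(25)]  # made-up IDs
    print(assign_splits(example_ids))
```

fixing the seed makes the assignment repeatable across runs, which is what lets the split IDs in step 2 refer to the same data every time.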


2.) partition data into train-dev-test:
-----------------------------------------
Split IDs: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 (10 total)

Partition 1:
	train: 0, 1, 2, 3, 4, 5, 6
	dev: 7
	test: 8, 9


Partition 2:
	train: 1, 3, 4, 5, 7, 8, 9
	dev: 6
	test: 0, 2

Partition 3: 
	train: 1, 2, 4, 6, 7, 8, 9  
	dev: 0
	test: 3, 5
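
the three partitions above can be written down and sanity-checked as follows. A minimal sketch: the names PARTITIONS and check_partition are hypothetical and do not come from the repo; the split IDs are copied from the lists above.

```python
NUM_SPLITS = 10

# split IDs copied from the partition lists above
PARTITIONS = {
    1: {"train": [0, 1, 2, 3, 4, 5, 6], "dev": [7], "test": [8, 9]},
    2: {"train": [1, 3, 4, 5, 7, 8, 9], "dev": [6], "test": [0, 2]},
    3: {"train": [1, 2, 4, 6, 7, 8, 9], "dev": [0], "test": [3, 5]},
}


def check_partition(partition, num_splits=NUM_SPLITS):
    """True if train/dev/test are disjoint and together cover every split ID."""
    ids = partition["train"] + partition["dev"] + partition["test"]
    no_overlap = len(ids) == len(set(ids))
    full_cover = set(ids) == set(range(num_splits))
    return no_overlap and full_cover


if __name__ == "__main__":
    for name, partition in PARTITIONS.items():
        assert check_partition(partition), f"partition {name} is malformed"
```

each partition uses 7 splits for train, 1 for dev, and 2 for test, and no split ID appears in more than one role within a partition.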


3.) train models:
------------------
configs for all data partitions and all models are in ./replicate_training/configs_used

run training with the code in ./train_models, following the directions there
