# EFFICIENT DATA SUBSET SELECTION TO GENERALIZE TRAINING ACROSS MODELS: TRANSDUCTIVE AND INDUCTIVE METHODS

## Sampling Architectures
Install [nasbench](https://github.com/google-research/nasbench) and download [nasbench_only108.tfrecord](https://storage.googleapis.com/nasbench/nasbench_only108.tfrecord).
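
One possible way to do both steps from a shell is sketched below. Cloning the repository and running `pip install -e` follows the nasbench repository's own instructions. The `DRY_RUN` switch and the `run` helper are illustrative additions and not part of this repository. By default the script only prints the commands it would execute. Note that nasbench targets TensorFlow 1.x, so a dedicated environment may be needed.

```shell
#!/usr/bin/env bash
# Sketch: fetch nasbench and the 108-epoch record file.
# DRY_RUN and run() are illustrative helpers, not part of this repository.
set -eu

NASBENCH_REPO="https://github.com/google-research/nasbench"
RECORD_URL="https://storage.googleapis.com/nasbench/nasbench_only108.tfrecord"
RECORD_FILE="$(basename "$RECORD_URL")"   # file name taken from the URL

DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually execute the commands

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

run git clone "$NASBENCH_REPO"
run pip install -e ./nasbench      # installs nasbench and its dependencies
run curl -L -O "$RECORD_URL"       # saves $RECORD_FILE in the current directory
```

The resulting file path is what the commands below expect as `[PATH_TO_RECORD_FILE]`.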

To generate JSON files for the entire search space:
```bash
cd sample_nasbench
python arch2json.py --nasbench_path [PATH_TO_RECORD_FILE] --out_folder [FOLDER_TO_SAVE_JSON]
```

To sample architectures for the train-test split:
```bash
cd sample_nasbench
python sample_archs.py --nasbench_path [PATH_TO_RECORD_FILE] --out_folder [FOLDER_TO_SAVE_JSON] --num_train [NUMBER_OF_TRAINING_ARCHS] --num_test [NUMBER_OF_TESTING_ARCHS] --seed [SEED]
```
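
As a concrete illustration, the sketch below fills the placeholders with made-up values and prints the resulting command. The folder layout under `DATA_ROOT`, the split sizes, and the seed are all hypothetical examples. Only the script name and flags come from the command above.

```shell
#!/usr/bin/env bash
# Sketch: print a fully spelled-out sampling command.
# Every value below is a hypothetical example, not a recommended setting.
set -eu

DATA_ROOT="${DATA_ROOT:-$PWD/data}"
RECORD_FILE="$DATA_ROOT/nasbench_only108.tfrecord"
OUT_FOLDER="$DATA_ROOT/sampled_archs"
NUM_TRAIN=250
NUM_TEST=50
SEED=0

# Echo the command instead of running it; drop the leading `echo`
# (and run from inside sample_nasbench/) to execute it for real.
sample_cmd() {
  echo python sample_archs.py \
    --nasbench_path "$RECORD_FILE" \
    --out_folder "$OUT_FOLDER" \
    --num_train "$NUM_TRAIN" \
    --num_test "$NUM_TEST" \
    --seed "$SEED"
}

sample_cmd
```

Keeping `SEED` fixed should make the split reproducible, assuming the script seeds its sampler from `--seed`.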

## Generating GNN Embeddings
To train the GNN over the search space:
```bash
cd arch_emb
python train_embedding.py --data_folder [PATH_WHERE_JSON_WAS_SAVED] --checkpoint_folder [PATH_TO_SAVE_CHECKPOINTS]
```
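
Each step begins with a `cd` into a subfolder, so relative paths passed on the command line are resolved against that subfolder rather than the repository root. The sketch below works around this by building absolute paths once and running each step in a subshell. `WORK_DIR`, the folder names under it, and the `run_in` helper are hypothetical. The sketch also assumes it is started from the repository root.

```shell
#!/usr/bin/env bash
# Sketch: chain JSON export and GNN training using absolute paths.
# WORK_DIR and the folder names under it are hypothetical.
set -eu

REPO_ROOT="$PWD"                            # assumed: started from the repo root
WORK_DIR="${WORK_DIR:-$REPO_ROOT/work}"
RECORD_FILE="$WORK_DIR/nasbench_only108.tfrecord"
JSON_DIR="$WORK_DIR/arch_json"              # output of arch2json.py
CKPT_DIR="$WORK_DIR/gnn_checkpoints"        # output of train_embedding.py

DRY_RUN="${DRY_RUN:-1}"                     # set DRY_RUN=0 to execute

# Run a command inside a subfolder of the repo, in a subshell so the
# caller's working directory is left unchanged.
run_in() {
  dir="$1"; shift
  echo "+ (cd $dir && $*)"
  if [ "$DRY_RUN" = "0" ]; then (cd "$REPO_ROOT/$dir" && "$@"); fi
}

run_in sample_nasbench python arch2json.py \
  --nasbench_path "$RECORD_FILE" --out_folder "$JSON_DIR"
run_in arch_emb python train_embedding.py \
  --data_folder "$JSON_DIR" --checkpoint_folder "$CKPT_DIR"
```

The same `JSON_DIR` is passed to both scripts because `--data_folder` expects the folder where the JSON files were saved.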

To generate embeddings from the trained GNN for the sampled architectures:
```bash
cd arch_emb
python generate_embedding.py --json_folder [PATH_TO_ARCH_JSONS] --model_path [PATH_TO_GNN_CHECKPOINT] --out_folder [PATH_TO_SAVE_EMBEDDINGS] [--train]
```

## Training Model Approximator
To train the Model Approximator:
```bash
cd encoder_files
python train.py --archemb_file [PATH_TO_EMBEDDINGS] --dataemb_file [PATH_TO_DATA] --logit_train_file [PATH_TO_TRAINING_LOGITS] --logit_test_file [PATH_TO_TESTING_LOGITS] --logit_train_indices [PATH_TO_LOGIT_TRAIN_INDICES] --logit_test_indices [PATH_TO_LOGIT_TEST_INDICES] --load_checkpoint [TO_RESUME_TRAINING]
```

To generate the logit files:
```bash
cd encoder_files
python generate_logits.py --archemb_file [PATH_TO_EMBEDDINGS] --dataemb_file [PATH_TO_DATA] --load_checkpoint [PATH_TO_APPROXIMATOR_CHECKPOINT]
```

## Training Transductive-SUBSELNET
To generate the subset indices for the Transductive variant:
```bash
cd train_transductive
python train.py --data_file [PATH_TO_DATA] --targets_file [PATH_TO_TARGETS] --archemb_file [PATH_TO_EMBEDDINGS] --approx_checkpoint [PATH_TO_APPROXIMATOR_CHECKPOINT] --dataset [NAME_OF_DATASET] --json_folder [PATH_TO_ARCH_JSONS] --subset_size [SUBSET_SIZE] --num_iter [NUMBER_OF_ITERATIONS] --output_folder [PATH_TO_SAVE_INDICES]
```

## Training Inductive-SUBSELNET
To train the Inductive variant:
```bash
cd inductive_selector
python train.py --x_data_file [PATH_TO_DATA] --targets_data_file [PATH_TO_TARGETS] --y_onehot_file [PATH_TO_ONEHOT_OUTPUTS] --arch_embeddings_file [PATH_TO_EMBEDDINGS] --model_encoder_file [PATH_TO_APPROXIMATOR_CHECKPOINT] --subset_size [SUBSET_SIZE]
```