## How to run?

### Docker Image 
```
pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime
```

### Install
```sh
sh install.sh
```

### Example

```sh
python main.py --data_config cmc --model_type XGB --use_CV --self_training curriculum --alpha 0.25 --limited_training_sample 78
```

### Options 

- `--data_config` : Target Dataset
- `--show_drop_cols` : Show dropped columns
- `--random_seed` : Random Seed for the experiment
- `--model` : Target Model

<br>

- `--self_training` : Select self-training methods. Default is None.
    - None : supervised learning only
    - naive : use all pseudo-labels
    - fixed : fixed threshold for confidence score pseudo-labeling
    - fixed_percentiles : fixed threshold for percentile pseudo-labeling
    - curriculum : curriculum pseudo-labeling
- `--limited_training_sample` : Limit the number of labeled samples
- `--limited_validation_sample` : Limit the number of validation samples
- `--prior` : Select the source of prior knowledge: Density Estimator or Empirical Likelihood. Default is Likelihood.
    - Density : Density Estimator using statsmodels
    - Likelihood : Empirical Likelihood
              
<br>
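
The `--self_training` modes above differ mainly in how they choose which pseudo-labels to keep. The following is a minimal sketch, assuming each unlabeled sample has a single confidence score and that curriculum pseudo-labeling widens the kept percentage by a fixed step each round. The function names and the 0 to 100 percent scale are illustrative assumptions, not this repository's implementation.

```python
def select_fixed(confidences, threshold):
    """Indices of samples whose confidence meets a fixed threshold."""
    return [i for i, c in enumerate(confidences) if c >= threshold]


def select_percentile(confidences, keep_percent):
    """Indices of the most confident `keep_percent` percent of samples."""
    if not confidences:
        return []
    n_keep = max(1, round(len(confidences) * keep_percent / 100))
    ranked = sorted(range(len(confidences)),
                    key=lambda i: confidences[i], reverse=True)
    return sorted(ranked[:n_keep])


def curriculum_schedule(delta):
    """Percent of pseudo-labels kept per round: delta, 2*delta, ..., 100.

    Assumption: `delta` is a step in percent (hypothetical scale).
    """
    percent = delta
    while percent < 100:
        yield percent
        percent += delta
    yield 100


if __name__ == "__main__":
    scores = [0.9, 0.4, 0.75, 0.6]
    print(select_fixed(scores, 0.7))
    for percent in curriculum_schedule(20):
        print(percent, select_percentile(scores, percent))
```

In this sketch the `naive` mode corresponds to keeping 100 percent of the pseudo-labels from the first round.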

- `--data_editor` : Select data editor for noise filtering. Default is None.
    - None : No noise filtering procedure
    - Mahalanobis : Mahalanobis distance based noise filtering
    - Likelihood : Empirical likelihood based noise filtering
- `--dist_threshold` : Distance threshold for noise filtering, between 0 and 1

<br>
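
To illustrate the Mahalanobis option, here is a minimal sketch of distance-based noise filtering with NumPy. It assumes the threshold in `[0, 1]` is read as a quantile of the observed distances, which may differ from how `--dist_threshold` is actually interpreted. `mahalanobis_distances` and `filter_noise` are hypothetical names.

```python
import numpy as np


def mahalanobis_distances(X):
    """Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(squared, 0.0))


def filter_noise(X, quantile):
    """Boolean mask that is True for rows kept as clean.

    Assumption: rows above the given quantile of distances count as noise.
    """
    distances = mahalanobis_distances(X)
    return distances <= np.quantile(distances, quantile)


if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [10, 10]]
    print(filter_noise(X, 0.8))
```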

- `--feature_corruption` : Set the feature corruption ratio, between 0 and 1.

<br>
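
One common way to corrupt tabular features is to replace a fraction of each row's values with values drawn from the same column in other rows. The sketch below assumes that interpretation of the ratio. The repository's actual corruption scheme is not documented here, so treat `corrupt_features` as a hypothetical helper.

```python
import random


def corrupt_features(rows, ratio, seed=0):
    """Return a copy of `rows` with a fraction of features resampled per row.

    Each corrupted value is drawn from the same column of a random row,
    so it follows that column's empirical marginal distribution.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    n_corrupt = round(n_features * ratio)
    corrupted = []
    for row in rows:
        new_row = list(row)
        for j in rng.sample(range(n_features), n_corrupt):
            new_row[j] = rng.choice(rows)[j]
        corrupted.append(new_row)
    return corrupted


if __name__ == "__main__":
    data = [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
    print(corrupt_features(data, 0.5))
```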

- `--save_best_hparams` : Save best hyperparameters
- `--load_hparams` : Load hyperparameters from config.model.hparams file
- `--use_CV` : Use k-fold cross-validation
- `--hparam_search_only` : Terminate after searching hyperparameters

<br>

- `--report` : Report the result
- `--use_borutashap` : Use boruta-shap algorithms for feature selection

<br>

- `--save_log` : Save log
- `--save_model` : Save model
- `--save_data` : Save data
- `--save_data_editor` : Save data editor for noise filtering

<br>

- `--use_temperature` : Use temperature scaling for confidence calibration
- `--use_histogram_binning` : Use histogram binning for confidence calibration
- `--use_spline_calibrator` : Use spline calibration for confidence calibration
- `--use_gaussian_process` : Use latent Gaussian process for confidence calibration

<br>
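
As an illustration of the first calibration option, temperature scaling divides the logits by a scalar `T` before the softmax and picks `T` to minimize the negative log-likelihood on held-out data. The sketch below uses a simple grid search instead of a gradient-based fit. The function names and the candidate grid are assumptions, not this repository's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


if __name__ == "__main__":
    # Overconfident toy model: 3 of 4 predictions are correct.
    val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
    val_labels = [0, 0, 1, 1]
    t = fit_temperature(val_logits, val_labels, [0.5, 1.0, 2.0, 4.0, 8.0])
    print(t, softmax(val_logits[0], t))
```

A temperature above 1 softens the predicted probabilities, which is the usual outcome for an overconfident model.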

- `--n_jobs` : Overwrite config.n_jobs

<br>

- `--n_trials` : Overwrite config.optuna.n_trials

<br>

- `--alpha` : Overwrite config.self_training.alpha
- `--delta` : Overwrite config.self_training.delta
- `--threshold` : Overwrite config.self_training.threshold

<br>

- `--hparams` : Overwrite config.model.hparams
- `--fast_dev_run` : Overwrite config.model.fast_dev_run
- `--device` : Overwrite config.model.devices
- `--gpus` : Overwrite config.model.gpus
- `--batch_size` : Overwrite config.model.batch_size
- `--early_stopping_patience` : Overwrite config.model.early_stopping_patience
- `--n_splits` : Overwrite config.KFold.n_splits
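
The "Overwrite config.…" options above replace a value in the loaded config only when the flag is given. A minimal sketch of that pattern with `argparse` and a nested dict follows. The `OVERRIDES` mapping, `build_parser`, and `apply_overrides` are hypothetical, and the real project may use a different config container.

```python
import argparse

# Hypothetical mapping from CLI flag name to its location in the config.
OVERRIDES = {
    "alpha": ("self_training", "alpha"),
    "n_trials": ("optuna", "n_trials"),
    "n_splits": ("KFold", "n_splits"),
}


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--alpha", type=float, default=None)
    parser.add_argument("--n_trials", type=int, default=None)
    parser.add_argument("--n_splits", type=int, default=None)
    return parser


def apply_overrides(config, args):
    """Write each flag that was passed into its nested config entry."""
    for flag, path in OVERRIDES.items():
        value = getattr(args, flag)
        if value is None:  # flag not given: keep the config value
            continue
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return config


if __name__ == "__main__":
    config = {"self_training": {"alpha": 0.1, "delta": 0.2},
              "optuna": {"n_trials": 100}}
    args = build_parser().parse_args(["--alpha", "0.25"])
    print(apply_overrides(config, args))
```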

### Config

- `n_jobs` : The number of CPUs to use during experiments. `(int)`
- `KFold.n_splits` : The number of folds for k-fold cross-validation. `(int)`
- `optuna.n_trials` : The number of trials for optuna. `(int)`
- `optuna.verbosity` : The verbosity of optuna. `(int)`
- `optuna.direction` : The direction of optuna objective. `(str, maximize or minimize)`
- `optuna.params` : The search space of each model parameter during optuna trials. `(dict)`
- `model.fast_dev_run` : Whether to run as a dev run. `(bool)`
- `self_training.delta` : The percentile increment per round during curriculum pseudo-labeling. `(float)`
- `self_training.alpha` : The value of alpha for regularized pseudo-labeling. `(float)`
- `self_training.threshold` : The value of threshold for fixed threshold pseudo-labeling. `(float)`
- `model.params` : The location of the file where the parameters are saved. `(str)`
- `model.n_jobs` : The number of CPUs to use for the model. `(int)`
- `model.path` : The location of the files where the trained model is saved. Mostly used for reporting the result. `(str)`

#### XGBoost Config
- `model.early_stopping_rounds` : The number of early stopping rounds. `(int)`
- `model.verbosity` : The verbosity of xgboost. `(int)`

#### PyTorch Tabular models Config
- `model.gpus` : The number of GPUs to train on, or which GPUs to train on. `(int, list, str, or None)`
- `model.max_epochs` : The maximum number of epochs to be run. `(int)`
- `model.batch_size` : The number of samples in each batch of training. `(int)`
- `model.early_stopping_patience` : The number of epochs to wait without improvement before training stops early. `(int)`
- `model.auto_select_gpus` : Whether to select GPUs automatically. `(bool)`
- `model.use_balanced_sampler` : Whether to use a balanced sampler. `(bool)`
- `model.use_weighted_loss` : Whether to use a weighted loss. `(bool)`
- `model.mu` : The value used for mu when generating weighted loss. `(float)`

#### Dreamquark TabNet Config

- `model.device` : The name of the device to use for training. `(str)`
- `model.verbose` : The verbosity of dreamquark tabnet. `(int)`
- `model.max_epochs` : The maximum number of epochs to be run. `(int)`
- `model.batch_size` : The number of samples in each batch of training. `(int)`
- `model.early_stopping_patience` : The number of epochs to wait without improvement before training stops early. `(int)`
- `model.pretrain_early_stopping_patience` : The number of epochs to wait without improvement before pretraining stops early. `(int)`
