# submit

## Directory Structure

```
submit/
├─ opt_final.py               Main Program: Run TransferRankBayesOpt on fixed test functions
├─ opt_tau.py                 Main Program: Construct history tasks with controllable Kendall tau and run
├─ ac_function/               Acquisition Functions & Optimizers
│  ├─ ac_function.py          Acquisition function definitions, UCB/Anchor strategies, and multi-start optimization
│  └─ __init__.py
├─ data/                      Data & Task Management
│  ├─ task_manager.py         Task data encapsulation, history/target data caching, and normalization
│  └─ __init__.py
├─ loss_function/             Ranking/Regression Losses
│  ├─ list_losses.py          Implementations of ListNet, RankCosine, etc.
│  └─ __init__.py
├─ models/                    Surrogate Models
│  ├─ gp_model.py             Gaussian Process Model
│  ├─ deep_ensemble.py        Deep Ensemble Model
│  ├─ mixture.py              Multi-model fusion and covariance handling
│  └─ __init__.py
├─ test_function/             Target/History Task Function Collection
│  ├─ task_spec.py            TaskSpec definition
│  ├─ schwefel.py             Schwefel test function
│  ├─ ackley2.py              Ackley-2 test function
│  ├─ ackley50_2.py           Ackley-50 test function
│  ├─ branin.py               Branin test function
│  ├─ hartmann3.py            Hartmann-3 test function
│  ├─ hartmann6.py            Hartmann-6 test function
│  └─ __init__.py
├─ utils/                     General Utilities
│  ├─ Kendall.py              Kendall tau related calculations
│  └─ __init__.py
├─ requirements.txt           Dependency list
└─ test_results/              Execution Results & Cached Data (pkl)
   └─ <test_suite>/           Results and cache stored by test function category

```

## Environment & Dependencies

Python 3.10 is recommended. Install the dependencies and confirm that the command completes without errors:

```bash
pip install -r requirements.txt

```

Run this in a clean virtual environment so that system-level package version differences do not cause numerical fluctuations.

## GPU Runtime (NVIDIA)

For GPU execution, install a CUDA-enabled PyTorch build and ensure your NVIDIA driver matches the CUDA runtime.

Base environment (example from this machine):

```
Driver Version: 535.247.01
CUDA Version: 12.2
GPU: NVIDIA GeForce RTX 4090
```

Check your driver and CUDA runtime:

```bash
nvidia-smi
```

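Recommended CUDA runtimes for PyTorch 2.x are CUDA 11.8 and CUDA 12.1. Install the matching wheel with `pip install --index-url https://download.pytorch.org/whl/cu118 torch` (CUDA 11.8) or `pip install --index-url https://download.pytorch.org/whl/cu121 torch` (CUDA 12.1).

If you use system CUDA instead of the bundled runtime, install a compatible CUDA toolkit and cuDNN 8.x that match your PyTorch build.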


## Code Overview

* **TransferRankBayesOpt Entry Points:** `opt_final.py` / `opt_tau.py`
* **Surrogate Models:** `GPModel` and `DeepEnsemble` in `models/`
* **Acquisition Functions:** `ac_function/`
* **Tasks & Data:** `data/task_manager.py` manages sample caching for history and target tasks.
* **Test Tasks:** `test_function/` provides various function families (schwefel/ackley/branin/hartmann, etc.).

## Quick Run

```bash
cd submit
python opt_final.py
python opt_tau.py

```

Output directory: `./test_results/<test_suite>/` (will be created automatically if it does not exist).

## Main Program: `opt_final.py`

**Purpose:** Run TransferRankBayesOpt directly on fixed test functions.

**Configuration (All hardcoded within the script):**

* `cfg = OptimizerConfig(...)` in `main()`:
    * `dim`: Search space dimension (must match the target function's dimension).
    * `bounds`: Bounds of the normalized space; optimization always runs inside the `[0,1]` hypercube.
    * `raw_bounds`: Bounds of the original space, used to map normalized coordinates back to the real function input domain; length must equal `dim`.
    * `normalize_y`: Whether to apply z-score standardization to target task observations; affects GP training and fusion scaling.
    * `n_init`: Number of initial evaluation points for the target task (random design points).
    * `n_iter`: Number of Bayesian Optimization iterations (1 point added per iteration).
    * `obs_noise_std`: Standard deviation of the target task observation noise, applied when the target function is evaluated.
    * `design`: Sampling strategy for initial and history points; `lhs` for Latin Hypercube Sampling, `uniform` for Uniform Sampling.
    * `default_history_n_data`: Number of samples generated for each history task when no external history dataset exists.
    * `history_tasks`: Optional per-task overrides for history tasks (specify `n_data` and training hyperparameters by task name).
    * `target_gp`: Target task GP training config (iterations, learning rate, restarts, etc.).
    * `acq`: Acquisition function config (UCB/Rank weights, anchor strategy, multi-start optimizer params).
    * `calibration_size`: Upper limit on the number of reference points used for fusion model calibration, which keeps the calibration set from growing without bound.
    * `seed`: Random seed controlling sampling, model initialization, and reproducibility.
* `history_value_model_cfg` / `history_rank_model_cfg`:
    * `hidden_dims`: Deep Ensemble MLP hidden layer structure.
    * `steps`: Training steps.
    * `lr`: Learning rate.
    * `batch_size`: Training batch size.
    * `loss_type`: Loss type (usually `mse` for value models, `listnet` for rank models).
* `seeds = [...]`: List of random seeds, one per experiment repetition; each seed is also used as the result file ID.
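
To make the relationship between `design`, `bounds`, `raw_bounds`, and `normalize_y` concrete, here is a minimal standard-library sketch. It is not the repository's implementation: the helper names `lhs_unit`, `to_raw`, and `zscore` are hypothetical, and the 2-D box is only an illustrative choice of `raw_bounds`.

```python
import random
from statistics import mean, pstdev


def lhs_unit(n, dim, rng):
    """Latin hypercube sample in [0, 1]^dim: one point per 1/n stratum in every dimension."""
    columns = []
    for _ in range(dim):
        strata = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(strata)  # decouple the strata across dimensions
        columns.append(strata)
    return [[columns[d][i] for d in range(dim)] for i in range(n)]


def to_raw(x_unit, raw_bounds):
    """Map a point from the [0, 1] hypercube to the original input domain."""
    if len(x_unit) != len(raw_bounds):
        raise ValueError("raw_bounds needs one (low, high) pair per dimension")
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(x_unit, raw_bounds)]


def zscore(ys, eps=1e-12):
    """Standardize observations to zero mean and unit variance."""
    mu, sigma = mean(ys), pstdev(ys)
    return [(y - mu) / (sigma + eps) for y in ys]


rng = random.Random(0)
raw_bounds = [(-500.0, 500.0), (-500.0, 500.0)]  # illustrative 2-D box (assumption)
X_unit = lhs_unit(5, 2, rng)                      # normalized design points
X_raw = [to_raw(x, raw_bounds) for x in X_unit]   # points passed to the real function
```

The length check in `to_raw` mirrors the requirement that the length of `raw_bounds` equals `dim`.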

**Input/Output & Caching Rules:**

* **Input Data Cache:**
    * `./test_results/<test_suite>/<test_suite>_data_seed{seed}.pkl`
    * If the file does not exist, it is generated automatically and saved.
* **Output Results:**
    * `./test_results/<test_suite>/<test_suite>_result_seed{seed}.pkl`



**Repetition Note:**

* If the data pkl exists, the same dataset will be loaded to reproduce identical results.
* If the data pkl is deleted, initial points and history points will be regenerated.
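
The caching rule above follows a common load-or-generate pattern. The sketch below illustrates that pattern with the standard library only; `load_or_create_data` and the dictionary layout are hypothetical and are not taken from `task_manager.py`.

```python
import pickle
import random
import tempfile
from pathlib import Path


def load_or_create_data(path, seed, n_points, dim):
    """Return the cached dataset if the pickle exists; otherwise generate and save it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    rng = random.Random(seed)
    data = {
        "seed": seed,
        "X": [[rng.random() for _ in range(dim)] for _ in range(n_points)],
    }
    path.parent.mkdir(parents=True, exist_ok=True)  # create the suite directory on demand
    with path.open("wb") as f:
        pickle.dump(data, f)
    return data


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "schwefel" / "schwefel_data_seed0.pkl"
    first = load_or_create_data(cache, seed=0, n_points=4, dim=2)   # generates and saves
    second = load_or_create_data(cache, seed=0, n_points=4, dim=2)  # loads from disk
    same_after_reload = first == second
```

Because the second call reads the pickle instead of resampling, repeated runs see the same points until the file is deleted.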

## Main Program: `opt_tau.py`

**Purpose:** Construct history tasks with a controllable Kendall tau relative to the target task, and run TransferRankBayesOpt once for each tau.

**Configuration (All hardcoded within the script):**

* `test_suite = ...`: Selects the target test function collection name (e.g., `schwefel`), determining the function family for target and history tasks.
* `seeds = [...]`: List of random seeds for multiple experiments and result file IDs.
* `taus = [...]`: List of expected Kendall tau values; each tau triggers an independent optimization process.
* `cfg = OptimizerConfig(...)`: Meaning is identical to the previous section; optimization and model hyperparameters are reused here.
* `alt_task = _build_auto_alt_task(... calib_n / n_basis / n_trials ...)`:
    * `calib_n`: Number of calibration points used to construct the source task, affecting the stability of Kendall tau estimation.
    * `n_basis`: Number of candidate basis transformations, determining the scale of basis functions for source task combination.
    * `n_trials`: Number of random combination attempts used to search for coefficient combinations that make Kendall tau close to 0.



**Input/Output & Caching Rules:**

* **Input Data Cache:**
    * `./test_results/<test_suite>/<test_suite>_data_seed{seed}.pkl`
    * If the file does not exist, it is generated automatically and saved.
* **Output Results:**
    * `./test_results/<test_suite>/<test_suite>_result_seed{seed}_t{tau_tag}.pkl`



**Repetition Note:**

* Each tau runs an independent optimization round.
* The `X` for history data comes from the cache, while `y` is recalculated via `history_task.evaluate` to ensure consistency with the specific tau.

## "Source Task" & Coefficient Generation Mechanism in `opt_tau.py`

**Goal:** Construct a history task such that its Kendall tau with the target task approaches a specified value.

### Step 1: Construct Source Task `alt_auto`

Inside `_build_auto_alt_task(...)`:

1. Sample calibration points `X_calib` in the real search space (same dimensions and bounds as the target task).
2. Randomly generate `n_basis` transformations `T_i`, where each transformation applies the following operations to the input in sequence:
    * **Permutation:** Randomly shuffle coordinate axes.
    * **Rotation:** Perform small-angle rotations in selected 2D subspaces.
    * **Flip:** Negate random dimensions.
    * **Scaling:** Apply random scaling coefficients to dimensions.
    * **Translation:** Add random offset vectors and clip back to bounds.
3. Calculate `y_i = target_task(T_i(X_calib))` to get the response under each transformation.
4. Standardize each `y_i` to `z_i` to ensure basis functions are on the same scale for linear combination.
5. Linearly combine them to get `alt(X) = Σ_i α_i * z_i(X)`, forming a source task that is as rank-independent from the target task as possible.

**Determination of coefficients `α`:**

* Randomly sample `n_trials` candidate coefficient vectors `α`.
* Select the `α` that minimizes `|KendallTau(y_target(X_calib), alt(X_calib))|`.

The resulting `alt_auto` is approximately rank-uncorrelated with the target task (Kendall tau close to 0 on the calibration points) and serves as the source task for synthesizing history tasks.
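
The procedure above can be sketched with the standard library as follows. This is a simplified illustration under stated assumptions, not the code of `_build_auto_alt_task`: the target is a toy sphere function, the rotation step is omitted, the coefficients are drawn from a standard normal distribution, the combination is evaluated only at the calibration points, and all helper names are hypothetical.

```python
import random
from statistics import mean, pstdev


def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) pairs divided by the number of pairs."""
    n, s = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def zscore(ys):
    mu, sigma = mean(ys), pstdev(ys)
    return [(y - mu) / (sigma + 1e-12) for y in ys]


def target(x):
    """Toy stand-in for the target task (assumption)."""
    return sum(v * v for v in x)


def random_transform(rng, dim, low, high):
    """Permutation, flip, scaling, and translation, followed by clipping to the bounds."""
    perm = rng.sample(range(dim), dim)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    scales = [rng.uniform(0.5, 1.5) for _ in range(dim)]
    shifts = [rng.uniform(-0.3, 0.3) * (high - low) for _ in range(dim)]

    def apply(x):
        moved = [signs[k] * scales[k] * x[perm[k]] + shifts[k] for k in range(dim)]
        return [min(max(v, low), high) for v in moved]

    return apply


def build_alt_values(X_calib, n_basis, n_trials, low, high, seed=0):
    rng = random.Random(seed)
    dim = len(X_calib[0])
    y_target = [target(x) for x in X_calib]
    # Steps 2-4: standardized responses z_i of the target under each transformation T_i.
    basis = []
    for _ in range(n_basis):
        T = random_transform(rng, dim, low, high)
        basis.append(zscore([target(T(x)) for x in X_calib]))
    # Step 5 and coefficient search: keep the alpha with the smallest |tau|.
    best = (None, None, float("inf"))
    for _ in range(n_trials):
        alpha = [rng.gauss(0.0, 1.0) for _ in range(n_basis)]
        alt = [sum(a * z[p] for a, z in zip(alpha, basis)) for p in range(len(X_calib))]
        abs_tau = abs(kendall_tau(y_target, alt))
        if abs_tau < best[2]:
            best = (alpha, alt, abs_tau)
    return best


rng = random.Random(1)
X_calib = [[rng.uniform(-5.0, 5.0) for _ in range(2)] for _ in range(60)]
alpha, alt_values, abs_tau = build_alt_values(
    X_calib, n_basis=6, n_trials=200, low=-5.0, high=5.0
)
```

A larger `calib_n` makes the tau estimate less noisy, and a larger `n_trials` gives the search more chances to find a combination with tau near 0, at the cost of a quadratic pair count per trial in this naive implementation.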

### Step 2: Synthesize History Task for Specified Tau

In `main()`:

* Call `build_correlated_tasks(target_task, alt_task=alt_auto, taus=[...], calib_n=...)`.
* Generate `history_task` such that its Kendall tau with `target_task` approaches the specified value.
* Use `estimate_kendall_tau_between_tasks(...)` to resample, estimate, and print the verification value.

**History Data Generation:**

* `X_hist_raw` comes from the cache `./test_results/<test_suite>/<test_suite>_data_seed{seed}.pkl`.
* `y_hist_raw` is recalculated via `history_task.evaluate(X_hist_raw)`.
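
This README does not describe how `build_correlated_tasks` reaches a given tau. One plausible construction, shown here purely as an assumption, blends the standardized target values with the standardized `alt_auto` values and picks the blend weight whose Kendall tau is closest to the requested one. The function `mix_for_tau`, the blend formula, and the grid search are all hypothetical.

```python
import random


def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) pairs divided by the number of pairs."""
    n, s = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def mix_for_tau(z_target, z_alt, tau_goal, n_grid=201):
    """Grid-search w in [-1, 1] for h = w * z_target + (1 - |w|) * z_alt.

    w = 1 reproduces the target ranking (tau = 1), w = -1 reverses it (tau = -1),
    and w = 0 leaves only the source task.
    """
    best_w, best_tau = None, None
    for k in range(n_grid):
        w = -1.0 + 2.0 * k / (n_grid - 1)
        h = [w * t + (1.0 - abs(w)) * a for t, a in zip(z_target, z_alt)]
        tau = kendall_tau(z_target, h)
        if best_tau is None or abs(tau - tau_goal) < abs(best_tau - tau_goal):
            best_w, best_tau = w, tau
    return best_w, best_tau


rng = random.Random(0)
z_target = [rng.gauss(0.0, 1.0) for _ in range(40)]  # stand-in for standardized target values
z_alt = [rng.gauss(0.0, 1.0) for _ in range(40)]     # stand-in for alt_auto values
results = {goal: mix_for_tau(z_target, z_alt, goal) for goal in (-1.0, 0.0, 0.5, 1.0)}
```

Whatever the actual construction is, the tau achieved on the calibration points is only an estimate, which is why the README's verification step resamples points before printing the value.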

## Task Dimensions and Bounds

The current default configuration matches `schwefel`. When switching to `ackley2 / branin / hartmann3 / hartmann6 / ackley50_2`, you must also update the following settings to match the new function:

* `dim`
* `raw_bounds`

## Ablation Study

The full method (Value Model + Kendall tau Similarity, ListNet-trained Rank Model, RA-UCB Acquisition) is compared against the following four variants, each of which can be obtained by changing only hyperparameters or the training loss:

* **w/o Rank Model (Value-based):** Replace the history task's rank model with a value model trained using MSE, so that no rank-based surrogate is used.
* **w/o Listwise Loss (Value-based):** Train both the rank model and the value model with `mse` loss.
* **w/o Rank Entropy (Standard UCB):** Remove the Rank Entropy term and use only standard UCB to guide the search.
* **w/o Rank Similarity (Rank-based Model for Similarity):** Switch similarity estimation from the value model to the rank model (trained with `listnet` loss), then compute task similarity via Kendall tau.
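
Several of these variants differ only in `loss_type`, so it helps to see how a listwise loss and `mse` react to the same predictions. The sketch below uses the common top-one ListNet formulation (cross-entropy between softmax distributions over the list); whether `loss_function/list_losses.py` uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(y_true, y_pred):
    """Top-one ListNet: cross-entropy between softmax(y_true) and softmax(y_pred)."""
    p_true, p_pred = softmax(y_true), softmax(y_pred)
    return -sum(pt * math.log(pp) for pt, pp in zip(p_true, p_pred))


def mse_loss(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 1.0, 2.0]
shifted = [13.0, 11.0, 12.0]   # same ranking, constant offset of 10
reversed_ = [1.0, 3.0, 2.0]    # best and worst swapped

loss_exact = listnet_loss(y_true, y_true)
loss_shifted = listnet_loss(y_true, shifted)      # equals loss_exact: softmax ignores offsets
loss_reversed = listnet_loss(y_true, reversed_)   # larger: the ordering is wrong
mse_shifted = mse_loss(y_true, shifted)           # 100.0: MSE penalizes the offset itself
```

The listwise loss ignores a constant offset that leaves the ranking intact, while MSE penalizes it heavily. This is the behavioral difference the "w/o Listwise Loss" variant removes.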
