# BCP with Hierarchical Cautious Optimization (HCO)

This repository contains the implementation of Hierarchical Cautious Optimization (HCO) applied to the Bidirectional Copy-Paste (BCP) framework for semi-supervised medical image segmentation, as described in the paper.

## Overview

This implementation demonstrates HCO's effectiveness on two medical image segmentation tasks:

1. Left Atrium (LA) MRI segmentation
2. Pancreas CT segmentation

## Requirements

- Python 3.6+
- PyTorch 1.7+
- CUDA-compatible GPU
- Additional dependencies: tqdm, tensorboardX, scikit-image

## Data Preparation

Follow the data preparation instructions in the BCP repository (https://github.com/DeepMed-Lab-ECNU/BCP) to obtain the LA and Pancreas datasets.

## Left Atrium (LA) Dataset: Training and Testing

The training process for the LA dataset consists of two phases:
1. **Pre-training phase**: 2,000 iterations with labeled data only
2. **Self-training phase**: 15,000 iterations using both labeled and unlabeled data

### Training with Standard SGD

To train the model on the LA dataset using standard SGD:

```bash
python code/LA_BCP_train_sgd.py \
    --root_path /path/to/data/LA \
    --exp BCP \
    --model VNet \
    --labelnum 8 \
    --gpu 0 \
    --seed 1337
```

Key parameters:
- `--root_path`: Path to the LA dataset
- `--exp`: Experiment name (use "BCP" for standard SGD)
- `--model`: Network architecture (default: VNet)
- `--labelnum`: Number of labeled samples (default: 8)
- `--gpu`: GPU ID to use
- `--seed`: Random seed for reproducibility

Additional parameters:
- `--pre_max_iteration`: Maximum iterations for pre-training (default: 2000)
- `--self_max_iteration`: Maximum iterations for self-training (default: 15000)
- `--base_lr`: Initial learning rate (default: 0.01)
- `--batch_size`: Total batch size (default: 8)
- `--labeled_bs`: Batch size for labeled data (default: 4)

The model will be saved in `./model/BCP/LA_BCP_8_labeled_seed1337/`.
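
The save directory appears to follow the experiment name, label count, and seed. As a minimal sketch, assuming the pattern implied by the paths shown in this README (the helper `snapshot_dir` is hypothetical and not part of the repository; the training script may build the path differently):

```python
def snapshot_dir(exp: str, labelnum: int, seed: int, root: str = "./model/BCP") -> str:
    """Compose the checkpoint directory using the pattern seen in this README.

    Hypothetical helper for illustration only.
    """
    return f"{root}/LA_{exp}_{labelnum}_labeled_seed{seed}/"


print(snapshot_dir("BCP", 8, 1337))
print(snapshot_dir("BCP_hco", 8, 1337))
```

This can help locate checkpoints when sweeping over `--labelnum` or `--seed`.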

### Training with SGD_HCO

To train the model on the LA dataset using SGD with HCO:

```bash
python code/LA_BCP_train_sgd.py \
    --root_path /path/to/data/LA \
    --exp BCP_hco \
    --model VNet \
    --labelnum 8 \
    --gpu 0 \
    --seed 1337
```

The only difference is `--exp BCP_hco`. The code detects "hco" in the experiment name and uses the HCO variant of SGD for the self-training phase. The pre-training phase is unchanged, so the self-training phase starts from the same pre-trained weights produced by standard SGD.
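
The name-based switch can be pictured as follows. This is a sketch only: the function `pick_optimizers` and the returned labels are hypothetical, and the actual check in `code/LA_BCP_train_sgd.py` may differ (for example in case sensitivity).

```python
def pick_optimizers(exp: str) -> dict:
    """Return which optimizer each phase would use for a given --exp value.

    Assumption: HCO is enabled whenever the substring "hco" occurs in the
    experiment name. Pre-training always uses plain SGD.
    """
    use_hco = "hco" in exp
    return {
        "pre_train": "SGD",
        "self_train": "HcoSGD" if use_hco else "SGD",
    }


print(pick_optimizers("BCP"))
print(pick_optimizers("BCP_hco"))
```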

### How the HCO Training Works for LA Dataset

1. The pre-training phase uses standard SGD for both BCP and BCP_hco experiments
2. For the self-training phase:
   - Standard SGD: Uses a single backward pass for both labeled and unlabeled losses
   - SGD_HCO: Uses separate backward passes for labeled and unlabeled losses, with hierarchical caution applied to the unlabeled updates

The HCO implementation can be found in `code/optimizers.py` in the `HcoSGD` class.

### Testing LA Models

To test a trained model on the LA dataset:

```bash
python code/test_LA.py \
    --root_path /path/to/data/LA \
    --exp BCP \
    --model VNet \
    --labelnum 8 \
    --gpu 0 \
    --stage_name self_train \
    --seed 1337
```

For testing the HCO variant, change `--exp BCP` to `--exp BCP_hco`.

Key parameters:
- `--root_path`: Path to the LA dataset
- `--exp`: Experiment name (use "BCP" for standard SGD or "BCP_hco" for SGD_HCO)
- `--model`: Network architecture (default: VNet)
- `--labelnum`: Number of labeled samples used in training
- `--gpu`: GPU ID to use
- `--stage_name`: Which training stage to evaluate ("pre_train" or "self_train")
- `--seed`: Random seed used during training
- `--detail`: Whether to print metrics for every sample (default: 1)
- `--nms`: Whether to apply NMS post-processing (default: 1)

The test results will be saved in `./model/BCP/LA_BCP_8_labeled_seed1337/VNet_predictions/` for `--exp BCP` or in `./model/BCP/LA_BCP_hco_8_labeled_seed1337/VNet_predictions/` for `--exp BCP_hco`.

### Expected Results for LA Dataset

As reported in the paper, you should expect the following changes when using SGD_HCO compared to standard SGD (percentages are relative changes; raw values are in parentheses):

- Dice score: +1.3% (89.49→90.65, p=0.0007)
- ASD: -6.0% (1.84→1.73, p=0.13)
- HD95: -27.9% (7.64→5.51, p=0.0005)

These results demonstrate that SGD_HCO's trust mechanism enables more effective integration of unlabeled signals in the semi-supervised learning process.
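
The percentages listed above are relative changes with respect to the SGD baseline. The short check below recomputes them from the raw values quoted in this README; the function name `relative_change` is illustrative only.

```python
def relative_change(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


# (standard SGD, SGD_HCO) values as listed in this README.
la_results = {
    "Dice": (89.49, 90.65),
    "ASD": (1.84, 1.73),
    "HD95": (7.64, 5.51),
}

for metric, (sgd, sgd_hco) in la_results.items():
    print(f"{metric}: {relative_change(sgd, sgd_hco):+.1f}%")
```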

## Pancreas Dataset: Training and Testing

The Pancreas training workflow consists of two steps:
1. Training the baseline model with standard Adam optimizer
2. Training with Adam_HCO using the pretrained checkpoint from step 1

### Step 1: Training Baseline Model

```bash
python code/pancreas/train_pancreas.py
```

This script will:
1. Train the baseline model using the standard Adam optimizer for 60 epochs of pre-training
2. Save the pretrained model in `result/cutmix/pretrain/best_ema20_pre.pth`
3. Train for an additional 200 epochs of self-training
4. Evaluate the model on the test set and print the results

Key parameters in the script (can be modified directly in the script):
- `data_root`: Path to the Pancreas dataset
- `batch_size`: Batch size for training (default: 2)
- `lr`: Learning rate (default: 1e-3)
- `pretraining_epochs`: Number of epochs for pre-training (default: 60)
- `self_training_epochs`: Number of epochs for self-training (default: 200)
- `label_percent`: Percentage of labeled data (default: 20%)

### Step 2: Training with Adam_HCO

```bash
python code/pancreas/train_pancreas_adam_hco.py
```

This script will:
1. Load the pretrained checkpoint from the baseline training
2. Continue training using the Adam_HCO optimizer
3. Save the final model in `result/cutmix_adam_hco/self_train/best_ema_20_self.pth`
4. Evaluate the model on the test set and print the results

Before running this script, you need to update:
- `data_root`: Path to your Pancreas dataset
- `pretrained_path`: Path to the directory containing the pretrained model from Step 1

### How the HCO Training Works for Pancreas Dataset

The key difference between the standard Adam and Adam_HCO training is in how the optimizer updates are performed:

1. Standard Adam (`train_pancreas.py`):
   ```python
   optimizer.zero_grad()
   loss.backward()
   optimizer.step()
   ```

2. Adam_HCO (`train_pancreas_adam_hco.py`):
   ```python
   optimizer.zero_grad()
   loss_1.backward()
   optimizer.step_labeled()
   optimizer.zero_grad()
   loss_2.backward()
   optimizer.step_unlabeled()
   ```

The Adam_HCO implementation can be found in `code/optimizers.py` in the `AdamHCO` class, which follows a hierarchical-cautious approach similar to that of `HcoSGD`, adapted for the Adam optimizer.

### Testing Pancreas Models

Testing is integrated into the training scripts. The test results will be printed at the end of training.

### Expected Results for Pancreas Dataset

As reported in the paper, you should expect the following changes when using Adam_HCO compared to standard Adam (percentages are relative changes; raw values are in parentheses):

- Dice score: +1.0% (82.91→83.71)
- Jaccard: +1.8% (70.97→72.26)
- ASD: -15.1% (6.43→5.46)
- HD95: -8.9% (2.25→2.05)

These results demonstrate that HCO generalizes beyond cardiac MRI and SGD, improving performance in abdominal CT segmentation with Adam.

## Implementation Details

### HCO Optimizers

The HCO variants of the optimizers are implemented in `code/optimizers.py`:

#### HcoSGD

The `HcoSGD` class implements the Hierarchical Cautious SGD optimizer used for the LA dataset. Key features:

1. **Separate update steps**: Uses `step_labeled()` and `step_unlabeled()` methods instead of a single `step()` method
2. **Hierarchical caution**: Applies a sign-agreement mask to filter gradient updates
3. **Optional aligned-only momentum update**: Can update momentum only with gradients that align with the current momentum direction

Usage pattern:
```python
optimizer.zero_grad()
loss_L.backward()
optimizer.step_labeled()
optimizer.zero_grad()
loss_U.backward()
optimizer.step_unlabeled()
```
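
To make the sign-agreement idea concrete, here is a toy sketch in plain Python. It is not the repository's `HcoSGD`: the class `ToyHcoSGD`, the helper `sign_agreement_mask`, and in particular the choice of reference direction (the momentum buffer left by the labeled step) are assumptions made for illustration. The real optimizer may use a different reference, rescale the masked update, or update momentum in the unlabeled step.

```python
def sign_agreement_mask(candidate, reference):
    """1.0 where candidate and reference have the same strict sign, else 0.0."""
    return [1.0 if c * r > 0 else 0.0 for c, r in zip(candidate, reference)]


class ToyHcoSGD:
    """Illustrative two-step SGD on a flat list of floats (hypothetical)."""

    def __init__(self, params, lr=0.1, momentum=0.9):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum
        self.buf = [0.0] * len(self.params)  # momentum buffer

    def step_labeled(self, grads):
        # Plain SGD with momentum: labeled gradients are fully trusted.
        self.buf = [self.momentum * b + g for b, g in zip(self.buf, grads)]
        self.params = [p - self.lr * b for p, b in zip(self.params, self.buf)]

    def step_unlabeled(self, grads):
        # Assumed rule: keep only components whose sign matches the momentum
        # buffer from the labeled step; the buffer itself is left untouched.
        mask = sign_agreement_mask(grads, self.buf)
        self.params = [p - self.lr * m * g
                       for p, m, g in zip(self.params, mask, grads)]


opt = ToyHcoSGD([1.0, 1.0, 1.0])
opt.step_labeled([0.5, -0.5, 0.0])   # buffer becomes [0.5, -0.5, 0.0]
opt.step_unlabeled([1.0, 1.0, 1.0])  # only the first component agrees in sign
print(opt.params)
```

In this toy run, the unlabeled gradient changes only the first parameter, because that is the only coordinate where it points the same way as the labeled momentum.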

### Split Loss Implementation for Study 5

The `LA_BCP_train_sgd_consolidated_split_loss_hco.py` file contains an equivalent implementation specifically created for Study 5 in the paper. This implementation splits the labeled and unlabeled losses to analyze their dynamics during training:

1. It separates the mixed loss into purely supervised and purely unsupervised components
2. This allows tracking the ratio of unlabeled to labeled loss (L_unlabeled / L_labeled) throughout training
3. The analysis demonstrates how HCO variants consistently increase the relative contribution of unlabeled loss, helping to prevent representation collapse

This implementation is functionally equivalent to the standard implementation but enables the detailed analysis of loss dynamics presented in Study 5 of the paper.
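
The ratio tracking described above can be sketched as a small helper. The class `LossRatioTracker` and the loss values below are made up for illustration; they are not taken from the repository or the paper.

```python
class LossRatioTracker:
    """Running record of L_unlabeled / L_labeled, one entry per iteration."""

    def __init__(self, eps: float = 1e-8):
        self.eps = eps  # guards against division by zero
        self.ratios = []

    def update(self, loss_labeled: float, loss_unlabeled: float) -> float:
        ratio = loss_unlabeled / (loss_labeled + self.eps)
        self.ratios.append(ratio)
        return ratio

    def mean(self) -> float:
        if not self.ratios:
            return float("nan")
        return sum(self.ratios) / len(self.ratios)


tracker = LossRatioTracker()
# Made-up (labeled, unlabeled) loss pairs, for illustration only.
for loss_l, loss_u in [(0.50, 0.25), (0.40, 0.30), (0.20, 0.20)]:
    tracker.update(loss_l, loss_u)
print(f"mean ratio: {tracker.mean():.3f}")
```

In a training loop, `update` would be called once per iteration with the scalar values of the two loss terms (for PyTorch tensors, via `.item()`).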

## Results

As demonstrated in the paper, using the HCO variants of the optimizers improves segmentation accuracy on both datasets:

- **Left Atrium (LA) Dataset**: SGD_HCO improves Dice score by 1.3% and reduces HD95 by 27.9% compared to standard SGD.
- **Pancreas Dataset**: Adam_HCO improves Dice score by 1.0% and reduces ASD by 15.1% compared to standard Adam.

These improvements highlight HCO's effectiveness in leveraging unlabeled data for semi-supervised medical image segmentation.


## Acknowledgments

This implementation builds on:
* [BCP](https://github.com/DeepMed-Lab-ECNU/BCP)
