# CASMIR: Coupled Adaptive Feature–Target Smoothing with Density-Gated Mixture-of-Experts for Robust Imbalanced Tabular Regression

Official implementation of **CASMIR** for ICLR 2026.



**CASMIR** is a framework for imbalanced tabular regression. It adaptively smooths representations in sparse regions of the data and weights specialized experts by density, targeting state-of-the-art balanced performance across few-, medium-, and many-shot regions.



## Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [Project Structure](#project-structure)
- [Training](#training)
- [Inference](#inference)
- [Experiments](#experiments)
- [Datasets](#datasets)
- [Algorithms](#algorithms)
- [Performance Metrics](#performance-metrics)
- [Citation](#citation)
- [License](#license)



## Installation

### Requirements

- Python 3.8+
- CUDA 11.0+ (optional, for GPU acceleration)

### Install Dependencies

```bash
pip install -r requirements.txt
```



## Quick Start

### 1. Train a Model

Train CASMIR on the Abalone dataset:

```bash
python train.py --datasets Abalone --algorithms CASMIR --trials 20 --save_models
```

### 2. Run Inference

Use a trained model to make predictions:

```bash
python predict.py --dataset Abalone --algorithm CASMIR --artifacts_dir learned_models
```

### 3. Quick Test (Few Epochs)

For quick testing, limit the training epochs:

```bash
# Train with 5 HPO trials, 50 HPO epochs, 100 final epochs
python train.py --datasets Abalone --algorithms CASMIR --trials 5 --hpo_epochs 50 --final_epochs 100 --save_models

# Predict
python predict.py --dataset Abalone --algorithm CASMIR --artifacts_dir learned_models
```



## Project Structure

```
Published_Code/
├── train.py                    # Main training script
├── predict.py                  # Inference script
├── config.py                   # Configuration file
├── requirements.txt            # Python dependencies
├── README.md                   # This file
│
├── src/                        # Source code
│   ├── models/                 # Model implementations
│   │   ├── CASMIR_V1.py        # CASMIR model (ours)
│   │   ├── basic_models.py     # Baseline models (MLP, XGBoost, etc.)
│   │   └── samplers.py         # Data augmentation samplers (SMOTER, Gaussian Noise)
│   │
│   ├── data/                   # Data loading and preprocessing
│   │   └── datasets.py         # Dataset utilities
│   │
│   ├── training/               # Training utilities
│   │   ├── train_utils.py      # Training loops
│   │   ├── losses.py           # Loss functions
│   │   └── hpo.py              # Hyperparameter optimization
│   │
│   ├── evaluation/             # Evaluation metrics
│   │   └── evaluation.py       # Performance evaluation
│   │
│   └── utils/                  # Utility functions
│       └── utils.py            # Helper functions
│
├── regseg/                     # Resampling methods
│   └── resampler.py            # Resampling algorithms for regression
│
├── data/                       # Dataset files
│
└── scripts/                    # Analysis scripts
    ├── ablation_study.py       # Ablation study experiments
    └── visualize_tsne.py       # t-SNE visualization
```



## Training

### Basic Training

Train a model with hyperparameter optimization:

```bash
python train.py --datasets <DATASET> --algorithms <ALGORITHM> --trials <N_TRIALS> --save_models
```

**Example:**

```bash
# Train CASMIR on Abalone with 30 HPO trials
python train.py --datasets Abalone --algorithms CASMIR --trials 30 --save_models

# Train multiple algorithms
python train.py --datasets Abalone --algorithms MLP XGBoost CASMIR --trials 20 --save_models
```
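To sweep several datasets from Python instead of a shell loop, the documented flags can be assembled into an argument list. The following is a minimal sketch, assuming `train.py` sits in the working directory; `build_train_command` and `launch` are hypothetical helpers and are not part of this repository.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_train_command(
    datasets: Sequence[str],
    algorithms: Sequence[str],
    trials: Optional[int] = None,
    save_models: bool = True,
) -> List[str]:
    """Assemble a train.py invocation using only flags documented in this README."""
    if not datasets or not algorithms:
        raise ValueError("--datasets and --algorithms are both required")
    cmd = [sys.executable, "train.py", "--datasets", *datasets, "--algorithms", *algorithms]
    if trials is not None:
        cmd += ["--trials", str(trials)]
    if save_models:
        cmd.append("--save_models")
    return cmd


def launch(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # One run per dataset, so a failure on one dataset does not abort the others.
    for name in ["Abalone", "kin8nm"]:
        launch(build_train_command([name], ["MLP", "CASMIR"], trials=20))
```

With `dry_run=True` (the default in this sketch) nothing is trained; the commands are only printed so they can be checked first.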



### Using Pre-optimized HPO Results

Load optimal hyperparameters from `optimized_hpo_results/` and train directly:

```bash
# Skip HPO and use saved hyperparameters
python train.py --datasets Abalone --algorithms CASMIR --load_hpo optimized_hpo_results --skip_hpo --save_models

# Load saved hyperparameters as a starting point and continue optimization
python train.py --datasets Abalone --algorithms CASMIR --load_hpo optimized_hpo_results --trials 10 --save_models
```



### Training Options

| Option | Description | Default |
|--------|-------------|---------|
| `--datasets` | Dataset name(s) to train on | Required |
| `--algorithms` | Algorithm(s) to use | Required |
| `--trials` | Number of HPO trials | 50 |
| `--hpo_epochs` | Epochs per trial during the HPO search (lower values speed up the search) | 600 |
| `--final_epochs` | Epochs for final training | 600 |
| `--save_models` | Save trained models to `learned_models/` | False |
| `--load_hpo` | Path to pre-optimized HPO results | None |
| `--skip_hpo` | Skip HPO (use loaded params directly) | False |
| `--gpu` | GPU device ID | Auto |
| `--cpu` | Force CPU training | False |
| `--seed` | Random seed | 42 |
| `--output_dir` | Model output directory | None |
| `--n_jobs` | Optuna parallel trials | 1 |
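For readers who want to see how the table maps onto a command-line parser, here is a sketch using `argparse`. It mirrors the option names and defaults listed above, but the argument types, the use of `None` for the "Auto" GPU setting, and the `validate` check are assumptions; the actual parser in `train.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the options table (not the actual train.py parser)."""
    p = argparse.ArgumentParser(description="Sketch of the train.py options")
    p.add_argument("--datasets", nargs="+", required=True)
    p.add_argument("--algorithms", nargs="+", required=True)
    p.add_argument("--trials", type=int, default=50)
    p.add_argument("--hpo_epochs", type=int, default=600)
    p.add_argument("--final_epochs", type=int, default=600)
    p.add_argument("--save_models", action="store_true")
    p.add_argument("--load_hpo", default=None)
    p.add_argument("--skip_hpo", action="store_true")
    p.add_argument("--gpu", type=int, default=None)  # None stands in for "Auto"
    p.add_argument("--cpu", action="store_true")
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--output_dir", default=None)
    p.add_argument("--n_jobs", type=int, default=1)
    return p


def validate(args: argparse.Namespace) -> None:
    # Assumption: skipping HPO only makes sense when saved hyperparameters are supplied.
    if args.skip_hpo and args.load_hpo is None:
        raise ValueError("--skip_hpo requires --load_hpo")


if __name__ == "__main__":
    args = build_parser().parse_args(["--datasets", "Abalone", "--algorithms", "CASMIR"])
    validate(args)
    print(vars(args))
```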



## Inference

### Using Trained Models

```bash
# From learned_models/ directory
python predict.py --dataset Abalone --algorithm CASMIR --artifacts_dir learned_models

# From artifacts/ directory
python predict.py --dataset Abalone --algorithm CASMIR --artifacts_dir artifacts

# Save predictions to CSV
python predict.py --dataset Abalone --algorithm CASMIR --artifacts_dir learned_models --output predictions.csv
```

### Inference Output

The script outputs:
- **Original Test Set**: MAE, RMSE, Shot-wise MAE (Few/Medium/Many)
- **Balanced Test Set**: MAE, RMSE, Shot-wise MAE (Few/Medium/Many)
- **Verification**: Compares calculated metrics with saved metrics

```
[METRICS] Original Test Set (n=836)
  MAE: 1.4757
  RMSE: 2.1029
  Shot-wise MAE:
  - Few (n=177): 2.6955
  - Medium (n=183): 1.2509
  - Many (n=476): 1.1084

[METRICS] Balanced Test Set (n=177)
  MAE: 1.6464
  ...

[VERIFY] Comparing with saved performance...
  [PASS] ORI verification passed (diff: 0.000000)
  [PASS] BAL verification passed (diff: 0.000000)
```
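If the printed report needs to be collected programmatically, for example to tabulate results across datasets, the `[METRICS]` blocks can be parsed with a few regular expressions. This is a sketch that assumes the output keeps exactly the layout of the sample above; `parse_metrics` is a hypothetical helper, not a function shipped with `predict.py`.

```python
import re
from typing import Dict

HEADER = re.compile(r"^\[METRICS\]\s+(?P<name>.+?)\s+\(n=(?P<n>\d+)\)\s*$")
SCALAR = re.compile(r"^\s*(?P<key>MAE|RMSE):\s*(?P<value>\d+(?:\.\d+)?)\s*$")
SHOT = re.compile(r"^\s*-\s*(?P<shot>Few|Medium|Many)\s+\(n=(?P<n>\d+)\):\s*(?P<value>\d+(?:\.\d+)?)\s*$")


def parse_metrics(log_text: str) -> Dict[str, dict]:
    """Return {section name: {"n", "MAE", "RMSE", "shot_mae", "shot_n"}} for each [METRICS] block."""
    results: Dict[str, dict] = {}
    current = None
    for line in log_text.splitlines():
        m = HEADER.match(line)
        if m:
            current = {"n": int(m["n"]), "shot_mae": {}, "shot_n": {}}
            results[m["name"]] = current
            continue
        if line.startswith("["):  # another section such as [VERIFY] ends the block
            current = None
            continue
        if current is None:
            continue
        m = SCALAR.match(line)
        if m:
            current[m["key"]] = float(m["value"])
            continue
        m = SHOT.match(line)
        if m:
            current["shot_mae"][m["shot"]] = float(m["value"])
            current["shot_n"][m["shot"]] = int(m["n"])
    return results


SAMPLE = """[METRICS] Original Test Set (n=836)
  MAE: 1.4757
  RMSE: 2.1029
  Shot-wise MAE:
  - Few (n=177): 2.6955
  - Medium (n=183): 1.2509
  - Many (n=476): 1.1084
"""

if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```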



## Experiments

### Ablation Study

Analyze the contribution of each CASMIR component:

```bash
# Full ablation study
python scripts/ablation_study.py --artifacts_folder artifacts --dataset Abalone

# Quick test with limited epochs
python scripts/ablation_study.py --artifacts_folder artifacts --dataset Abalone --epochs 10 --patience 10
```



**Compared Models (8 variants):**

| Model | Description |
|-------|-------------|
| MoE Only | MoE structure without CAS module |
| MixUp + MoE | MixUp augmentation with MoE |
| CAS Only (Coupled) | CAS without MoE (direct prediction) |
| CAS Feature-Only | CAS using only feature similarity |
| CAS No Learnable Metric | CAS with fixed distance metric (w=1) |
| CAS Fixed Strength (Smooth Strength 0) | No smoothing applied |
| CAS Fixed Strength (Smooth Strength 1) | Maximum smoothing applied |
| CASMIR (Full) | The final model integrating all proposed components |
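
The variants in the table differ in three knobs: whether target similarity is coupled with feature similarity, whether the distance metric is learnable, and how strongly samples are smoothed. The code below is only an illustrative sketch of how such knobs could interact in a kernel-smoothing step. The Gaussian kernel, the squared distances, and the function name `smooth` are assumptions made for illustration and are not taken from `src/models/CASMIR_V1.py`.

```python
import math
from typing import List, Optional, Sequence, Tuple


def smooth(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    strength: float,
    metric_weights: Optional[Sequence[float]] = None,
    use_target: bool = True,
) -> Tuple[List[List[float]], List[float]]:
    """Blend each sample with a kernel-weighted average of all samples.

    strength=0 leaves the data unchanged; strength=1 replaces each sample by
    its kernel-weighted average. metric_weights=None uses w=1 for every feature
    (the fixed-metric case); use_target=False gives a feature-only kernel.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    n, d = len(X), len(X[0])
    w = list(metric_weights) if metric_weights is not None else [1.0] * d
    X_out: List[List[float]] = []
    y_out: List[float] = []
    for i in range(n):
        # Unnormalized kernel weight of every sample j relative to sample i.
        k = []
        for j in range(n):
            dist = sum(w[c] * (X[i][c] - X[j][c]) ** 2 for c in range(d))
            if use_target:
                dist += (y[i] - y[j]) ** 2
            k.append(math.exp(-dist))
        total = sum(k)  # at least 1, because k[i] == exp(0) == 1
        x_bar = [sum(k[j] * X[j][c] for j in range(n)) / total for c in range(d)]
        y_bar = sum(k[j] * y[j] for j in range(n)) / total
        X_out.append([(1 - strength) * X[i][c] + strength * x_bar[c] for c in range(d)])
        y_out.append((1 - strength) * y[i] + strength * y_bar)
    return X_out, y_out


if __name__ == "__main__":
    X, y = [[0.0], [1.0]], [0.0, 1.0]
    print(smooth(X, y, strength=0.0))  # data returned unchanged
    print(smooth(X, y, strength=1.0))  # both samples pulled toward each other
```

In this reading, the "Smooth Strength 0" and "Smooth Strength 1" rows correspond to the two extremes of `strength`, and the full model would choose a value in between adaptively.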



### t-SNE Visualization

Visualize learned representations:

```bash
# Auto-scan for CASMIR artifacts
python scripts/visualize_tsne.py --auto_scan --dataset Abalone --artifacts_folder artifacts

# Specify artifact directory
python scripts/visualize_tsne.py --artifact_dir artifacts/V001_Abalone_CASMIR_2025_11_29__12_00 --dataset Abalone

# Quick test with limited epochs
python scripts/visualize_tsne.py --auto_scan --dataset Abalone --artifacts_folder artifacts --epochs 10 --patience 10
```
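To check which artifact directories a scan would pick up before running the visualization, a small lookup can help. The sketch below assumes that artifact directories follow the naming pattern of the example above (`V001_Abalone_CASMIR_...`); the matching rule and the helper name `find_casmir_artifacts` are hypothetical and may not match what `--auto_scan` actually does.

```python
from pathlib import Path
from typing import List


def find_casmir_artifacts(artifacts_folder: str, dataset: str) -> List[Path]:
    """List sub-directories whose name contains '_<dataset>_CASMIR_', sorted by name."""
    root = Path(artifacts_folder)
    if not root.is_dir():
        return []
    marker = f"_{dataset}_CASMIR_"
    return sorted(p for p in root.iterdir() if p.is_dir() and marker in p.name)


if __name__ == "__main__":
    for path in find_casmir_artifacts("artifacts", "Abalone"):
        print(path)
```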



## Datasets

We evaluate on 40 regression datasets:

| # | Dataset | # | Dataset |
|---|---------|---|---------|
| 1 | Abalone | 21 | grid_stability_regression |
| 2 | airfoild | 22 | kin8nm |
| 3 | availPwr | 23 | kings_county |
| 4 | bank32nh | 24 | machineCpu |
| 5 | bank8FM | 25 | magic_irri |
| 6 | bike_sharing | 26 | maxTorque |
| 7 | combined_cycle_power_plant | 27 | miami_housing_regression |
| 8 | communities_crime | 28 | Moneyball |
| 9 | concreteStrength | 29 | nhanes_age |
| 10 | cps88wages | 30 | online_news_popularity |
| 11 | cpuSm | 31 | parkinsons_telemonitoring |
| 12 | dAiler | 32 | pumadyn32nh |
| 13 | diabetes | 33 | qsar_aquatic_toxicity |
| 14 | diamond_regression | 34 | red_wine |
| 15 | ecoli70 | 35 | servo |
| 16 | energy_efficiency | 36 | socmob |
| 17 | forest_fires | 37 | solar_flare |
| 18 | fps_benchmark | 38 | space_ga |
| 19 | fuelCons | 39 | superconductivity |
| 20 | geographical_origin_of_music | 40 | white_wine |

Datasets are automatically downloaded from OpenML when needed. Local CSV files in `data/` are used as a fallback.
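
The download-then-fallback behavior can be sketched as follows. This is not the loader in `src/data/datasets.py`: the remote fetcher is passed in as a plain callable so the example stays self-contained, and the `data/<name>.csv` file naming is an assumption.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]


def load_dataset(name: str, fetch_remote: Callable[[str], Rows], data_dir: str = "data") -> Rows:
    """Try the remote source first; fall back to <data_dir>/<name>.csv on any failure."""
    try:
        return fetch_remote(name)
    except Exception as exc:  # deliberately broad in this sketch: any fetch failure triggers the fallback
        csv_path = Path(data_dir) / f"{name}.csv"
        if not csv_path.is_file():
            raise FileNotFoundError(
                f"{name}: remote fetch failed ({exc}) and {csv_path} does not exist"
            ) from exc
        with csv_path.open(newline="") as fh:
            return list(csv.DictReader(fh))
```

In practice `fetch_remote` would wrap an OpenML client call; injecting it keeps the fallback logic testable without network access.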



## Algorithms

We compare 14 algorithms:

| Category | Algorithm | Description |
|----------|-----------|-------------|
| **Ours** | `CASMIR` | Coupled Adaptive Feature–Target Smoothing with Density-Gated Mixture-of-Experts for Robust Imbalanced Tabular Regression |
| **MLP Variants** | `MLP` | Standard MLP baseline |
| | `MLP_SQRT_INV` | MLP with SQRT-INV reweighting |
| | `MLP_ConR` | MLP with ConR loss |
| | `MLP_LDS_Notebook` | MLP with Label Distribution Smoothing |
| | `MLP_GAI_BMSE` | MLP with Balanced MSE (GAI loss) |
| | `MLP_BMC_BMSE` | MLP with Balanced MSE (BMC loss) |
| | `MLP_RankSim` | MLP with RankSim loss |
| **Ensemble** | `Simple_Ensemble` | Simple 3-MLP average ensemble |
| **Tree-based** | `XGBoost` | XGBoost baseline |
| | `LightGBM` | LightGBM baseline |
| | `CatBoost` | CatBoost baseline |
| | `SMOTER_XGBoost` | XGBoost with SMOTER augmentation |
| | `GaussianNoise_XGBoost` | XGBoost with Gaussian noise augmentation |





## Performance Metrics

We report the following metrics:

- **Overall MAE**: Mean Absolute Error on the entire test set
- **Shot-wise MAE**: MAE computed separately for few-shot, medium-shot, and many-shot samples
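
The metrics above, together with the RMSE that appears in the inference output, can be computed as in the following sketch. It takes the shot label of each sample as an input, because how samples are assigned to the Few/Medium/Many groups is not specified in this README; the function names are hypothetical and this is not the code in `src/evaluation/evaluation.py`.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error over all samples."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def shotwise_mae(
    y_true: Sequence[float], y_pred: Sequence[float], shots: Sequence[str]
) -> Dict[str, float]:
    """MAE computed separately for each shot label (e.g. "Few", "Medium", "Many")."""
    errors: Dict[str, List[float]] = defaultdict(list)
    for t, p, s in zip(y_true, y_pred, shots):
        errors[s].append(abs(t - p))
    return {s: sum(e) / len(e) for s, e in errors.items()}


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 3.0, 3.0, 8.0]
    shots = ["Many", "Many", "Medium", "Few"]
    print(mae(y_true, y_pred), rmse(y_true, y_pred), shotwise_mae(y_true, y_pred, shots))
```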



## Citation

If you use this code in your research, please cite:

```bibtex
@inproceedings{casmir2026,
  title={CASMIR: Coupled Adaptive Feature–Target Smoothing with Density-Gated Mixture-of-Experts for Robust Imbalanced Tabular Regression},
  author={Anonymous},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```



## License

This project is released under the MIT License.



---

**Note**: This code is submitted for ICLR 2026 review. 
