# Diversity-Aware Recursive Feature Multiple Kernel Learning

## Requirements

```bash
pip install numpy scipy torch scikit-learn MKLpy gurobipy libsvmdata tqdm torchmetrics
```
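Some pip package names differ from their import names (for example, `scikit-learn` is imported as `sklearn`). The optional check below uses only the standard library to report which requirements are not importable. The `REQUIRED` mapping is written for this README and is not part of the repository.

```python
import importlib.util

# pip package name -> top-level module name used in `import ...`
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "torch": "torch",
    "scikit-learn": "sklearn",
    "MKLpy": "MKLpy",
    "gurobipy": "gurobipy",
    "libsvmdata": "libsvmdata",
    "tqdm": "tqdm",
    "torchmetrics": "torchmetrics",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found on the current path."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All requirements importable." if not missing
          else f"Missing: {', '.join(missing)}")
```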

## Project Structure

```
RFM-DASMKL-submission/
├── DASMKL.py              # Main algorithm implementation
├── rfm/                   # RFM library (from recursive_feature_machines)
│   ├── __init__.py
│   ├── kernels.py         # Kernel functions (gaussian_M, laplacian_M)
│   ├── recursive_feature_machine.py  # GaussRFM, LaplaceRFM classes
│   ├── eigenpro.py        # EigenPro solver
│   ├── generic_kernels.py # Generic kernel implementations
│   ├── svd.py             # SVD utilities
│   └── utils.py           # Helper functions
└── README.md
```
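`rfm/kernels.py` provides `gaussian_M` and `laplacian_M`. In recursive feature machines these kernels are evaluated under a learned Mahalanobis metric `M`, e.g. the Laplace kernel `exp(-||x - z||_M / L)` with `||v||_M = sqrt(v^T M v)`. The NumPy sketch below illustrates that computation. The `_sketch` function names are hypothetical, the Gaussian bandwidth convention `exp(-d^2 / (2 L^2))` is an assumption, and the actual signatures in `rfm/kernels.py` may differ.

```python
import numpy as np


def mahalanobis_sq_dists(X, Z, M):
    """Pairwise (x - z)^T M (x - z) for rows x of X and z of Z.

    Assumes M is symmetric (as a learned PSD metric is), so that
    x^T M z == z^T M x and the expansion below is exact.
    """
    XM = X @ M
    x_sq = np.einsum("ij,ij->i", XM, X)        # x^T M x for each row of X
    z_sq = np.einsum("ij,ij->i", Z @ M, Z)     # z^T M z for each row of Z
    d2 = x_sq[:, None] + z_sq[None, :] - 2.0 * XM @ Z.T
    return np.clip(d2, 0.0, None)              # remove small negative round-off


def laplacian_M_sketch(X, Z, M, L):
    """Laplace kernel with Mahalanobis distance: exp(-||x - z||_M / L)."""
    return np.exp(-np.sqrt(mahalanobis_sq_dists(X, Z, M)) / L)


def gaussian_M_sketch(X, Z, M, L):
    """Gaussian kernel with Mahalanobis distance (assumed bandwidth convention)."""
    return np.exp(-mahalanobis_sq_dists(X, Z, M) / (2.0 * L ** 2))
```

With `M` set to the identity, both functions reduce to the ordinary Euclidean Laplace and Gaussian kernels.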

## Usage

### Quick Demo (Recommended)

Run the demo on the `breast-cancer_scale` dataset to reproduce the paper's result:

```bash
python DASMKL.py
```

**Expected output:**
- Dataset: `breast-cancer_scale` (569 samples, 30 features)
- Accuracy: **~97%** (Paper: 97.15±0.48)
- Runtime: ~10 seconds

The dataset is automatically downloaded via `libsvmdata`.

### Run Full Experiments

To run experiments on all datasets, edit the `datasets1`, `datasets2`, and `param_grid` variables in `DASMKL.py`:

```python
# Full experiment configuration
datasets1 = ["sonar", "heart_scale", "diabetes", "german.numer", 
             "breast-cancer_scale", "ionosphere", "splice"]
datasets2 = ["a8a", "w7a"]

param_grid = {
    'C': [10, 100],
    'M': [100, 200],
    'm': [15, 25],
    'la_scale': [0.05, 0.1, 0.25],
    's': [100, 300],
}
```
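The grid above has 2 × 2 × 2 × 3 × 2 = 48 combinations per dataset. It sets `la_scale` rather than the `la` listed under Key Parameters. The ranges are consistent with `la = la_scale * M` (0.05 × 100 = 5, 0.25 × 200 = 50), but treat that relationship as an assumption and check `DASMKL.py`. The sketch below expands the grid with scikit-learn's `ParameterGrid` and derives `la` under that assumption.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'C': [10, 100],
    'M': [100, 200],
    'm': [15, 25],
    'la_scale': [0.05, 0.1, 0.25],
    's': [100, 300],
}


def expand_grid(grid):
    """List every hyperparameter combination, adding a derived `la`."""
    configs = []
    for params in ParameterGrid(grid):
        params = dict(params)
        # ASSUMPTION: the trade-off weight scales with the number of candidates
        params["la"] = params["la_scale"] * params["M"]
        configs.append(params)
    return configs


configs = expand_grid(param_grid)
print(len(configs))  # 48 configurations per dataset
```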

### Command Line Options

```bash
python DASMKL.py          # Run demo (breast-cancer_scale)
python DASMKL.py small    # Run small datasets only
python DASMKL.py large    # Run large datasets only
```
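The three invocations above select which datasets are run. A minimal sketch of that dispatch, assuming `small` and `large` iterate over `datasets1` and `datasets2` from the full configuration. `select_datasets` is a hypothetical helper, not the argument handling in `DASMKL.py`.

```python
import sys

DEMO_DATASETS = ["breast-cancer_scale"]
DATASETS_SMALL = ["sonar", "heart_scale", "diabetes", "german.numer",
                  "breast-cancer_scale", "ionosphere", "splice"]
DATASETS_LARGE = ["a8a", "w7a"]


def select_datasets(argv):
    """Map the optional first CLI argument to a list of libsvmdata names."""
    if len(argv) < 2:
        return DEMO_DATASETS  # no argument: run the demo
    modes = {"small": DATASETS_SMALL, "large": DATASETS_LARGE}
    try:
        return modes[argv[1]]
    except KeyError:
        raise SystemExit(f"unknown mode {argv[1]!r}; expected 'small' or 'large'") from None


if __name__ == "__main__":
    print(select_datasets(sys.argv))
```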

## Datasets

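The code uses the following `libsvmdata` datasets (identifiers as listed in `datasets1` and `datasets2`):
- **Small**: `sonar`, `heart_scale`, `diabetes`, `german.numer`, `breast-cancer_scale`, `ionosphere`, `splice`
- **Large**: `a8a`, `w7a`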

## Key Parameters

| Parameter | Range   | Description                        |
| --------- | ------- | ---------------------------------- |
| `M`       | 100-200 | Number of candidate kernels        |
| `m`       | 15-25   | Number of selected kernels         |
| `la`      | 5-50    | Diversity-quality trade-off        |
| `C`       | 10-100  | SVM regularization                 |
| `s`       | 100-300 | Sampling size for kernel selection |

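The table describes a selection step: `M` candidate kernels are reduced to `m`, with `la` trading quality against diversity and `s` points sampled for selection. As a rough illustration of that kind of step, the sketch below uses a generic greedy heuristic (centered kernel-target alignment as quality, alignment with already-chosen kernels as redundancy). It is not the selection rule implemented in `DASMKL.py`, and its `la` is not on the same scale as the values in the table. `select_kernels` and `make_gaussian` are hypothetical names.

```python
import numpy as np


def _center_normalize(K):
    """Center a kernel matrix (H K H) and scale it to unit Frobenius norm."""
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n                    # centering matrix I - (1/n) 11^T
    Kc = H @ K @ H
    norm = np.linalg.norm(Kc)
    return Kc / norm if norm > 0 else Kc


def make_gaussian(gamma):
    """Hypothetical candidate generator: Gaussian kernel exp(-gamma * ||a - b||^2)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k


def select_kernels(X, y, kernel_fns, m, la, s, seed=0):
    """Greedily pick up to m kernels, scoring each as quality - la * redundancy."""
    rng = np.random.default_rng(seed)
    # Score on a random subsample of s points so each kernel costs O(s^2)
    idx = rng.choice(len(X), size=min(s, len(X)), replace=False)
    Xs, ys = X[idx], np.asarray(y, dtype=float)[idx]
    F = np.stack([_center_normalize(k(Xs, Xs)).ravel() for k in kernel_fns])
    quality = F @ _center_normalize(np.outer(ys, ys)).ravel()  # kernel-target alignment
    similarity = F @ F.T                                        # pairwise kernel alignment
    selected = [int(np.argmax(quality))]
    candidates = set(range(len(kernel_fns))) - set(selected)
    while candidates and len(selected) < m:
        # Penalize a candidate by its closest match among the kernels already chosen
        best = max(candidates,
                   key=lambda i: quality[i] - la * similarity[i, selected].max())
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.where(X[:, 0] > 0, 1, -1)
    fns = [make_gaussian(g) for g in (0.01, 0.1, 1.0, 10.0)]
    print(select_kernels(X, y, fns, m=2, la=1.0, s=100))
```

Precomputing all pairwise similarities keeps the greedy loop cheap, at the cost of storing `M` flattened `s × s` matrices.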
## Output

- `results_rfm_YYYYMMDD_HHMMSS.txt`: Experiment results
- `log-YYYYMMDD-HHMMSS.txt`: Detailed logs
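Because the timestamps are zero-padded and ordered from year to second, sorting the file names alphabetically also sorts them chronologically. The sketch below rebuilds both name patterns and finds the newest results file. `output_filenames` and `latest_results` are hypothetical helpers, not functions in `DASMKL.py`.

```python
from datetime import datetime
from pathlib import Path


def output_filenames(now=None):
    """Return (results, log) file names following the patterns listed above."""
    if now is None:
        now = datetime.now()
    results = f"results_rfm_{now:%Y%m%d_%H%M%S}.txt"
    log = f"log-{now:%Y%m%d-%H%M%S}.txt"
    return results, log


def latest_results(directory="."):
    """Newest results file in `directory`, or None if there is none."""
    # Lexicographic order of zero-padded timestamps equals chronological order
    files = sorted(Path(directory).glob("results_rfm_*.txt"))
    return files[-1] if files else None
```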


