# Anonymous NeurIPS Submission Code

This repository provides an end-to-end pipeline for training and evaluating multivariate delay embedding (MDE) models.

## File Structure

```
.
├── code/
│   ├── core.py          # Core MDE_CV class implementation
│   ├── utils.py         # Utils: convergence, metrics, parameter estimation
│   ├── train.py         # Auto-param train entrypoint
│   └── eval.py          # Evaluation entrypoint
├── main.py              # Full pipeline: train, predict, metrics, save
├── README.md            # Project documentation
└── requirements.txt     # Dependencies
```

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### 1. Training

```bash
python code/train.py \
  --input_csv data/timeseries.csv \
  --target Y \
  --output_model models/mde_model.pkl
```

Internally, `train.py` calls `estimate_best_params()` to select optimal `E` and `tau`.
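For readers new to the two parameters, the sketch below shows what a delay embedding with dimension `E` and delay `tau` looks like for a single series. It is only an illustration: `delay_embed` is a hypothetical helper written for this README, not a function from `code/utils.py`, and it does not reproduce how `estimate_best_params()` scores candidate values.

```python
def delay_embed(series, E, tau):
    """Build delay vectors [x(t), x(t - tau), ..., x(t - (E - 1) * tau)].

    Hypothetical helper for illustration only; not part of this repository.
    """
    if E < 1 or tau < 1:
        raise ValueError("E and tau must be positive integers")
    x = list(series)
    offset = (E - 1) * tau  # first index that has a full lag history
    if offset >= len(x):
        raise ValueError("series is too short for this E and tau")
    # One row per time step t; column j holds the value lagged by j * tau.
    return [[x[t - j * tau] for j in range(E)] for t in range(offset, len(x))]


# Toy series 0..9 embedded with E=3 and tau=2.
emb = delay_embed(range(10), E=3, tau=2)
print(len(emb), emb[0], emb[-1])
```

Larger `E` or `tau` shortens the usable series by `(E - 1) * tau` rows, which is one reason these values are worth tuning rather than fixing by hand.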

### 2. Evaluation

```bash
python code/eval.py \
  --model_pickle models/mde_model.pkl \
  --ground_truth_csv data/timeseries.csv \
  --target Y
```

The script prints the RMSE, MAE, and correlation computed from the predictions DataFrame.
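As a rough guide to what these three numbers mean, the sketch below computes them from paired observations and predictions using only the standard library. The function name `summarize_predictions` is hypothetical, and treating "correlation" as the Pearson coefficient is an assumption; the implementation in `code/utils.py` may differ.

```python
import math


def summarize_predictions(observations, predictions):
    """Return RMSE, MAE, and Pearson correlation for paired values.

    Hypothetical helper for illustration; pairs containing NaN are skipped.
    """
    pairs = [
        (float(o), float(p))
        for o, p in zip(observations, predictions)
        if not (math.isnan(float(o)) or math.isnan(float(p)))
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")
    errors = [p - o for o, p in pairs]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_o = sum(o for o, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    var_o = sum((o - mean_o) ** 2 for o, _ in pairs)
    var_p = sum((p - mean_p) ** 2 for _, p in pairs)
    denom = math.sqrt(var_o * var_p)
    # Correlation is undefined when either series is constant.
    corr = cov / denom if denom > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "correlation": corr}


# Toy data: every prediction is off by a constant 0.5.
metrics = summarize_predictions([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(metrics)
```

The function accepts any two iterables of numbers, so the `Observations` and `Predictions` columns of the predictions DataFrame can be passed in directly.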

### 3. Full Pipeline

```bash
python main.py \
  --input_csv data/timeseries.csv \
  --target Y
```

Runs training and prediction, computes metrics, and saves the results.

---

## MDE_CV Class (in `code/core.py`)

The `MDE_CV` class implements a cross-validated multivariate delay embedding (MDE) model.

### Initialization Parameters

- `Tp` (int): Prediction horizon.
- `maxD` (int): Maximum embedding dimension to consider.
- `folds` (int): Number of cross-validation folds.
- `test_size` (float): Proportion of data for testing.
- `plot` (bool): Whether to plot intermediate results.
- `optimize_for` (str): Objective metric, either "correlation" or "cae".
- `conv` (bool): If False, the convergence check is skipped.
- `include_target` (bool): Include target time series in embedding.
- `smap` (bool): Use SMap method if True, else Simplex.
- `final_feature_mode` (str): How to select the final features. Options:
  - "frequency": The `maxD` features selected most frequently across folds.
  - "best_fold": The features from the single best-performing fold.
  - "best_N": The N features selected most frequently across folds. **The value of N is estimated dynamically by the `predict_incremental` method.**
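The frequency-based modes can be pictured as a simple vote across folds. The sketch below is a minimal illustration, assuming each fold yields a list of selected feature names; `most_frequent_features` is a hypothetical helper, and the alphabetical tie-breaking is an assumption rather than the behavior of `MDE_CV`.

```python
from collections import Counter


def most_frequent_features(fold_features, n):
    """Return the n feature names chosen in the most folds.

    Hypothetical helper for illustration; ties are broken alphabetically.
    """
    # Count each feature at most once per fold.
    counts = Counter(f for features in fold_features for f in set(features))
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:n]


# Toy example: three folds, each with its own selected features.
folds = [
    ["X1", "X3", "X5"],
    ["X1", "X2", "X3"],
    ["X1", "X3", "X4"],
]
top = most_frequent_features(folds, n=2)
print(top)
```

Under this picture, "frequency" would correspond to `n = maxD`, while "best_N" uses a value of N chosen at prediction time.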

### Methods

- `fit(df: DataFrame, target: str, E: int, tau: int)`  
  Trains the MDE model on the DataFrame `df` using embedding dimension `E` and delay `tau`.

- `predict() -> DataFrame`  
  Generates predictions on the test set. Returns a DataFrame with columns:
  - `Observations`: True values.
  - `Predictions`: Model predictions.

- `predict_incremental(new_data: DataFrame) -> DataFrame`  
  Produces incremental/online predictions for streaming data. Also used to determine the optimal number of features when `final_feature_mode` is set to "best_N".

- `save_results(path: str)`  
  Saves model results (predictions, metrics, selected features) to the given `path`.

---
