# SSM-SDE Time Series Forecasting

This project performs time series forecasting for processes governed by **Stochastic Differential Equations (SDEs)**, using **State Space Models (SSMs)**. It uses the **Mamba** architecture to model and predict stochastic processes including geometric Brownian motion (GBM), Ornstein-Uhlenbeck (OU), Cox-Ingersoll-Ross (CIR), and their hybrid variants.
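
For reference, the three base processes are usually written in the standard textbook forms below. The parameter symbols are the conventional ones and may not match the parameter names used in this project's config files.

$$
\begin{aligned}
\text{GBM:}\quad & dX_t = \mu X_t\,dt + \sigma X_t\,dW_t \\
\text{OU:}\quad & dX_t = \theta(\mu - X_t)\,dt + \sigma\,dW_t \\
\text{CIR:}\quad & dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t
\end{aligned}
$$

Here $W_t$ is a standard Brownian motion.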

## Project Structure

```
ssm_sde/
├── src/                    # Source code
│   ├── data/              # Data processing
│   ├── model/             # Model definitions
│   ├── loss/              # Loss functions
│   └── eval/              # Evaluation functions
├── example/               # Example config files
├── scripts/               # Shell scripts
├── requirements.txt       # Dependency list
├── train.py               # Training entry point
└── test.py                # Testing entry point
```

## Environment Setup

### 1. Create a virtual environment

```bash
# Using conda
conda create -n ssm_sde python=3.8
conda activate ssm_sde

# Or using venv
python -m venv ssm_sde
source ssm_sde/bin/activate      # Linux/macOS
# .\ssm_sde\Scripts\activate     # Windows: run this instead of the line above
```

### 2. Install dependencies

```bash
pip install -r requirements.txt
```

Key dependencies include:
- PyTorch >= 2.0.0
- NumPy
- Matplotlib
- PyYAML
- tqdm
- wandb (optional, for experiment tracking)

## Usage Guide

### 1. Train the model

```bash
python train.py --config example/same_sde/1.yaml
```

### 2. Test the model

```bash
python test.py \
    --config example/same_sde/1.yaml \
    --checkpoint_dir checkpoints/your_model_checkpoint \
    --test_gpu 0 \
    --data_path data/your_test_data  # optional
```

### 3. Config File Explanation

YAML configuration files include the following key parameters:

```yaml
# Model Parameters
batch_size: 512
d_model: 32
n_layer: 2
input_dim: 1
output_dim: 1
input_len: 100
output_len: 50

# Data Parameters
data:
  num_samples: 50000
  seq_len: 150  # equals input_len + output_len (100 + 50) in this example
  type: generate_hybrid_gbm_data
  mu1: -0.00273973
  sigma1: 0.004
  mu2: 0.0109589
  sigma2: 0.006
  x0_base: 40.0
  x0_perturb: 0.1
  switch_probability: 0.8
  seed: null
  save_dir: ./data

# Training Parameters
lr: 0.0001
num_epochs: 100
device: cuda:0
train_ratio: 0.8
val_ratio: 0.1

# Saving & Logging
save:
  keep_models: 3
  log_dir: ./logs
  model_name: model_name
  save_dir: ./checkpoints
```
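
A few of these fields are related, and it can help to sanity-check them before a long training run. The sketch below is not part of the repository: `check_config` and `split_sizes` are hypothetical helpers. It assumes that `data.seq_len` should equal `input_len + output_len`, and that the samples left over after the train and validation splits form the test set. It works on a plain dict, such as the one `yaml.safe_load` returns for the file above.

```python
def check_config(cfg):
    """Raise ValueError if window lengths or split ratios look inconsistent."""
    window = cfg["input_len"] + cfg["output_len"]
    seq_len = cfg["data"]["seq_len"]
    if window != seq_len:
        raise ValueError(
            f"input_len + output_len = {window}, but data.seq_len = {seq_len}"
        )
    total = cfg["train_ratio"] + cfg["val_ratio"]
    if not 0.0 < total <= 1.0:
        raise ValueError("train_ratio + val_ratio must be in (0, 1]")


def split_sizes(cfg):
    """Return (train, val, test) sample counts; test takes the remainder (assumed)."""
    n = cfg["data"]["num_samples"]
    n_train = round(n * cfg["train_ratio"])
    n_val = round(n * cfg["val_ratio"])
    return n_train, n_val, n - n_train - n_val


# A subset of the example config above, written as a dict.
cfg = {
    "input_len": 100,
    "output_len": 50,
    "train_ratio": 0.8,
    "val_ratio": 0.1,
    "data": {"num_samples": 50000, "seq_len": 150},
}

check_config(cfg)
print(split_sizes(cfg))  # (40000, 5000, 5000)
```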

### 4. Supported Data Types

The following types of stochastic processes are currently supported:
- `generate_hybrid_gbm_data`: Hybrid GBM process
- `generate_hybrid_ou_data`: Hybrid OU process
- `generate_hybrid_cir_data`: Hybrid CIR process
- `generate_hybrid_gbm_ou_data`: GBM-OU hybrid
- `generate_hybrid_gbm_cir_data`: GBM-CIR hybrid
- `generate_hybrid_ou_cir_data`: OU-CIR hybrid
- More hybrid types to be added...
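
To make the idea of a hybrid process concrete, here is a minimal NumPy sketch of a regime-switching GBM. It is an illustration only and may differ from the repository's `generate_hybrid_gbm_data`: the function name `simulate_hybrid_gbm`, the switching rule (each path starts in regime 1 and, with probability `switch_probability`, switches once to regime 2 at a random step), the uniform perturbation of the initial value, and the unit time step are all assumptions. The parameter names mirror the example config.

```python
import numpy as np


def simulate_hybrid_gbm(num_samples, seq_len, mu1, sigma1, mu2, sigma2,
                        x0_base, x0_perturb, switch_probability, seed=None):
    """Simulate GBM paths that may switch from regime 1 to regime 2 once.

    Assumed behavior (not taken from the repository): dt = 1, one switch at
    most, switch time drawn uniformly over the steps.
    """
    rng = np.random.default_rng(seed)
    n_steps = seq_len - 1

    # Per-path switch step; a value of n_steps means "never switches".
    switches = rng.random(num_samples) < switch_probability
    switch_step = np.where(switches, rng.integers(0, n_steps, num_samples), n_steps)

    # Mask that is True where a step is governed by regime 2.
    in_regime2 = np.arange(n_steps)[None, :] >= switch_step[:, None]
    mu = np.where(in_regime2, mu2, mu1)
    sigma = np.where(in_regime2, sigma2, sigma1)

    # Exact GBM update in log space: d(log X) = (mu - sigma^2 / 2) dt + sigma dW.
    z = rng.standard_normal((num_samples, n_steps))
    log_increments = (mu - 0.5 * sigma**2) + sigma * z

    # Initial values perturbed uniformly around x0_base (assumed).
    x0 = x0_base * (1.0 + x0_perturb * rng.uniform(-1.0, 1.0, num_samples))
    cumulative = np.concatenate(
        [np.zeros((num_samples, 1)), np.cumsum(log_increments, axis=1)], axis=1
    )
    log_paths = np.log(x0)[:, None] + cumulative

    data = np.exp(log_paths)[:, :, None]  # [num_samples, seq_len, 1]
    timestamps = np.tile(np.arange(seq_len, dtype=float), (num_samples, 1))
    return data, timestamps


data, timestamps = simulate_hybrid_gbm(
    num_samples=4, seq_len=150, mu1=-0.00273973, sigma1=0.004,
    mu2=0.0109589, sigma2=0.006, x0_base=40.0, x0_perturb=0.1,
    switch_probability=0.8, seed=0,
)
print(data.shape, timestamps.shape)  # (4, 150, 1) (4, 150)
```

Working in log space keeps every simulated value strictly positive, which is a defining property of GBM.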

### 5. Using Custom Data

To use your own data:
1. Save your data as `data.npy` and `timestamps.npy`
2. Specify the `data_path` in your config file
3. Ensure the following data format:
   - Data shape: `[num_samples, seq_len, feature_dim]`
   - Timestamps shape: `[num_samples, seq_len]`
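
The steps above can be sketched as follows. This is an illustrative example rather than code from the repository: the random-walk series is placeholder data, the temporary output directory is hypothetical, and it assumes that `data_path` points at the directory holding both files. The final split into input and target windows assumes the example values `input_len: 100` and `output_len: 50`.

```python
import os
import tempfile

import numpy as np

# Placeholder series: 8 samples, 150 steps, 1 feature. Replace with real data.
num_samples, seq_len, feature_dim = 8, 150, 1
rng = np.random.default_rng(0)
data = rng.standard_normal((num_samples, seq_len, feature_dim)).cumsum(axis=1)
timestamps = np.tile(np.arange(seq_len, dtype=np.float64), (num_samples, 1))

# Check the expected layout before saving.
assert data.ndim == 3                          # [num_samples, seq_len, feature_dim]
assert timestamps.shape == data.shape[:2]      # [num_samples, seq_len]

# Hypothetical target directory; use whichever directory `data_path` refers to.
out_dir = tempfile.mkdtemp()
np.save(os.path.join(out_dir, "data.npy"), data)
np.save(os.path.join(out_dir, "timestamps.npy"), timestamps)

# Reload to confirm the files round-trip with the same shapes.
loaded = np.load(os.path.join(out_dir, "data.npy"))
loaded_ts = np.load(os.path.join(out_dir, "timestamps.npy"))
print(loaded.shape, loaded_ts.shape)  # (8, 150, 1) (8, 150)

# Assumed windowing: first 100 steps as input, remaining 50 as target.
inputs, targets = loaded[:, :100], loaded[:, 100:150]
print(inputs.shape, targets.shape)  # (8, 100, 1) (8, 50, 1)
```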

## Outputs

During training and evaluation, the following outputs are generated:
1. Model checkpoints: stored in `checkpoints/` (`save.save_dir` in the config)
2. Training logs: stored in `logs/` (`save.log_dir` in the config)
3. Generated datasets: saved under `data/` (`data.save_dir` in the config)
4. Evaluation results: saved in `eval_results/` under the checkpoint directory

## Notes

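- Ensure your GPU has sufficient memory; if you run out, try reducing `batch_size` or `d_model`
- Start with a small dataset (for example, a lower `data.num_samples`) to validate the training pipeline
- Adjust model settings such as `d_model`, `n_layer`, `input_len`, and `output_len` in the config file
- During testing, pass the correct checkpoint path via `--checkpoint_dir`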
