# PowerSig JAX Examples: Regression and Classification

This directory contains examples demonstrating how to use PowerSig JAX for both regression and classification tasks on time series data using aeon datasets.

## Overview

The examples show how to:

1. Load aeon time series datasets (regression or classification)
2. Preprocess the data (normalization and time augmentation)
3. Compute the gram matrix using PowerSig JAX
4. Perform kernel ridge regression or SVM classification using scikit-learn
5. Evaluate the model performance

## Files

### Regression Examples
- `regression.py`: Full-featured regression example with comprehensive error handling and detailed output
- `regression_simple.py`: Simplified regression example for easier testing

### Classification Examples
- `classification.py`: Classification example using SVM with precomputed kernels

## Requirements

Make sure you have the following packages installed:
```bash
pip install powersig aeon scikit-learn "jax[cuda12]" numpy
```

## Usage

### Regression Examples
Run the simple regression version:
```bash
python examples/regression_simple.py
```

Or run the full regression version:
```bash
python examples/regression.py
```

### Classification Example
Run the classification example:
```bash
python examples/classification.py
```

## Dataset Requirements

### Regression Examples
- **Time series length**: The examples check that every time series has at least 1000 timesteps
- **Dataset type**: Uses `load_regression()` from aeon. Because few regression datasets are available, the examples currently use classification datasets and treat their labels as regression targets
- **Current dataset**: Uses "StandWalkJump", whose series have 2500 timesteps (above the 1000-timestep minimum)
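
The length check described above can be sketched as follows. This is a minimal illustration, not the code in `regression.py`: the helper name `check_min_timesteps` is hypothetical, and it assumes aeon's usual array layout of `(n_cases, n_channels, n_timepoints)`.

```python
import numpy as np

MIN_TIMESTEPS = 1000  # minimum series length stated in this README


def check_min_timesteps(X, min_timesteps=MIN_TIMESTEPS):
    """Raise ValueError if any series is shorter than min_timesteps.

    Assumes the last axis of each series is time, i.e. a 3D array shaped
    (n_cases, n_channels, n_timepoints) or a list of 2D arrays shaped
    (n_channels, n_timepoints) for unequal-length data.
    """
    lengths = [np.asarray(series).shape[-1] for series in X]
    shortest = min(lengths)
    if shortest < min_timesteps:
        raise ValueError(
            f"shortest series has {shortest} timesteps, need >= {min_timesteps}"
        )
    return shortest


# Synthetic stand-in with 4 channels and 2500 timesteps per series.
# The number of cases (12) is arbitrary and not taken from any dataset.
X = np.zeros((12, 4, 2500))
print(check_min_timesteps(X))  # 2500
```

Note that the check looks at the length of each series, not at how many series the dataset contains.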

### Classification Examples
- **Dataset type**: Uses `load_classification()` from aeon
- **Number of samples**: Works with any number of samples, but more samples generally give better results

## Available aeon Datasets

### Datasets with Long Time Series (>= 1000 timesteps):
**For Regression Examples:**
- **StandWalkJump**: ~1000 samples, 2500 timesteps, 4 features (current default)
- **SelfRegulationSCP2**: ~1000 samples, 1152 timesteps, 7 features
- **MotorImagery**: ~1000 samples, 1000 timesteps, 64 features
- **Cricket**: ~1080 samples, 1197 timesteps, 6 features

### Classification Datasets (~1000 samples or more):
- **EigenWorms**: ~1300 samples, 84 timesteps, 6 features
- **BasicMotions**: ~1000 samples, 100 timesteps, 6 features
- **ArticularyWordRecognition**: ~1000 samples, 144 timesteps, 9 features
- **CharacterTrajectories**: ~1000 samples, 182 timesteps, 3 features
- **Cricket**: ~1080 samples, 1197 timesteps, 6 features
- **DuckDuckGeese**: ~1000 samples, 134 timesteps, 1345 features
- **ERing**: ~1000 samples, 65 timesteps, 4 features
- **FaceDetection**: ~1000 samples, 62 timesteps, 144 features
- **FingerMovements**: ~1000 samples, 50 timesteps, 28 features
- **HandMovementDirection**: ~1000 samples, 400 timesteps, 10 features
- **Heartbeat**: ~1000 samples, 405 timesteps, 61 features
- **InsectWingbeat**: ~1000 samples, 256 timesteps, 200 features
- **JapaneseVowels**: ~1000 samples, 29 timesteps, 12 features
- **Libras**: ~1000 samples, 45 timesteps, 2 features
- **LSST**: ~1000 samples, 36 timesteps, 6 features
- **MotorImagery**: ~1000 samples, 1000 timesteps, 64 features
- **NATOPS**: ~1000 samples, 51 timesteps, 24 features
- **PEMS-SF**: ~1000 samples, 144 timesteps, 963 features
- **PenDigits**: ~1000 samples, 8 timesteps, 2 features
- **Phoneme**: ~1000 samples, 217 timesteps, 11 features
- **RacketSports**: ~1000 samples, 30 timesteps, 6 features
- **SelfRegulationSCP1**: ~1000 samples, 896 timesteps, 6 features
- **SelfRegulationSCP2**: ~1000 samples, 1152 timesteps, 7 features
- **StandWalkJump**: ~1000 samples, 2500 timesteps, 4 features
- **UWaveGestureLibrary**: ~1000 samples, 315 timesteps, 3 features

### Larger Datasets (>1000 samples):
- **ElectricDevices**: ~16637 samples, 96 timesteps, 1 feature
- **FordA**: ~4921 samples, 500 timesteps, 1 feature
- **FordB**: ~4446 samples, 500 timesteps, 1 feature
- **PhalangesOutlinesCorrect**: ~2658 samples, 80 timesteps, 1 feature
- **ShapesAll**: ~1200 samples, 512 timesteps, 2 features

## How to Change the Dataset

### For Regression Examples:
```python
# In the main function, change these lines:
X_train, y_train = load_aeon_regression_dataset_simple("StandWalkJump", "train")
X_test, y_test = load_aeon_regression_dataset_simple("StandWalkJump", "test")

# To use a different dataset with >= 1000 timesteps:
# - "StandWalkJump" (2500 timesteps) - current default
# - "SelfRegulationSCP2" (1152 timesteps)
# - "MotorImagery" (1000 timesteps)
# - "Cricket" (1197 timesteps)
```

### For Classification Examples:
```python
# In the main function, change these lines:
X_train, y_train, label_map = load_aeon_classification_dataset("EigenWorms", "train")
X_test, y_test, _ = load_aeon_classification_dataset("EigenWorms", "test")

# To use a different dataset, e.g., "BasicMotions":
X_train, y_train, label_map = load_aeon_classification_dataset("BasicMotions", "train")
X_test, y_test, _ = load_aeon_classification_dataset("BasicMotions", "test")
```
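
The classification loader above returns a `label_map` alongside the data, and the test split discards its own map. The loader's implementation is not shown in this README, so the following is only a sketch of the label-encoding step such a loader might perform. The helper `encode_labels` is hypothetical; the one design point it illustrates is that the test split should reuse the mapping built from the training split.

```python
import numpy as np


def encode_labels(y_raw, label_map=None):
    """Map class labels (typically strings) to integer codes.

    Hypothetical helper. Returns (codes, label_map), where label_map maps
    each original label to its integer code. Pass the training label_map
    when encoding the test split so both splits share one encoding.
    """
    y_raw = np.asarray(y_raw)
    if label_map is None:
        classes = np.unique(y_raw)  # sorted unique labels
        label_map = {str(label): code for code, label in enumerate(classes)}
    codes = np.array([label_map[str(label)] for label in y_raw], dtype=np.int64)
    return codes, label_map


y_train, label_map = encode_labels(["walk", "stand", "jump", "walk"])
y_test, _ = encode_labels(["jump", "stand"], label_map)

print(label_map)  # {'jump': 0, 'stand': 1, 'walk': 2}
print(y_train)    # [2 1 0 2]
print(y_test)     # [0 1]
```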

## Expected Output

### Regression Examples:
- Dataset information and preprocessing details
- Gram matrix computation progress and statistics
- Kernel ridge regression training with different alpha values
- Final test results (MSE, MAE, R² score)
- Sample predictions
- Model summary
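
The alpha search and the reported metrics can be sketched with scikit-learn's `KernelRidge` on a precomputed gram matrix. This is an illustration under stated assumptions, not the code in `regression.py`: `rbf_gram` is a placeholder kernel standing in for the PowerSig gram matrix, the data is synthetic, and the hold-out selection in `fit_krr_precomputed` (a hypothetical helper) is one possible way to choose alpha.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def rbf_gram(A, B, gamma=0.5):
    """Placeholder kernel; stands in for the PowerSig gram matrix."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_krr_precomputed(K_train, y_train, alphas, val_fraction=0.25, seed=0):
    """Pick alpha on a held-out slice of the training gram, then refit on all of it."""
    n = K_train.shape[0]
    perm = np.random.default_rng(seed).permutation(n)
    n_val = max(1, int(n * val_fraction))
    val, fit = perm[:n_val], perm[n_val:]
    best_alpha, best_mse = None, np.inf
    for alpha in alphas:
        model = KernelRidge(alpha=alpha, kernel="precomputed")
        # A precomputed kernel must be sliced on both axes.
        model.fit(K_train[np.ix_(fit, fit)], y_train[fit])
        val_pred = model.predict(K_train[np.ix_(val, fit)])
        mse = mean_squared_error(y_train[val], val_pred)
        if mse < best_mse:
            best_alpha, best_mse = alpha, mse
    final = KernelRidge(alpha=best_alpha, kernel="precomputed")
    final.fit(K_train, y_train)
    return final, best_alpha


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(60, 3)), rng.normal(size=(20, 3))
y_train, y_test = np.sin(X_train[:, 0]), np.sin(X_test[:, 0])

K_train = rbf_gram(X_train, X_train)  # shape (n_train, n_train)
K_test = rbf_gram(X_test, X_train)    # shape (n_test, n_train)

model, alpha = fit_krr_precomputed(K_train, y_train, [1e-3, 1e-2, 1e-1, 1.0])
pred = model.predict(K_test)
print(f"best alpha: {alpha}")
print(f"MSE: {mean_squared_error(y_test, pred):.4f}")
print(f"MAE: {mean_absolute_error(y_test, pred):.4f}")
print(f"R2:  {r2_score(y_test, pred):.4f}")
```

The test gram matrix has one row per test series and one column per training series, which is the shape `predict` expects for a precomputed kernel.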

### Classification Examples:
- Dataset information and class distribution
- Gram matrix computation progress and statistics
- SVM training with different C values
- Final test results (accuracy, classification report, confusion matrix)
- Sample predictions with class labels
- Model summary
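
The search over C values and the reported classification metrics can be sketched with scikit-learn's `SVC(kernel="precomputed")`. As an assumption for illustration, `linear_gram` is a placeholder for the PowerSig gram matrix and the data is synthetic; `classification.py` may select C differently from the cross-validated grid search used here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def linear_gram(A, B):
    """Placeholder kernel; stands in for the PowerSig gram matrix."""
    return A @ B.T


# Two well-separated synthetic classes, for illustration only.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2, 1, (30, 4)), rng.normal(2, 1, (30, 4))])
y_train = np.repeat([0, 1], 30)
X_test = np.vstack([rng.normal(-2, 1, (10, 4)), rng.normal(2, 1, (10, 4))])
y_test = np.repeat([0, 1], 10)

K_train = linear_gram(X_train, X_train)  # shape (n_train, n_train)
K_test = linear_gram(X_test, X_train)    # shape (n_test, n_train)

# GridSearchCV slices a precomputed kernel on both axes for each fold.
search = GridSearchCV(
    SVC(kernel="precomputed"),
    param_grid={"C": [0.1, 1.0, 10.0, 100.0]},
    cv=3,
)
search.fit(K_train, y_train)
pred = search.predict(K_test)

print("best C:", search.best_params_["C"])
print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```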

## Key Features

1. **Time Augmentation**: Adds a time feature to each time series to improve signature kernel performance
2. **Data Normalization**: Scales the data to [0,1] range
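3. **Hyperparameter Tuning**: Tries several regularization values (alpha for kernel ridge regression, C for the SVM) and keeps the best one
4. **Comprehensive Evaluation**: Reports MSE, MAE, and R² for regression, and accuracy, a classification report, and a confusion matrix for classification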
5. **Progress Tracking**: Shows computation times and progress indicators
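
The normalization and time augmentation steps can be sketched in NumPy as follows. The function names are hypothetical and the details are assumptions: the scaling is done per channel with statistics fitted on the training split, the input uses aeon's `(n_cases, n_channels, n_timepoints)` layout, and the output is transposed to `(n_cases, n_timepoints, n_channels + 1)`. The example scripts may differ on any of these points.

```python
import numpy as np


def minmax_fit(X):
    """Per-channel min and max over all cases and timesteps.

    X has shape (n_cases, n_channels, n_timepoints).
    """
    lo = X.min(axis=(0, 2), keepdims=True)
    hi = X.max(axis=(0, 2), keepdims=True)
    return lo, hi


def minmax_apply(X, lo, hi, eps=1e-12):
    """Scale each channel to [0, 1] using previously fitted statistics."""
    return (X - lo) / np.maximum(hi - lo, eps)


def add_time_channel(X):
    """Append a time channel that runs linearly from 0 to 1.

    Input:  (n_cases, n_channels, n_timepoints)
    Output: (n_cases, n_timepoints, n_channels + 1)
    """
    n_cases, _, n_timepoints = X.shape
    paths = np.transpose(X, (0, 2, 1))
    t = np.linspace(0.0, 1.0, n_timepoints)
    t = np.broadcast_to(t[None, :, None], (n_cases, n_timepoints, 1))
    return np.concatenate([paths, t], axis=-1)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 4, 100))

lo, hi = minmax_fit(X_train)
paths = add_time_channel(minmax_apply(X_train, lo, hi))
print(paths.shape)  # (5, 100, 5)
```

When the same `lo` and `hi` are applied to a test split, values can fall slightly outside [0, 1]; that is expected when the statistics come from the training data only.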

## Troubleshooting

If you encounter issues:

1. **Import Errors**: Make sure all required packages are installed
2. **Memory Issues**: Try using a smaller dataset or reducing the polynomial order
3. **CUDA Issues**: The code will fall back to CPU if CUDA is not available
4. **Dataset Not Found**: Check the dataset name spelling and ensure aeon is properly installed
5. **Time Series Length**: For regression, ensure your dataset has time series with >= 1000 timesteps
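
For import and CUDA issues, it can help to check which backend JAX actually selected. The sketch below uses only `jax.devices()` and `jax.default_backend()`; the helper name `describe_jax_backend` is hypothetical.

```python
def describe_jax_backend():
    """Describe the backend JAX selected, or return a hint if JAX is missing."""
    try:
        import jax
    except ImportError:
        return "jax is not installed; see the Requirements section"
    devices = ", ".join(str(d) for d in jax.devices())
    return f"backend={jax.default_backend()} devices=[{devices}]"


print(describe_jax_backend())
```

A backend of `cpu` on a machine with a GPU suggests the CUDA-enabled JAX build is not installed or could not be initialized.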

## Performance Notes

- The gram matrix computation is the most time-consuming part
- Larger datasets and higher polynomial orders increase computation time
- GPU acceleration significantly improves performance for large datasets
- The examples use polynomial order 8 for faster computation; you can increase this for potentially better accuracy
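
To get a rough sense of how cost grows with dataset size, the following sketch counts kernel evaluations and estimates the memory of a dense gram matrix. Both helpers are hypothetical. The pair count assumes the symmetry of the training gram matrix is exploited, which this README does not state about PowerSig, and it ignores the per-pair cost, which depends on series length and polynomial order.

```python
def n_unique_pairs(n):
    """Kernel evaluations for a symmetric n x n gram matrix (upper triangle)."""
    return n * (n + 1) // 2


def gram_matrix_gib(n_rows, n_cols=None, bytes_per_entry=8):
    """Memory of a dense gram matrix in GiB.

    bytes_per_entry is 8 for float64 and 4 for float32
    (JAX uses 32-bit floats unless 64-bit mode is enabled).
    """
    n_cols = n_rows if n_cols is None else n_cols
    return n_rows * n_cols * bytes_per_entry / 2**30


for n in (100, 1_000, 10_000):
    print(f"n={n}: {n_unique_pairs(n)} pairs, {gram_matrix_gib(n):.4f} GiB")
```

Doubling the number of series roughly quadruples both the number of kernel evaluations and the size of the gram matrix.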

## Important Notes

- **Regression Examples**: Currently use classification datasets as regression targets because few regression datasets are available in aeon. In practice, you would use actual regression datasets.
- **Time Series Length**: The 1000-timestep minimum in the regression examples applies to the length of each time series, not to the number of time series in the dataset.
- **Classification Examples**: Work with any number of samples and any time series length.