# Bayesian LSTM for Time Series Forecasting

This repository implements Bayesian LSTM networks for time series forecasting, with a focus on uncertainty quantification. The implementation includes multiple Bayesian approximation methods based on Fortunato et al. "Bayesian Recurrent Neural Networks" (2017).

## Hardware Requirements

- **GPU**: NVIDIA H100 NVL (95GB VRAM)
- **CUDA Version**: 13.0
- **Driver Version**: 580.95.05

The code also runs on smaller GPUs, but memory-intensive operations (particularly full-rank Bayesian models and large ensembles) may require a smaller batch size or a smaller model architecture.

## Software Requirements

```
Python == 3.10.x
TensorFlow >= 2.15
TensorFlow Probability >= 0.23
NumPy >= 1.24
scikit-learn >= 1.3
pandas >= 2.0
matplotlib >= 3.7
seaborn >= 0.12
joblib >= 1.3
```

Install dependencies:
```bash
pip install "tensorflow>=2.15" "tensorflow-probability>=0.23" "numpy>=1.24" "scikit-learn>=1.3" "pandas>=2.0" "matplotlib>=3.7" "seaborn>=0.12" "joblib>=1.3"
```
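
To confirm that an environment meets the minimum versions listed above, a small standard-library check such as the following can help. This is a sketch and not part of the repository: `MIN_VERSIONS`, `parse_version`, and `check_requirements` are hypothetical names, and the version parser is deliberately simple (it keeps only the leading numeric components).

```python
from importlib import metadata

# Minimum versions copied from the requirements list above.
MIN_VERSIONS = {
    "tensorflow": "2.15",
    "tensorflow-probability": "0.23",
    "numpy": "1.24",
    "scikit-learn": "1.3",
    "pandas": "2.0",
    "matplotlib": "3.7",
    "seaborn": "0.12",
    "joblib": "1.3",
}


def parse_version(text):
    """Turn '2.15.0rc1' into (2, 15, 0) by keeping the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(min_versions=MIN_VERSIONS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in min_versions.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_version(installed) >= parse_version(minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "missing or too old"
        print(f"{name:24s} {installed or '-':12s} {status}")
```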

## Repository Structure

```
.
├── Bayesian_LSTM.ipynb              # Main notebook for training and evaluation
├── Multi_Seed_Robustness_Check.ipynb # Multi-seed experiments for statistical robustness
├── beijing_data/                     # Preprocessed Beijing PM2.5 dataset
│   ├── X_train_raw.npy              # Training features (N, T, F)
│   ├── X_val_raw.npy                # Validation features
│   ├── X_test_raw.npy               # Test features
│   ├── y_train_raw.npy              # Training targets
│   ├── y_val_raw.npy                # Validation targets
│   ├── y_test_raw.npy               # Test targets
│   ├── X_raw_ood.npy                # Out-of-distribution features
│   ├── y_raw_ood.npy                # Out-of-distribution targets
│   ├── scaler_X.pkl                 # Feature scaler
│   ├── scaler_y.pkl                 # Target scaler
│   └── metadata.pkl                 # Dataset metadata
├── modules/                          # Core implementation modules
│   ├── bayeslstm.py                 # Variational layer implementations
│   ├── model_builders.py            # Model architecture builders
│   ├── training.py                  # Training utilities and callbacks
│   ├── evaluation.py                # Evaluation metrics and MC sampling
│   ├── visualization.py             # Plotting utilities
│   ├── data_utils.py                # Data loading and preprocessing
│   ├── config.py                    # Configuration and hyperparameters
│   ├── empirical_bounds.py          # PAC-Bayes bound computation
│   ├── compute_empirical_bounds.py  # Bound computation scripts
│   ├── compute_interval_scores.py   # Interval score computation
│   └── plot_empirical_bounds_vs_rank.py # Bound visualization
├── checkpoints/                      # Saved model weights
├── figures/                          # Generated plots and figures
├── results_csv/                      # Evaluation results in CSV format
└── multi_seed_results/              # Multi-seed experiment results
```

## Models Implemented

| Model | Description | Parameters |
|-------|-------------|------------|
| **Deterministic LSTM** | Standard LSTM baseline | ~33K |
| **Full-Rank Bayesian** | Bayes by Backprop with full covariance | ~66K |
| **Low-Rank Bayesian** | Low-rank factorization of weight uncertainty | Configurable rank |
| **Low-Rank (SVD Init)** | Low-rank with SVD initialization from trained deterministic weights | Configurable rank |
| **Rank-1 Bayesian** | Rank-1 multiplicative weight perturbations | Minimal overhead |
| **Deep Ensemble** | Ensemble of M deterministic LSTMs | M × ~33K |
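
To make the low-rank rows of the table more concrete, the sketch below shows one way a rank-`r` weight perturbation can be sampled and how rank-`r` factors can be obtained from a truncated SVD of trained weights. It is an illustration in plain NumPy, not the code in `modules/bayeslstm.py`. The parameterization `W = mean + U diag(eps) V^T` and the function names are assumptions, and how the repository actually uses its SVD factors may differ.

```python
import numpy as np


def svd_init_factors(weight, rank):
    """Build factors U (m, r) and V (n, r) from a truncated SVD of `weight`.

    Assumption for illustration: singular values are split evenly between
    the factors, so U @ V.T is the best rank-r approximation of `weight`.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    root_s = np.sqrt(s[:rank])
    return u[:, :rank] * root_s, vt[:rank, :].T * root_s


def sample_lowrank_weight(mean, U, V, rng):
    """Draw one sample W = mean + U @ diag(eps) @ V.T with eps ~ N(0, I_r)."""
    eps = rng.standard_normal(U.shape[1])
    return mean + (U * eps) @ V.T


def lowrank_param_count(m, n, rank):
    """Parameters stored by the two factors, versus m * n for a dense matrix."""
    return rank * (m + n)
```

For a hypothetical 64 × 256 weight matrix at rank 14, the two factors hold `14 * (64 + 256) = 4480` values, compared with `64 * 256 = 16384` for a dense matrix of the same shape.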

## Quick Start

### 1. Load Data

```python
from modules.data_utils import load_and_preprocess_data

(X_train, X_val, X_test,
 y_train, y_val, y_test,
 scaler_X, scaler_y, meta) = load_and_preprocess_data(data_dir="beijing_data")
```
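
Before training, it can be worth checking that each split has the `(N, T, F)` layout noted in the repository structure. The helper below is a hypothetical sanity check, not a function from `modules/data_utils.py`. Its defaults assume the sequence length (24) and feature count (15) used elsewhere in this README.

```python
import numpy as np


def check_split(X, y, sequence_length=24, input_size=15):
    """Validate one (features, targets) split and return its sample count."""
    if X.ndim != 3:
        raise ValueError(f"expected X with shape (N, T, F), got {X.shape}")
    n, t, f = X.shape
    if (t, f) != (sequence_length, input_size):
        raise ValueError(
            f"expected T={sequence_length}, F={input_size}, got T={t}, F={f}"
        )
    if y.shape[0] != n:
        raise ValueError(f"X has {n} samples but y has {y.shape[0]}")
    if not (np.isfinite(X).all() and np.isfinite(y).all()):
        raise ValueError("split contains NaN or infinite values")
    return n
```

With the arrays loaded above, a call such as `check_split(X_train, y_train)` would raise a `ValueError` on a shape mismatch and otherwise return the number of training sequences.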

### 2. Configure Environment

```python
from modules.config import configure_environment, SEED

configure_environment()
```

### 3. Build a Bayesian LSTM

```python
from modules.model_builders import build_bayesian_lstm_lowrank

model, kl_loss_fn, variational_layers = build_bayesian_lstm_lowrank(
    input_size=15,          # Number of features
    sequence_length=24,     # Sequence length
    lstm_hidden_size=64,    # Hidden state dimension
    num_lstm_layers=2,      # Number of stacked LSTM layers
    ranks=[14, 20],         # Rank per layer
    output_dim=1,
    output_mode="last"
)
```

### 4. Train the Model

```python
from modules.training import train_bayesian_lstm

history = train_bayesian_lstm(
    model=model,
    variational_layers=variational_layers,
    X_train=X_train, y_train=y_train,
    X_val=X_val, y_val=y_val,
    batch_size=64,
    epochs=150,
    learning_rate=1e-3,
    kl_scale=0.25,
    seed=42
)
```

### 5. Evaluate with Uncertainty Quantification

```python
from modules.evaluation import evaluate_bayesian_model

results = evaluate_bayesian_model(
    model=model,
    variational_layers=variational_layers,
    X_test=X_test,
    y_test=y_test,
    scaler_y=scaler_y,
    num_mc_samples=250,
    confidence_level=0.95
)

print(f"RMSE: {results['rmse']:.4f}")
print(f"Coverage: {results['coverage']:.2%}")
```
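
The metrics reported here can be derived from a stack of Monte Carlo forward passes. The sketch below is a NumPy illustration under stated assumptions, not the implementation in `modules/evaluation.py`: it uses empirical quantiles for the interval (the repository may use a Gaussian approximation instead), expects predictions already mapped back to the original units, and the key `mean_width` is a name chosen for this example.

```python
import numpy as np


def summarize_mc_predictions(mc_preds, y_true, confidence_level=0.95):
    """Summarise Monte Carlo forecasts.

    mc_preds: array of shape (S, N), one row per stochastic forward pass.
    y_true:   array of shape (N,).
    """
    alpha = 1.0 - confidence_level
    mean = mc_preds.mean(axis=0)
    # Central interval from the empirical quantiles of the S samples.
    lower = np.quantile(mc_preds, alpha / 2, axis=0)
    upper = np.quantile(mc_preds, 1 - alpha / 2, axis=0)
    covered = (y_true >= lower) & (y_true <= upper)
    return {
        "rmse": float(np.sqrt(np.mean((mean - y_true) ** 2))),
        "coverage": float(covered.mean()),
        "mean_width": float(np.mean(upper - lower)),
    }
```

A well-calibrated model should give a `coverage` close to `confidence_level`; a much lower value suggests intervals that are too narrow.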

## Configuration

All hyperparameters are centralized in `modules/config.py`:

```python
# Model Architecture
LSTM_HIDDEN = 64
NUM_LAYERS = 2
INPUT_SIZE = 15
SEQUENCE_LENGTH = 24

# Training
BATCH_SIZE = 64
EPOCHS = 150
LEARNING_RATE = 1e-3
KL_SCALE = 0.25

# Low-Rank Configuration
RANK = [14, 20]  # Rank per LSTM layer

# Evaluation
NUM_MC_SAMPLES = 250
CONFIDENCE_LEVEL = 0.95
```

## Running Experiments

### Main Experiments
Open and run `Bayesian_LSTM.ipynb` to:
1. Train all model variants
2. Evaluate predictive performance (RMSE, MAE)
3. Assess uncertainty calibration (coverage, interval width)
4. Compute PAC-Bayes generalization bounds
5. Visualize results
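
Coverage and interval width can be combined into a single number with the interval score, which rewards narrow intervals and penalizes misses in proportion to how far the target falls outside. The function below is a minimal sketch of the standard formula for central intervals. It is not taken from `modules/compute_interval_scores.py`, whose details may differ.

```python
import numpy as np


def interval_score(y_true, lower, upper, confidence_level=0.95):
    """Mean interval score for central prediction intervals (lower is better)."""
    alpha = 1.0 - confidence_level
    width = upper - lower
    below = np.maximum(lower - y_true, 0.0)  # how far y falls below the interval
    above = np.maximum(y_true - upper, 0.0)  # how far y falls above the interval
    return float(np.mean(width + (2.0 / alpha) * (below + above)))
```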

### Multi-Seed Robustness
Run `Multi_Seed_Robustness_Check.ipynb` to:
1. Train models across multiple random seeds (42, 123, 456, 2026)
2. Compute mean and standard deviation of metrics
3. Assess statistical significance of results
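
The aggregation step can be sketched with the standard library alone. The function name and the input layout (`{seed: {metric: value}}`) are assumptions for illustration, and the numbers are placeholders rather than results from this repository.

```python
from statistics import mean, stdev


def aggregate_over_seeds(per_seed_metrics):
    """Map {seed: {metric: value}} to {metric: (mean, sample_std)}."""
    metric_names = next(iter(per_seed_metrics.values())).keys()
    summary = {}
    for name in metric_names:
        values = [metrics[name] for metrics in per_seed_metrics.values()]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean(values), spread)
    return summary


# Placeholder numbers for illustration only, not results from this repository.
example = {
    42: {"rmse": 1.0, "coverage": 0.90},
    123: {"rmse": 2.0, "coverage": 0.92},
    456: {"rmse": 3.0, "coverage": 0.94},
    2026: {"rmse": 4.0, "coverage": 0.96},
}
for name, (avg, spread) in aggregate_over_seeds(example).items():
    print(f"{name}: {avg:.3f} +/- {spread:.3f}")
```

With only four seeds the standard deviation is itself a noisy estimate, so small differences between models should be read with caution.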

## Key Implementation Details

### Weight Caching for Bayesian RNNs
Following Fortunato et al. (2017), weights are sampled once per batch and reused across all timesteps of the sequence:

```python
for t in range(sequence_length):
    x_t = x[:, t, :]      # input at timestep t; x is assumed to have shape (batch, T, F)
    use_cached = t > 0    # sample weights at t=0, reuse them for t>0
    z = x_to_gates(x_t, training=True, use_cached=use_cached)
```
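
The caching behaviour can be reproduced in a few lines of NumPy. The class below is a toy stand-in with a mean-field Gaussian over its weights, written for illustration; it is not the variational layer from `modules/bayeslstm.py`, and the names `CachedGaussianLinear` and `run_sequence` are hypothetical.

```python
import numpy as np


class CachedGaussianLinear:
    """Toy variational linear map that can reuse its last weight sample."""

    def __init__(self, in_dim, out_dim, rng):
        self.rng = rng
        self.mu = rng.standard_normal((in_dim, out_dim)) * 0.1
        self.sigma = np.full((in_dim, out_dim), 0.05)
        self._cached_weight = None

    def __call__(self, x, use_cached=False):
        # Draw a fresh sample unless the caller asks to reuse the cached one.
        if not use_cached or self._cached_weight is None:
            eps = self.rng.standard_normal(self.mu.shape)
            self._cached_weight = self.mu + self.sigma * eps
        return x @ self._cached_weight

    def clear_cache(self):
        self._cached_weight = None


def run_sequence(layer, x):
    """Apply `layer` to every timestep of x (batch, T, F) with one weight sample."""
    outputs = []
    for t in range(x.shape[1]):
        outputs.append(layer(x[:, t, :], use_cached=(t > 0)))
    return np.stack(outputs, axis=1)
```

Because the same weight sample is used at every timestep, identical inputs at two timesteps produce identical outputs within one call to `run_sequence`, while a second call draws a new sample.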

### KL Divergence Scaling
The KL term is scaled in proportion to `1/N`, where `N` is the number of training samples; the snippet below also applies a constant factor of 0.1:

```python
kl_scale = 0.1 / num_train_samples
```
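
To show where this scale enters the objective, the sketch below combines a data term with a closed-form KL term. It assumes a mean-field Gaussian posterior and a zero-mean Gaussian prior so that the KL has a closed form; the prior used in the repository (or the scale-mixture prior discussed by Fortunato et al.) may differ. The function names and `kl_multiplier` are hypothetical.

```python
import numpy as np


def gaussian_kl(mu, sigma, prior_sigma=1.0):
    """KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over all weights."""
    var_ratio = (sigma / prior_sigma) ** 2
    kl = 0.5 * (var_ratio + (mu / prior_sigma) ** 2 - 1.0 - np.log(var_ratio))
    return float(np.sum(kl))


def scaled_objective(nll, kl, num_train_samples, kl_multiplier=0.1):
    """Per-sample loss: data term plus the KL term scaled by kl_multiplier / N."""
    return nll + (kl_multiplier / num_train_samples) * kl
```

The KL term is zero when the posterior equals the prior and grows as the posterior moves away from it, so a larger multiplier pulls the weights more strongly toward the prior.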

### Cache Clearing Callback
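Cached weight samples are cleared at the start of every training batch, so each batch draws fresh weights:

```python
class CacheClearingCallback(tf.keras.callbacks.Callback):
    def __init__(self, variational_layers):
        super().__init__()
        self.variational_layers = variational_layers  # layers whose caches are reset

    def on_train_batch_begin(self, batch, logs=None):
        clear_model_cache(self.variational_layers)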
```

## Output Files

After running experiments:
- **checkpoints/**: Saved model weights (`.weights.h5`)
- **figures/**: Visualization plots (prediction intervals, calibration curves, etc.)
- **results_csv/**: Metric tables in CSV format

