# VFEM

VFEM is the reference implementation of *VFEM: Visual Feature Empowered Multivariate Time Series Forecasting with Cross-Modal Fusion*. The framework performs long-horizon forecasting by fusing features from a vision branch with features from a temporal branch.

## Quick Start

### 1. Install dependencies
```bash
pip install -r requirements.txt
```
The vision encoder is loaded with `attn_implementation="flash_attention_2"`. Installing `flash-attn` is optional; without it, whether the encoder falls back to another attention implementation depends on your PyTorch version.
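
If you prefer not to rely on an implicit fallback, the attention implementation can be chosen explicitly before the encoder is loaded. The following is a minimal sketch, not code from this repository: it assumes the encoder is loaded through Hugging Face `transformers`, where `"sdpa"` and `"eager"` are accepted values of `attn_implementation`. The helper name `pick_attn_implementation` is hypothetical.

```python
import importlib.util


def pick_attn_implementation(preferred: str = "flash_attention_2",
                             fallback: str = "sdpa") -> str:
    """Return `preferred` unless it needs flash-attn and flash-attn is missing.

    Hypothetical helper (not part of VFEM). `fallback` is an assumption;
    "sdpa" uses PyTorch's built-in scaled dot-product attention.
    """
    needs_flash = preferred == "flash_attention_2"
    # find_spec returns None when the top-level package is not installed.
    if needs_flash and importlib.util.find_spec("flash_attn") is None:
        return fallback
    return preferred


attn_impl = pick_attn_implementation()
print(f"Using attn_implementation={attn_impl!r}")
```

The returned string would then be passed as `attn_implementation=attn_impl` wherever the vision encoder is constructed.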

### 2. Download datasets
- CSV files are available on [Google Drive](https://drive.google.com/drive/folders/1ZOYpTUa82_jCcxIdTmyr0LXQfvaM9vIy).
- Place the downloaded files under `./dataset/<dataset_name>/` and keep the original filenames, e.g. `./dataset/ETT-small/ETTh1.csv`.
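
Before starting a run, it can help to confirm that the CSV sits where the `--root_path` and `--data_path` arguments will point. The sketch below is an illustration only; `resolve_dataset` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def resolve_dataset(root_path: str, data_path: str) -> Path:
    """Join root_path and data_path and check that the CSV exists.

    Hypothetical helper mirroring the --root_path / --data_path arguments.
    """
    csv_path = Path(root_path) / data_path
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected dataset CSV at {csv_path}")
    return csv_path


try:
    print(resolve_dataset("./dataset/ETT-small/", "ETTh1.csv"))
except FileNotFoundError as err:
    # Reported instead of raised so the check can run before any download.
    print(err)
```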

### 3. Run training
- Experiment scripts are under `./scripts/`, organized by dataset.
- Example (ETTh1):
  ```bash
  sh ./scripts/ETT_script/VFEM_ETTh1.sh
  ```
- Or run the entry point directly:
  ```bash
  python run.py --model VFEM_r --data ETTh1 --root_path ./dataset/ETT-small/ --data_path ETTh1.csv
  ```

## Directory structure
```
VFEM/
├── run.py
├── README.md
├── data_provider/
├── exp/
├── models/
│   ├── csformer/
│   ├── vision/
│   └── layers/
├── scripts/
├── utils/
└── dataset/
```

## Model: `models/csformer/model.py`

The `Model` class uses the vision branch (`use_venc`) and processes an input in five steps:

1. The main sequence is normalized with a sliding RevIN window.
2. The vision branch converts the time series to images and passes them through a pretrained vision encoder followed by a projection.
3. The resulting visual features are concatenated with the features from the temporal branch.
4. The concatenated features pass through a fusion encoder and a linear head.
5. The output is denormalized to produce the prediction.
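
The normalization and denormalization that bracket the model can be illustrated with a small standard-library sketch. This is not the code in `models/csformer/model.py`: it assumes that "sliding RevIN window" means the mean and standard deviation are taken from the most recent `window` steps of each input, it handles a single channel, and it omits RevIN's learnable affine parameters. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def revin_normalize(series, window, eps=1e-5):
    """Normalize one channel using statistics of its last `window` points.

    Assumption: statistics come from the tail of the input only.
    """
    tail = series[-window:]
    mean = fmean(tail)
    std = pstdev(tail) + eps  # eps guards against a constant window
    normed = [(x - mean) / std for x in series]
    return normed, (mean, std)


def revin_denormalize(values, stats):
    """Map model outputs back to the original scale."""
    mean, std = stats
    return [v * std + mean for v in values]


history = [10.0, 12.0, 11.0, 13.0, 20.0, 22.0, 21.0, 23.0]
normed, stats = revin_normalize(history, window=4)

# A real model would forecast from `normed`; here the input is simply
# passed through to show that denormalization inverts normalization.
restored = revin_denormalize(normed, stats)
print(stats[0], [round(x, 6) for x in restored])
```

Using statistics from a recent window rather than the whole input keeps the normalization aligned with the level of the series just before the forecast horizon, which is the usual motivation for this kind of scheme.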

## Notes

- Training logs are written to the paths set in each script (e.g. `result_*.txt` or CSV); you can change output paths in `run.py` if needed.

## Reproducibility

Install dependencies with `pip install -r requirements.txt`, place datasets under `./dataset/`, then run e.g. `sh ./scripts/ETT_script/VFEM_ETTh1.sh`.
