# Signature‑Informed Transformer (SIT) for Asset Allocation

This is the original PyTorch implementation of SIT from the paper *Signature-Informed Transformer for Asset Allocation*.

🚩 **News** (Aug 8, 2025): We have released SIT.

## Repository structure

* `asset_data/full_dataset.csv` – a CSV of daily prices/returns used to reproduce the paper’s experiments.  It contains date‑indexed closing prices for a universe of up to 50 assets.  The data are split chronologically: **training** covers 2000‑01‑01 to 2016‑12‑31, **validation** covers 2017‑01‑01 to 2019‑12‑31 and **testing** spans 2020‑01‑01 to 2024‑12‑31.  Only the first `data_pool` columns (assets) are used during training.

* `0_get_sig_data_all.py` – pre‑computes signature and cross‑signature features.  It reads `full_dataset.csv`, splits it into the train/val/test ranges above and saves the signature tensors and future returns for multiple asset pools and window/horizon configurations.  Running this script is optional but speeds up training.

* `run.py` – entry point for training and evaluation.  It wraps the experiment class in `exp/` and exposes many hyper‑parameters, such as the number of assets (`--data_pool`), lookback window (`--window_size`), horizon (`--horizon`), model dimension (`--d_model`), number of transformer layers (`--num_layers`) and heads (`--n_heads`), maximum position and trade cost (`--trade_cost_bps`).

* `runfile/test.sh` – example shell script that trains SIT on three different asset pools (30, 40 and 50 assets) with different hyper‑parameter settings.  Adjust the script or construct your own command lines using `run.py`.

* `results/` – contains equity curves (`*_test_equity_curve.png`), portfolio statistics (`*_test_metrics.csv`) and positions (`*_test_positions.csv`) generated by the example script.
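
The chronological split described for `asset_data/full_dataset.csv` can be illustrated with a short pandas sketch. It is not the repository's loader: the helper name `chronological_split` and the synthetic frame are hypothetical, and the real file is assumed to be read with something like `pd.read_csv(path, index_col="Date", parse_dates=True)`.

```python
import pandas as pd

# Date ranges quoted in this README for the train/val/test split.
SPLITS = {
    "train": ("2000-01-01", "2016-12-31"),
    "val": ("2017-01-01", "2019-12-31"),
    "test": ("2020-01-01", "2024-12-31"),
}


def chronological_split(df, data_pool):
    """Keep the first `data_pool` asset columns and slice rows by date range.

    Hypothetical helper: assumes `df` has a DatetimeIndex and one column per asset.
    """
    df = df.sort_index().iloc[:, :data_pool]
    # Label-based slicing on a sorted DatetimeIndex is inclusive on both ends.
    return {name: df.loc[start:end] for name, (start, end) in SPLITS.items()}


# Synthetic stand-in for the real CSV: 5 assets on business days.
dates = pd.bdate_range("2016-12-26", "2020-01-10")
demo = pd.DataFrame(
    {f"A{i}": range(len(dates)) for i in range(5)}, index=dates, dtype=float
)

parts = chronological_split(demo, data_pool=3)
for name, part in parts.items():
    print(name, part.shape, part.index.min().date(), part.index.max().date())
```

Because the ranges do not overlap, no observation from the validation or test period leaks into training.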

## Requirements and installation

SIT requires **Python 3.8+** and **PyTorch 1.10+**.  To install the dependencies, clone the repository and run:

```bash
# clone the project (replace with your fork if necessary)
git clone https://github.com/Yoontae6719/Signature-Informed-Transformer-For-Asset-Allocation.git
cd Signature-Informed-Transformer-For-Asset-Allocation

# install python packages
pip install -r requirements.txt  # installs PyTorch, pandas, numpy, tqdm, joblib, etc
```

1. **Obtain the dataset.**  A sample `full_dataset.csv` is provided under `asset_data/`.  If you wish to experiment with your own assets, create a CSV with a `Date` column and one column per asset containing daily returns or prices.  Missing values should be forward‑filled.

2. **Generate signatures (recommended).**  Running signature extraction ahead of time speeds up training.  Use:

   ```bash
   # create signature caches for pools of 30, 40 and 50 assets with window=60 and horizon=20
   python 0_get_sig_data_all.py
   ```

   The script iterates over `DATA_POOLS = [40, 50, 30]` and saves pre‑computed training, validation and test tensors to `signature_cache_6020/pool_{n}`.  If you change the `--window_size` and `--horizon` values in `run.py`, re‑generate the cache accordingly.
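
For step 1, a custom CSV could be prepared along the following lines. This is a minimal sketch, not code from the repository: `load_asset_csv` is a hypothetical helper, the in-memory CSV stands in for your own file, and dropping rows that are still incomplete after forward-filling is an assumption rather than something the scripts are known to do.

```python
import io

import pandas as pd

# In-memory stand-in for a user-supplied CSV (unsorted, with gaps).
raw = io.StringIO(
    "Date,AAA,BBB\n"
    "2024-01-03,101.0,\n"
    "2024-01-02,100.0,50.0\n"
    "2024-01-04,,51.0\n"
)


def load_asset_csv(source):
    """Read a `Date` + one-column-per-asset CSV and forward-fill missing values."""
    df = pd.read_csv(source, parse_dates=["Date"], index_col="Date")
    df = df.sort_index()  # chronological order is required before filling
    df = df.ffill()       # carry the last observed value forward
    # Assumption: rows before an asset's first observation cannot be filled,
    # so they are dropped here.
    return df.dropna(how="any")


clean = load_asset_csv(raw)
print(clean)
```

Sorting before `ffill()` matters: filling an unsorted frame would propagate values from later dates into earlier ones.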

## Training and evaluation

To train SIT from scratch and evaluate it on the test set, execute:

```bash
python run.py \
    --is_training 1 \
    --model_id dp30 \
    --model SIT \
    --data FULL \
    --root_path ./asset_data/ \
    --data_path full_dataset.csv \
    --data_pool 30 \
    --window_size 60 \
    --horizon 20 \
    --d_model 8 \
    --n_heads 8 \
    --num_layers 1 \
    --sig_input_dim 2 \
    --cross_sig_dim 1 \
    --hidden_c 64 \
    --ff_dim 64 \
    --temperature 1.3 \
    --trade_cost_bps 0.0 \
    --itr 3
```

Alternatively, run the provided script:

```bash
bash ./runfile/test.sh
```

The script trains three configurations sequentially.  Training results and test performance are saved under `results/`.
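
If you prefer to construct command lines programmatically, the sketch below builds one `run.py` invocation per asset pool from the flags shown above. It is only an illustration: `build_command` is a hypothetical helper, the `dp{n}` naming for `--model_id` is extrapolated from the `dp30` example, and the per-pool hyper-parameter settings used in `runfile/test.sh` are not reproduced here.

```python
import shlex
import sys

# Flags and values copied from the example command in this README.
BASE_FLAGS = {
    "is_training": 1,
    "model": "SIT",
    "data": "FULL",
    "root_path": "./asset_data/",
    "data_path": "full_dataset.csv",
    "window_size": 60,
    "horizon": 20,
    "d_model": 8,
    "n_heads": 8,
    "num_layers": 1,
    "sig_input_dim": 2,
    "cross_sig_dim": 1,
    "hidden_c": 64,
    "ff_dim": 64,
    "temperature": 1.3,
    "trade_cost_bps": 0.0,
    "itr": 3,
}


def build_command(data_pool, **overrides):
    """Return the argument list for one run; `overrides` replace base flags."""
    flags = {
        **BASE_FLAGS,
        "model_id": f"dp{data_pool}",  # assumed naming, following `dp30`
        "data_pool": data_pool,
        **overrides,
    }
    cmd = [sys.executable, "run.py"]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


commands = [build_command(pool) for pool in (30, 40, 50)]
for cmd in commands:
    print(shlex.join(cmd))
    # To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting issues if a path contains spaces.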

### Important command‑line flags

| Flag               | Description                                                                                                                              |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `--data_pool`      | Number of assets to include in the portfolio (e.g., 30, 40, 50).                                                                         |
| `--window_size`    | Length of the historical window used to compute path signatures.  The script `0_get_sig_data_all.py` uses a default of 60.               |
| `--horizon`        | Prediction horizon (in trading days).  Default is 20.                                                                                    |
| `--temperature`    | Softmax temperature used when converting predicted returns into portfolio weights; higher temperature produces more uniform allocations. |
| `--trade_cost_bps` | Transaction cost in basis points, where 1 bp = 0.01 % (e.g., 0.05 % = 5 bps).                                                           |

## Results and metrics

After training, SIT evaluates the portfolio on the validation and test sets.  The experiment class computes the conditional value‑at‑risk (CVaR) and other metrics and saves:

* **Equity curves** – `.png` plots showing cumulative returns on the test set.
* **Metrics CSV** – summary statistics such as annualised return, volatility, Sharpe ratio and CVaR.
* **Positions CSV** – the predicted positions for each rebalancing date.

Results generated by `test.sh` can be found under `results/`.
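
As a rough guide to how the summary statistics in the metrics CSV are commonly defined, the sketch below computes them from a series of per-period portfolio returns. The definitions are assumptions, not taken from the experiment class: a zero risk-free rate, 252 periods per year, and CVaR as the mean of the worst 5 % of returns. The returns are random placeholders, not model results.

```python
import numpy as np


def portfolio_metrics(returns, periods_per_year=252, alpha=0.05):
    """Annualised return, volatility, Sharpe ratio and CVaR of a return series."""
    r = np.asarray(returns, dtype=float)
    equity = np.cumprod(1.0 + r)  # equity curve starting from 1.0
    years = len(r) / periods_per_year
    ann_return = equity[-1] ** (1.0 / years) - 1.0  # geometric annualisation
    ann_vol = r.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe = ann_return / ann_vol if ann_vol > 0 else float("nan")
    var = np.quantile(r, alpha)    # alpha-quantile of returns (value-at-risk)
    cvar = r[r <= var].mean()      # average return in the worst alpha tail
    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": sharpe,
        "var": var,
        "cvar": cvar,
    }


# Placeholder daily returns for two years; replace with your own series.
rng = np.random.default_rng(0)
demo_returns = rng.normal(loc=0.0005, scale=0.01, size=504)

metrics = portfolio_metrics(demo_returns)
for key, value in metrics.items():
    print(f"{key}: {value:.4f}")
```

If the portfolio is rebalanced only every `--horizon` days, `periods_per_year` should be adjusted to the number of rebalancing periods per year.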

## Citation
To be updated.

## License

This project is open‑sourced under the **MIT License**.  See `LICENSE` for details.
