# KR-IQN: Risk-Sensitive Multi-Objective Reinforcement Learning

A risk-sensitive multi-objective reinforcement learning (MORL) framework using Knothe-Rosenblatt (KR) quantile regression.

## Installation

```bash
docker build -t kr-iqn .
docker run --gpus all -it -v "$(pwd)":/workspace kr-iqn
```

## Usage

### MO-Gymnasium Environments

```bash
# KR-IQN with CVaR
python train.py --env_name mo-hopper-v4 kriqn --learning_steps 500000 --risk cvar --risk_param 0.5

# KR-IQN with Wang risk measure
python train.py --env_name mo-walker2d-v4 kriqn --learning_steps 500000 --risk wang --risk_param 0.5

# KR-IQN risk-neutral
python train.py --env_name mo-ant-v4 kriqn --learning_steps 500000 --risk neutral

# Marginal-IQN baseline
python train.py --env_name mo-hopper-v4 marginal --learning_steps 500000 --risk cvar --risk_param 0.5

# EWP baseline (MMD-based)
python train.py --env_name mo-hopper-v4 ewp --learning_steps 500000
```

### Financial Environment

`train_df.csv` and `valid_df.csv` must be available before training.

```bash
#!/bin/bash
seed=42

# KR-IQN with CVaR
python train_finance.py kriqn \
    --learning_steps=75000 \
    --batch_size=256 \
    --num_env=1 \
    --risk='cvar' \
    --risk_param=0.5 \
    --truncation_lower=1 \
    --truncation_upper=2 \
    --gamma=0.99 \
    --seed=$seed \
    --test_interval=10000 \
    --actor_lr=3e-4 \
    --ent_coef=5e-2

# Marginal-IQN baseline
python train_finance.py marginal \
    --learning_steps=75000 \
    --batch_size=256 \
    --num_env=1 \
    --risk='neutral' \
    --truncation_lower=1 \
    --truncation_upper=2 \
    --gamma=0.99 \
    --seed=$seed \
    --test_interval=10000 \
    --actor_lr=3e-4 \
    --ent_coef=5e-2

# EWP baseline
python train_finance.py ewp \
    --learning_steps=75000 \
    --batch_size=256 \
    --num_env=1 \
    --truncation_lower=1 \
    --truncation_upper=2 \
    --gamma=0.99 \
    --seed=$seed \
    --test_interval=10000 \
    --actor_lr=3e-4 \
    --ent_coef=5e-2
```

## StockTradingMOEnv

### Asset Classes
- **Commodities**: XLE, GLD, SLV
- **Bonds**: TLT, TIP, JNK
- **Equities**: SPY, QQQ, SOXX
- **Cash**

### Rewards (reward_dim=3)
1. **Downside volatility penalty**: `clip(asset_change, -inf, 0) * reward_scaling`
2. **Asset distribution entropy**: entropy of the allocation across the 4 asset classes (highest when the allocation is uniform)
3. **Log returns**: `log(after_value / prev_value) * 100`
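
The three components above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual code: the helper name `compute_rewards`, its arguments, the definition `asset_change = after_value - prev_value`, and the normalisation of the entropy by `log(4)` are all assumptions made for the example.

```python
import math


def compute_rewards(prev_value, after_value, class_weights, reward_scaling=1.0):
    """Hypothetical helper returning [downside, entropy, log_return]."""
    # Assumption: asset_change is the absolute change in portfolio value.
    asset_change = after_value - prev_value

    # 1. Downside penalty: clip(asset_change, -inf, 0) keeps only losses.
    downside = min(asset_change, 0.0) * reward_scaling

    # 2. Shannon entropy of the allocation over asset classes.
    #    Assumption: normalised by log(n) so a uniform allocation scores 1.
    entropy = -sum(w * math.log(w) for w in class_weights if w > 0)
    entropy /= math.log(len(class_weights))

    # 3. Log return, expressed in percent.
    log_return = math.log(after_value / prev_value) * 100

    return [downside, entropy, log_return]


if __name__ == "__main__":
    # Portfolio grows from 100 to 110 with an even split over 4 classes.
    print(compute_rewards(100.0, 110.0, [0.25, 0.25, 0.25, 0.25]))
```

With an even split the entropy term is at its maximum, and the downside term is zero whenever the portfolio value does not fall.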

## Risk Measures

| Type | Description                                            |
|------|--------------------------------------------------------|
| `cvar` | Conditional Value at Risk (lower α = more risk-averse) |
| `wang` | Wang transform (normal distribution-based distortion)  |
| `triangle` | Simplex risk measure                                   |
| `neutral` | Risk-neutral (no distortion)                           |
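
In IQN-style agents, risk measures like these are commonly implemented by distorting the sampled quantile fraction τ before it is passed to the quantile network. The sketch below follows that convention for `neutral`, `cvar`, and `wang`. It is an illustration under stated assumptions, not this repository's implementation: the function name `distort_tau` is hypothetical, the Wang sign convention (negative parameter = risk-averse) may differ from the one used here, and `triangle` is omitted because its definition is not given above.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and std 1


def distort_tau(tau, risk="neutral", risk_param=1.0):
    """Hypothetical helper mapping a quantile fraction tau in (0, 1) to a distorted one."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")

    if risk == "neutral":
        # No distortion: quantiles are sampled uniformly.
        return tau
    if risk == "cvar":
        # CVaR at level alpha = risk_param: only the lowest alpha-fraction of
        # quantiles is sampled, so a lower alpha is more risk-averse.
        return risk_param * tau
    if risk == "wang":
        # Wang transform: shift tau in standard-normal space.
        # Assumption: risk_param < 0 moves mass toward low quantiles.
        return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(tau) + risk_param)
    raise ValueError(f"unsupported risk type in this sketch: {risk}")


if __name__ == "__main__":
    for name, param in [("neutral", 1.0), ("cvar", 0.5), ("wang", -0.5)]:
        print(name, distort_tau(0.4, name, param))
```

Under this convention, `cvar` with `risk_param=0.5` maps every τ into the lower half of the return distribution, which is why smaller values produce more conservative behaviour.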

 