# **Evaluation suite for Car4Cast** 

Car4Cast supports forecasts in both a single-guess ("single-world") and multi-guess ("multi-world") format. In the latter, models are allowed to produce K alternative forecasts for the whole scene (K "worlds"), and only the best world (defined below) will be used to compute evaluation metrics.

## Definition of ADD & "best world"

The most fundamental metric defined for this task is the **Average Distance of Model Points (ADD)**, defined as the average distance between corresponding 3D bounding-box corners transformed by the predicted pose and the ground truth pose. It accounts for errors in both the predicted translation and the predicted rotation angles of bounding boxes.

Given:
- A sequence of `T` frames,
- A set of 8 corner points `X = {x₁, x₂, ..., x₈}` representing the 3D bounding box,
- Ground truth rotation `R_gt` and translation `t_gt`,
- Predicted rotation `R_pred` and translation `t_pred`,

the per-frame ADD is computed as:

$$
\text{ADD}^t = \frac{1}{8} \sum_{i=1}^8 \left\| \mathbf{R}_\text{gt}^t \mathbf{x}_i + \mathbf{t}_\text{gt}^t - \left( \mathbf{R}_\text{pred}^t \mathbf{x}_i + \mathbf{t}_\text{pred}^t \right) \right\|
$$

The average ADD over the trajectory is:

$$
\text{ADD} = \frac{1}{T} \sum_{t=1}^T \text{ADD}^t
$$
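The two formulas above can be sketched in NumPy as follows. This is a minimal illustration, not the code used by `eval_forecast.py`; the function names and the array layout (corners as `(8, 3)`, rotations as `(T, 3, 3)`, translations as `(T, 3)`) are assumptions made for this example.

```python
import numpy as np

def add_per_frame(corners, R_gt, t_gt, R_pred, t_pred):
    """Per-frame ADD for one bounding box.

    corners: (8, 3) box corners in the object frame (assumed layout).
    R_gt, R_pred: (T, 3, 3) rotation matrices; t_gt, t_pred: (T, 3).
    Returns an array of shape (T,).
    """
    # Apply each pose to every corner: result has shape (T, 8, 3).
    gt = np.einsum("tij,kj->tki", R_gt, corners) + t_gt[:, None, :]
    pred = np.einsum("tij,kj->tki", R_pred, corners) + t_pred[:, None, :]
    # Euclidean distance per corner, then mean over the 8 corners.
    return np.linalg.norm(gt - pred, axis=-1).mean(axis=-1)

def add_trajectory(corners, R_gt, t_gt, R_pred, t_pred):
    """Average of the per-frame ADD over the T frames."""
    return float(add_per_frame(corners, R_gt, t_gt, R_pred, t_pred).mean())
```

With identical rotations, the ADD reduces to the translation error: a predicted offset of `(3, 4, 0)` in a single frame gives a per-frame ADD of 5 for that frame.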


The **"best world"** is defined as the guess with minimum ADD, i.e., 
$$
k^* = \arg\min_{k \in \{1, \dots, K\}} \text{ADD}_k
$$
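
As a small sketch of the selection rule, assuming one scalar ADD per world is already available (how per-agent ADDs are aggregated into that scalar is not specified here, and `select_best_world` is a hypothetical helper name):

```python
import numpy as np

def select_best_world(add_per_world):
    """Return the index of the world with the lowest ADD.

    add_per_world: sequence of K scalar ADD values, one per world
    (assumed to be pre-aggregated over the scene).
    The returned index is 0-based, whereas k in the formula is 1-based.
    """
    scores = np.asarray(add_per_world, dtype=float)
    return int(np.argmin(scores))
```

For example, `select_best_world([2.3, 1.1, 4.0])` picks the second world (index 1).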

## **Evaluation metrics**
## Global best-world metrics
In addition to the ADD, the following metrics are computed in the "best world" and refer to the full scene.

### Miss Rate (MR@p)
The proportion of agents whose ADD exceeds `p%` of the total length of their trajectory (with a minimum threshold of 1 meter). Currently, `p = 10`.
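
The thresholding can be illustrated with the sketch below. It assumes that the "total length" of a trajectory is the ground-truth path length (sum of step-to-step centroid distances); the function name and array shapes are likewise assumptions for this example.

```python
import numpy as np

def miss_rate(add_per_agent, gt_centroids, p=10.0, min_threshold=1.0):
    """Fraction of agents whose ADD exceeds max(p% of path length, 1 m).

    add_per_agent: (N,) trajectory-averaged ADD per agent, in meters.
    gt_centroids: (N, T, 3) ground-truth centroids (assumed length source).
    """
    add_per_agent = np.asarray(add_per_agent, dtype=float)
    gt = np.asarray(gt_centroids, dtype=float)
    # Path length: sum of distances between consecutive positions.
    lengths = np.linalg.norm(np.diff(gt, axis=1), axis=-1).sum(axis=1)
    # Per-agent threshold, never below the minimum (1 meter).
    thresholds = np.maximum(p / 100.0 * lengths, min_threshold)
    return float(np.mean(add_per_agent > thresholds))
```

With `p = 10`, an agent that travels 50 m is a miss only if its ADD exceeds 5 m, while a stationary agent falls back to the 1 m minimum threshold.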

---

### Instance Precision & Recall
Let:

- $F$ be the set of ground-truth future instances,
- $\hat{F}$ be the set of predicted future instances.

**Instance Precision (P_instance)** measures the fraction of predicted instances that are also present in the ground truth future:

$$
{P_\text{Instance}} = \frac{| \hat{F} \cap F |}{|\hat{F}|}
$$

Low precision values indicate many hallucinated (false positive) instances by the model.

**Instance Recall (R_instance)** measures the fraction of ground truth instances that are included in the model's prediction:

$$
{R_\text{Instance}} = \frac{| \hat{F} \cap F |}{|F|}
$$

Low recall values indicate many missed (false negative) instances by the model.
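
Both quantities reduce to set operations. The sketch below assumes that instances are matched by a shared identifier and that an empty denominator yields 0.0; both conventions are assumptions of this example rather than documented behavior of the script.

```python
def instance_precision_recall(gt_ids, pred_ids):
    """Return (precision, recall) over sets of instance identifiers.

    gt_ids: identifiers of ground-truth future instances (F).
    pred_ids: identifiers of predicted future instances (F-hat).
    """
    gt, pred = set(gt_ids), set(pred_ids)
    matched = len(gt & pred)  # |F-hat ∩ F|
    # Assumed convention: an empty denominator gives 0.0.
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gt) if gt else 0.0
    return precision, recall
```

For ground truth `{a, b, c}` and predictions `{b, c, d, e}`, two instances match, so the precision is 2/4 and the recall is 2/3.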

---

### Formatting Accuracy (ACC_format)

Let:

- $\hat{F}$ be the set of all predicted future instances,
- $\hat{F}_\text{format} \subseteq \hat{F}$ be the subset of predicted instances that are correctly formatted.

An instance is considered **correctly formatted** if its corresponding entry in the predicted forecast JSON file can be read without requiring any post-processing other than simple syntax fixes. For instances that are unreadable or malformed, missing entries are filled via linear interpolation when possible. If interpolation is not feasible, the instance is assumed to remain **static** at its last known historical position.

The formatting accuracy is then defined as:

$$
\text{ACC}_\text{format} = \frac{|\hat{F}_\text{format}|}{|\hat{F}|}
$$

This metric reflects the robustness and correctness of the model's output structure. Low values indicate frequent formatting issues or structural errors in the prediction output.
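
A possible reading of the fallback and of the accuracy ratio is sketched below. It assumes that unreadable steps are marked as NaN rows, that interpolation is "feasible" when at least two steps are readable, and that gaps at the ends of the horizon are held at the nearest readable value; none of these details are confirmed by the script, and the helper names are hypothetical.

```python
import numpy as np

def fill_missing(positions, last_history_position):
    """Fill unreadable steps of one instance.

    positions: (T, 3) array with NaN rows for unreadable steps (assumed).
    last_history_position: (3,) last known historical position.
    """
    positions = np.asarray(positions, dtype=float)
    valid = ~np.isnan(positions).any(axis=1)
    if valid.sum() < 2:
        # Interpolation not feasible (assumed criterion): stay static.
        static = np.asarray(last_history_position, dtype=float)
        return np.tile(static, (len(positions), 1))
    steps = np.arange(len(positions))
    filled = positions.copy()
    for axis in range(positions.shape[1]):
        # Linear interpolation per coordinate; np.interp holds edge values.
        filled[:, axis] = np.interp(steps, steps[valid], positions[valid, axis])
    return filled

def formatting_accuracy(is_well_formatted):
    """Fraction of predicted instances flagged as correctly formatted."""
    flags = list(is_well_formatted)
    return sum(flags) / len(flags) if flags else 0.0
```

For instance, three correctly formatted instances out of four give an accuracy of 0.75.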

---

### Collision Rate (CR)
The proportion of agents that have at least one collision with another agent at any point in the trajectory.

---

## Agent-averaged best-world metrics

The following metrics are averaged over the N agents in the "best world".

### Final Displacement Error (FDE)

For each agent $n \in \{1,\dots,N\}$ with predicted centroid positions $\mathbf{c}^{\,n}_{\text{pred},t}\in\mathbb{R}^3$ and ground-truth centroids $\mathbf{c}^{\,n}_{\text{gt},t}$ over a horizon of $T$ future steps, the per-agent
final displacement error is the Euclidean ($L_2$) distance between
the two endpoints:

$$
\text{FDE}^{\,n} \;=\;
\left\| \mathbf{c}^{\,n}_{\text{pred},T}\;-\;\mathbf{c}^{\,n}_{\text{gt},T} \right\|_{2}.
$$

The scene-level FDE is the mean over all agents:

$$
\text{FDE} = \frac{1}{N}\sum_{n=1}^{N}\text{FDE}^{\,n}.
$$

---

### Average Displacement Error (ADE)

The average displacement error measures the mean spatial deviation over
the whole trajectory:

$$
\text{ADE}^{\,n} \;=\;
\frac{1}{T}\sum_{t=1}^{T}
\left\| \mathbf{c}^{\,n}_{\text{pred},t}\;-\;\mathbf{c}^{\,n}_{\text{gt},t} \right\|_{2}.
$$

Aggregated over all agents:

$$
\text{ADE} = \frac{1}{N}\sum_{n=1}^{N}\text{ADE}^{\,n}.
$$
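
Since FDE and ADE share the same per-step distances, they can be computed together. The following is a minimal sketch assuming centroids are stored as `(N, T, 3)` arrays with matching agent order; the function name is an assumption for illustration.

```python
import numpy as np

def displacement_errors(pred_centroids, gt_centroids):
    """Return scene-level (ADE, FDE).

    pred_centroids, gt_centroids: (N, T, 3) arrays (assumed layout),
    with agents and time steps in the same order in both arrays.
    """
    pred = np.asarray(pred_centroids, dtype=float)
    gt = np.asarray(gt_centroids, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)  # (N, T) L2 distances
    ade = dist.mean(axis=1).mean()  # mean over steps, then over agents
    fde = dist[:, -1].mean()        # last step only, mean over agents
    return float(ade), float(fde)
```

For a single agent with per-step errors of 1 m and 5 m, this gives an ADE of 3 m and an FDE of 5 m.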

---

### Rotation Error (RE)

Let the Euler angles yaw–pitch–roll for agent $n$ at step $t$ be $\boldsymbol\theta^{\,n}_{\text{gt},t} = (\psi,\;\vartheta,\;\varphi)$ (ground truth) and $\boldsymbol\theta^{\,n}_{\text{pred},t}$ (prediction).

Represent each individual angle $\alpha$ by the 2-D unit vector $\mathbf{v}(\alpha) = (\cos\alpha,\;\sin\alpha)$.
The angular difference for a single axis is
$\displaystyle \delta(\alpha_{\text{gt}}, \alpha_{\text{pred}}) = \arccos\!\bigl(\mathbf{v}(\alpha_{\text{gt}}) \!\cdot\! \mathbf{v}(\alpha_{\text{pred}})\bigr)$, which lies in $[0, \pi]$ and is unaffected by $2\pi$ wrap-around.

The per-agent rotation error, averaged over all three axes and all steps, is

$$
\text{RE}^{\,n}
= \frac{1}{3T}\!
\sum_{t=1}^{T}\!
\sum_{\alpha\in\{\psi,\vartheta,\varphi\}}
\delta\!\bigl(\alpha^{\,n}_{\text{gt},t},\;\alpha^{\,n}_{\text{pred},t}\bigr).
$$

Averaged over all agents in the scene:

$$
\text{RE} = \frac{1}{N}\sum_{n=1}^{N}\text{RE}^{\,n}.
$$
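
The rotation error can be sketched as below, assuming Euler angles in radians stored as `(N, T, 3)` arrays in yaw, pitch, roll order. The function name and layout are assumptions; the clipping step only guards against floating-point values slightly outside `[-1, 1]`.

```python
import numpy as np

def rotation_error(theta_pred, theta_gt):
    """Scene-level rotation error in radians.

    theta_pred, theta_gt: (N, T, 3) Euler angles in radians (assumed layout).
    """
    pred = np.asarray(theta_pred, dtype=float)
    gt = np.asarray(theta_gt, dtype=float)
    # Dot product of the unit vectors (cos a, sin a) for each angle.
    dot = np.cos(gt) * np.cos(pred) + np.sin(gt) * np.sin(pred)
    delta = np.arccos(np.clip(dot, -1.0, 1.0))  # per-axis difference in [0, pi]
    # Mean over steps and axes per agent, then mean over agents.
    return float(delta.mean(axis=(1, 2)).mean())
```

Because the difference is taken through unit vectors, a ground-truth yaw of 0.1 rad and a predicted yaw of 2π − 0.1 rad differ by 0.2 rad rather than by almost a full turn.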

---

### Velocity Heading Shift (VHS)

At time step $t>1$, define the predicted planar velocity vector

$$
\mathbf{v}^{\,n}_{t} = 
\bigl(x^{\,n}_{\text{pred},t} - x^{\,n}_{\text{pred},t-1},\;
      y^{\,n}_{\text{pred},t} - y^{\,n}_{\text{pred},t-1}\bigr),
$$

and the predicted heading direction as the unit vector from the yaw
angle:

$$
\mathbf{h}^{\,n}_{t}= (\cos\psi^{\,n}_{\text{pred},t},\;\sin\psi^{\,n}_{\text{pred},t}).
$$

The instantaneous angular mismatch is

$$
\delta^{\,n}_{t} = \arccos\!\left(
\frac{\mathbf{v}^{\,n}_{t}\!\cdot\!\mathbf{h}^{\,n}_{t}}
     {\|\mathbf{v}^{\,n}_{t}\|_2}
\right).
$$

The per-agent VHS, averaged over the trajectory, is

$$
\text{VHS}^{\,n} = \frac{1}{T-1}\sum_{t=2}^{T}\delta^{\,n}_{t}.
$$

Overall VHS:

$$
\text{VHS} = \frac{1}{N}\sum_{n=1}^{N}\text{VHS}^{\,n}.
$$

Low VHS values indicate that the predicted velocity aligns well with the
predicted heading, as expected for non-drifting vehicles.
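
A minimal NumPy sketch of the VHS computation follows. The angular mismatch is undefined when the velocity is zero; this example assumes such steps contribute a mismatch of 0, which may differ from what `eval_forecast.py` does. The function name, the `eps` tolerance, and the array shapes are assumptions as well.

```python
import numpy as np

def velocity_heading_shift(xy_pred, yaw_pred, eps=1e-6):
    """Scene-level VHS in radians.

    xy_pred: (N, T, 2) predicted planar positions (assumed layout).
    yaw_pred: (N, T) predicted yaw angles in radians.
    eps: speed below which a step is treated as stationary (assumption).
    """
    xy = np.asarray(xy_pred, dtype=float)
    yaw = np.asarray(yaw_pred, dtype=float)
    v = np.diff(xy, axis=1)  # (N, T-1, 2): velocities for steps 2..T
    h = np.stack([np.cos(yaw[:, 1:]), np.sin(yaw[:, 1:])], axis=-1)
    speed = np.linalg.norm(v, axis=-1)
    moving = speed > eps
    # Cosine between velocity and heading; avoid dividing by zero speed.
    cos = np.sum(v * h, axis=-1) / np.where(moving, speed, 1.0)
    delta = np.arccos(np.clip(cos, -1.0, 1.0))
    delta = np.where(moving, delta, 0.0)  # assumed: stationary steps count as 0
    return float(delta.mean(axis=1).mean())
```

An agent moving along +x with yaw 0 contributes 0, while one moving along +x with yaw π/2 contributes π/2, so a scene with both has a VHS of π/4.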

---

## **Evaluation script `eval_forecast.py`**

#### Command-Line Arguments

From the repo's root directory:

```bash
python eval/eval_forecast.py \
    -H /path/to/history_files \
    -F /path/to/forecast_files \
    -P /path/to/prediction_files \
    -O /path/to/output_dir \
    -M /path/to/map_bounds_files \
    [--save_add]
```

| Argument | Description |
|----------|-------------|
| `-H`, `--history_files` | Path to the directory containing historical input data files. |
| `-F`, `--forecast_files` | Path to the directory containing ground-truth forecast data files. |
| `-P`, `--prediction_files` | Path to the directory containing predicted forecast data. Can include multiple predictions (e.g., for K > 1). |
| `-O`, `--output_dir` | Path to the directory where the evaluation summary JSON file will be saved. |
| `-M`, `--map_files` | Path to the directory containing map boundaries files. Optional: if not set, the Out-of-Map Rate is not computed. |
| `--save_add` | If set, saves a plot of the per-scene ADD score (for outlier detection). |

#### Output Format

The script generates two JSON files (containing mean/median metrics over the set) saved in the specified `--output_dir`. These files contain all computed metrics, organized by **motion category**:

- **static**: Agents that remain stationary during the forecast horizon.
- **linear**: Agents whose motion can be approximated by constant velocity, where the average deviation from this approximation is smaller than the average distance traveled per time step.
- **nonlinear**: Agents exhibiting more complex, non-linear motion patterns.

Each metric (e.g., FDE, ADE, Rotation Error, Velocity Heading Shift) is reported separately for each category and for the full set of instances, allowing detailed analysis of model performance across different agent behaviors.
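
One way to read the three category definitions is sketched below. The details are assumptions for illustration only: the constant-velocity approximation is taken as uniform motion between the first and last ground-truth positions, and an agent is treated as static when its total path length is below a hypothetical tolerance `static_tol`. The script may use different criteria.

```python
import numpy as np

def motion_category(gt_positions, static_tol=0.1):
    """Classify one ground-truth trajectory as static, linear, or nonlinear.

    gt_positions: (T, D) positions over the forecast horizon.
    static_tol: path length in meters below which the agent is
        considered static (hypothetical threshold).
    """
    gt = np.asarray(gt_positions, dtype=float)
    steps = np.linalg.norm(np.diff(gt, axis=0), axis=-1)
    if steps.sum() < static_tol:
        return "static"
    # Assumed constant-velocity model: uniform motion from start to end.
    const_vel = np.linspace(gt[0], gt[-1], len(gt))
    deviation = np.linalg.norm(gt - const_vel, axis=-1).mean()
    # Linear if the mean deviation is below the mean per-step distance.
    return "linear" if deviation < steps.mean() else "nonlinear"
```

Under these assumptions, a straight drive at constant speed is classified as linear, while a trajectory that goes 10 m out and returns to its start over 21 steps is classified as nonlinear.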
