# Supplementary Code for *StreamVGGT*

> **Note:** This repository contains a *minimal* subset of our training and inference pipeline. The complete source code will be released publicly once the paper is accepted and published.

---


| Module Category | File(s) | Brief Description |
|-----------------|---------|-------------------|
| **Multi‑Task Heads** | `camera_head.py` | Predicts 9‑DoF camera parameters (translation, rotation quaternion, field of view) through iterative refinement. |
|                 | `dpt_head.py`       | Dense depth / 3D point head built on top of DPT‑style multi‑scale fusion.  |
|                 | `track_head.py`     | Lightweight point‑tracking head that re‑uses DPT features and refines tracks over multiple iterations.  |
| **Spatio‑Temporal Attention & Cached Token Memory** | `attention.py` | Attention with optional RoPE and cached token memory.  |
|                 | `block.py`          | Transformer block wrapper (LayerScale + stochastic depth) used by both spatial and temporal attention stacks. |
|                 | `aggregator.py`     | Alternates *spatial* and *temporal* attention, slices special tokens, and manages the cross‑frame token cache. |
| **Model Forward / Inference** | `stream3r.py` | High‑level wrapper that wires the aggregator to the heads and offers both a standard forward pass and online autoregressive inference. |

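The cached token memory listed above can be illustrated with a small, dependency-free sketch. It is not the code in `attention.py` or `aggregator.py`: it assumes the temporal attention is causal (each frame attends only to itself and earlier frames), uses identity Q/K/V projections, and omits RoPE and multiple heads. The names `TokenCache`, `stream_step`, and `full_causal` are hypothetical and chosen only for this example. Under those assumptions, attending over cached keys and values frame by frame gives the same result as recomputing causal attention over the whole sequence.

```python
import math
from dataclasses import dataclass, field


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over lists of keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


@dataclass
class TokenCache:
    """Hypothetical store of keys/values from all frames seen so far."""
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, keys, values):
        self.keys.extend(keys)
        self.values.extend(values)


def stream_step(frame_tokens, cache):
    """Process one incoming frame using the cache instead of past frames."""
    # Assumption: identity projections, so each token is its own Q, K and V.
    cache.append(frame_tokens, frame_tokens)
    return [attend(tok, cache.keys, cache.values) for tok in frame_tokens]


def full_causal(frames):
    """Reference: rebuild the full context for every frame (no cache)."""
    outputs = []
    for t, frame in enumerate(frames):
        context = [tok for past in frames[: t + 1] for tok in past]
        outputs.append([attend(tok, context, context) for tok in frame])
    return outputs


# Three toy frames with two 2-D tokens each.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[-1.0, 2.0], [0.3, -0.7]],
]
cache = TokenCache()
streamed = [stream_step(frame, cache) for frame in frames]
reference = full_causal(frames)
```

In this sketch the cache grows by one frame's worth of tokens per step, so each new frame costs attention over the tokens seen so far and no earlier frame is re-encoded.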
