# Tempora

Supplementary code for "Tempora: Characterising the Time-Contingent Utility of Fully Online Test-Time Adaptation" (ICML 2026 submission).

## Overview

Tempora is a framework for evaluating test-time adaptation (TTA) under temporal constraints. It comprises temporal scenarios, evaluation protocols, and time-contingent utility metrics. We instantiate this framework with three metrics for distinct scenarios:

1. **Discrete utility**: Asynchronous streams with hard deadlines; batches arriving while the pipeline is occupied are skipped.
2. **Continuous utility**: User-led pacing with hyperbolic decay; the value of a late prediction decays with its response delay.
3. **Amortised utility**: Budget-constrained overhead; adaptation proceeds until the budget is exhausted, then the model freezes.

These metrics reveal rank instability: the method deemed optimal under offline evaluation frequently underperforms under temporal pressure.
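
The three scenarios above can be illustrated with a minimal, self-contained sketch. It is not the repository's implementation: the function names are hypothetical, the discrete case ignores any queueing, the hyperbolic form `1 / (1 + delay / T)` is one common choice assumed here for illustration, and the amortised case assumes the step that crosses the budget is still allowed to finish.

```python
def discrete_served(arrival_times_ms, service_ms):
    """Indices of batches that are processed; all others are skipped.

    Assumption: a batch is served only if the pipeline is idle when it
    arrives (no waiting queue is modelled).
    """
    served = []
    free_at = 0.0  # time at which the pipeline next becomes idle
    for index, arrival in enumerate(arrival_times_ms):
        if arrival >= free_at:
            served.append(index)
            free_at = arrival + service_ms
    return served


def hyperbolic_utility(delay_ms, threshold_ms):
    """Assumed hyperbolic decay: full value at zero delay, half at the threshold."""
    return 1.0 / (1.0 + delay_ms / threshold_ms)


def amortised_adapt_flags(overheads_ms, budget_ms):
    """True where the model still adapts, False once the budget is exhausted.

    Assumption: a step may start whenever some budget remains, even if it
    overshoots; afterwards the model stays frozen.
    """
    flags = []
    spent = 0.0
    for overhead in overheads_ms:
        if spent < budget_ms:
            flags.append(True)
            spent += overhead
        else:
            flags.append(False)
    return flags
```

For example, with batches arriving every 40 ms and a 60 ms processing time, `discrete_served([0.0, 40.0, 80.0, 120.0], 60.0)` serves every second batch, which is the kind of skipping that penalises slower methods under the discrete metric.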

## Repository Structure
```
tempora-supplementary/
├── tempora/
│   ├── common/
│   │   ├── methods/       # TTA method implementations
│   │   ├── models/        # Model architectures
│   │   ├── datasets/      # Dataset loaders
│   │   └── recorders/     # Utility recorders
│   └── scripts/
│       └── evaluate/      # Evaluation scripts
├── output/                # Evaluation logs (JSON)
│   ├── offline/
│   ├── discrete/          # Varying utilisation ρ
│   ├── continuous/        # Varying threshold T
│   └── amortised/         # Varying budget B
└── pyproject.toml         # Dependencies
```

## Methods

We evaluate seven Fully TTA methods spanning two families:

| Method | Family | Mechanism |
|--------|--------|-----------|
| AdaBN | Gradient-free | Recomputes BN statistics on each incoming batch |
| LAME | Gradient-free | Refines outputs via Laplacian-regularised optimisation |
| NEO | Gradient-free | Centres features by subtracting a running mean |
| Tent | Gradient-based | Minimises entropy over BN affine parameters |
| ETA | Gradient-based | Tent with reliability and redundancy filtering |
| SHOT-IM | Gradient-based | Entropy minimisation with diversity regularisation; updates full backbone |
| SAR | Gradient-based | Sharpness-aware entropy minimisation with filtering and reset |

Standard inference (no adaptation) serves as the baseline.

Hyperparameters follow the original implementations and are configured in `tempora/common/utils.py`.
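
For orientation, the quantity that the gradient-based methods in the table minimise is the Shannon entropy of the softmax prediction. The sketch below computes it for a single sample in plain Python; the function names are hypothetical and this is not the code in `tempora/common/methods/`, which operates on batched tensors and adds the filtering, regularisation, or reset logic listed above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution."""
    probs = softmax(logits)
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

A uniform prediction attains the maximum entropy `log(C)` for `C` classes, while a confident prediction scores close to zero, so minimising this value pushes the model towards confident outputs on the test stream.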

## Usage

### Requirements

- Python 3.12+
- [uv](https://docs.astral.sh/uv/) package manager
- [ImageNet-C](https://github.com/hendrycks/robustness)

Install dependencies:
```bash
uv sync
```

### Running Evaluations
```bash
# Offline evaluation
uv run python -m tempora.scripts.evaluate.offline \
    --method eta \
    --dataset-name imagenet \
    --dataset-root /path/to/datasets \
    --dataset-dist noise blur weather digital \
    --output-dir output/offline

# Discrete utility (utilisation ρ controlled via the arrival interval in ms)
uv run python -m tempora.scripts.evaluate.discrete \
    --method eta \
    --interval 39.9 \
    --queue-size 1 \
    --output-dir output/discrete

# Amortised utility (budget B in ms)
uv run python -m tempora.scripts.evaluate.amortised \
    --method eta \
    --overhead-budget 1000 \
    --output-dir output/amortised
```

### Log Format

Each evaluation produces a JSON file containing:
- `arguments`: Evaluation configuration
- `results`: Per-batch metrics (predictions, timing, accuracy)
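
A log can be inspected with the standard library alone. The sketch below assumes that `results` is a list of per-batch dictionaries, each with a numeric `accuracy` field; that layout is a guess based on the description above, so adjust the key names to match the actual files. The helper names are hypothetical.

```python
import json
from pathlib import Path


def load_log(path):
    """Return the (arguments, results) pair stored in one evaluation log."""
    with Path(path).open(encoding="utf-8") as handle:
        log = json.load(handle)
    return log["arguments"], log["results"]


def mean_accuracy(results):
    """Average the assumed per-batch `accuracy` field; NaN for an empty log."""
    values = [batch["accuracy"] for batch in results]
    return sum(values) / len(values) if values else float("nan")
```

Call `load_log` with the path of any JSON file under `output/`, then pass the returned `results` to `mean_accuracy`.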

## Scope

This release accompanies our ICML submission. Reported results use ImageNet-C (severity 5) with ResNet-50 and batch size 64, evaluated across:

- **Discrete**: utilisation ρ ∈ {100, 70, 50, 35, 25}% (intervals γ ∈ {39.9, 56.4, 79.8, 112.8, 159.6} ms)
- **Continuous**: threshold T ∈ {50, 100, 200, 400, 1000} ms
- **Amortised**: budget B ∈ {1, 2, 4, 8, 16, 32} s
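
The relation between the discrete intervals and the utilisation levels can be checked with a short calculation. The sketch assumes that utilisation is the ratio of a reference per-batch processing time to the arrival interval, and that the reference is the 39.9 ms interval listed for ρ = 100%; both are assumptions made for illustration, and the names are hypothetical.

```python
REFERENCE_MS = 39.9  # interval listed for ρ = 100%, assumed to be the reference time


def utilisation(interval_ms, reference_ms=REFERENCE_MS):
    """Assumed definition: reference processing time divided by arrival interval."""
    return reference_ms / interval_ms


intervals_ms = [39.9, 56.4, 79.8, 112.8, 159.6]
percentages = [round(100 * utilisation(gamma), 1) for gamma in intervals_ms]
```

Under this assumption the listed intervals correspond to roughly 100, 70.7, 50, 35.4, and 25% utilisation, so the nominal 70% and 35% levels are rounded values.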

The framework is extensible and also supports additional architectures and datasets.

The codebase includes additional utilities from preliminary experiments that are not required to reproduce the reported results.

## License

MIT License. See LICENSE for details.