<p align="center">
  <img src="assets/d2p.png" width="200"><br>
  <strong>Differentiable Dynamic Programming for PyTorch</strong><br>
  GPU-accelerated, fully differentiable implementations of classic dynamic programming algorithms.
</p>

<p align="center">
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.9+-blue.svg" alt="Python 3.9+"></a>
  <a href="https://pytorch.org/"><img src="https://img.shields.io/badge/pytorch-2.0+-orange.svg" alt="PyTorch 2.0+"></a>
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License: MIT"></a>
</p>

---

## Installation

```bash
pip install d2p
```

### Building from Source

Building requires the CUDA compiler `nvcc` on your `PATH`. Either of these provides it:
- **System CUDA**: Install [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) from NVIDIA
- **Conda CUDA**: `conda install cuda-nvcc` or `conda install cuda-toolkit`

```bash
git clone https://anonymous.4open.science/r/d2p-main.git
cd d2p-main
pip install -e .
```

CUDA libraries are linked against PyTorch's bundled versions, so only `nvcc` is needed for compilation.

---

## Quick Start

```python
import torch
import d2p

# Create a batch of score matrices
scores = torch.randn(2, 100, 120, device='cuda', requires_grad=True)

# Compute soft Smith-Waterman alignment
result = d2p.soft_sw(scores, gap=-1.0, temperature=1.0)

# Access results
print(result.value.shape)      # [2] - log partition function
print(result.marginals.shape)  # [2, 100, 120] - soft alignment matrix

# Gradients flow through everything
loss = result.value.sum()
loss.backward()
print(scores.grad.shape)       # [2, 100, 120]
```

---

## Algorithms

d2p provides 13 differentiable dynamic programming operators:

| Algorithm | Function | Description | Input Shape |
|-----------|----------|-------------|-------------|
| Smith-Waterman | `soft_sw` | Local sequence alignment | `[B, L1, L2]` |
| Smith-Waterman (Affine) | `soft_sw_affine` | Local alignment with affine gaps | `[B, L1, L2]` |
| Needleman-Wunsch | `soft_nw` | Global sequence alignment | `[B, L1, L2]` |
| Needleman-Wunsch (Affine) | `soft_nw_affine` | Global alignment with affine gaps | `[B, L1, L2]` |
| DTW | `soft_dtw` | Dynamic time warping | `[B, L1, L2]` |
| CKY | `soft_cky` | Context-free grammar parsing | `[B, N, N, N]` + `[B, N]` |
| MAS | `soft_mas` | Monotonic alignment search | `[B, T, S]` |
| Eisner | `soft_eisner` | Projective dependency parsing | `[B, N, N]` |
| Levenshtein | `soft_levenshtein` | Edit distance | `[B, L1, L2]` |
| LCS | `soft_lcs` | Longest common subsequence | `[B, L1, L2]` |
| OSA | `soft_osa` | Optimal string alignment | `[B, L1, L2]` |
| Damerau-Levenshtein | `soft_damerau` | Edit distance with transpositions | `[B, L1, L2]` |
| Hamming | `soft_hamming` | Position-wise distance | `[B, L]` |

---

## API Reference

### Sequence Alignment

#### Smith-Waterman (Local Alignment)

Finds the best-scoring local alignment between two sequences. Use it to find similar subsequences when the full sequences need not match end to end.

```python
d2p.soft_sw(
    scores,                    # [B, L1, L2] similarity matrix
    gap=-1.0,                  # gap penalty (negative)
    temperature=1.0,           # softmax temperature
    lengths=None,              # optional [B, 2] sequence lengths
    mask1=None, mask2=None     # optional boolean masks
) -> SWResult
```

**Returns** `SWResult`:
- `value` / `score`: `[B]` - log partition function
- `marginals` / `alignment`: `[B, L1, L2]` - soft alignment probabilities

```python
# Basic usage
scores = torch.randn(2, 100, 120, device='cuda', requires_grad=True)
result = d2p.soft_sw(scores, gap=-1.0, temperature=1.0)

# Affine gap penalties (separate open/extend costs)
result = d2p.soft_sw_affine(
    scores,
    gap_open=-2.0,    # cost to open a gap
    gap_ext=-0.5,     # cost to extend a gap
    temperature=1.0
)
```

#### Needleman-Wunsch (Global Alignment)

Aligns two sequences over their full length, end to end. Use it when every position of both sequences should take part in the comparison.

```python
d2p.soft_nw(
    scores,                    # [B, L1, L2] similarity matrix
    gap=-1.0,                  # gap penalty
    temperature=1.0,
    lengths=None
) -> NWResult
```

**Returns** `NWResult`:
- `score`: `[B]` - log partition function
- `alignment`: `[B, L1, L2]` - soft alignment probabilities

```python
# Global alignment
result = d2p.soft_nw(scores, gap=-1.0, temperature=1.0)

# With affine gaps
result = d2p.soft_nw_affine(scores, gap_open=-2.0, gap_ext=-0.5)
```

**Linear vs Affine Gap Penalties:**
- **Linear**: Each gap position costs the same (`gap`)
- **Affine**: Opening a gap costs `gap_open`; extending an open gap costs `gap_ext`. This better models biological sequences, where gaps tend to occur in contiguous runs.
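
To make the difference concrete, the plain-Python sketch below totals the penalty for one run of `k` consecutive gap positions under each scheme. It assumes the common convention that the first position of a run costs `gap_open` and every further position costs `gap_ext`; d2p may count the opening position differently, so treat the numbers as illustrative.

```python
def linear_gap_penalty(k: int, gap: float = -1.0) -> float:
    """Total penalty for a run of k gap positions under a linear scheme."""
    return k * gap


def affine_gap_penalty(k: int, gap_open: float = -2.0, gap_ext: float = -0.5) -> float:
    """Total penalty for a run of k gap positions under an affine scheme.

    Assumed convention: the first position costs gap_open and each of the
    remaining k - 1 positions costs gap_ext.
    """
    if k == 0:
        return 0.0
    return gap_open + (k - 1) * gap_ext


for k in (1, 2, 4, 8):
    print(k, linear_gap_penalty(k), affine_gap_penalty(k))
```

With these defaults a single 8-position gap costs -5.5 under the affine scheme, while eight separate 1-position gaps would cost eight times -2.0, i.e. -16.0, so the affine scheme favors fewer, longer gaps.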

---

### Time Series Alignment

#### Dynamic Time Warping (DTW)

Aligns time series that may progress at different speeds, allowing one element of a series to match several consecutive elements of the other (many-to-one matching).

```python
d2p.soft_dtw(
    costs,                     # [B, L1, L2] cost matrix (lower = better)
    temperature=1.0,
    bandwidth=0,               # Sakoe-Chiba band (0 = no constraint)
    lengths=None
) -> DTWResult
```

**Returns** `DTWResult`:
- `cost`: `[B]` - soft DTW distance
- `alignment`: `[B, L1, L2]` - soft alignment path

```python
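# Compare two batches of multivariate time series
series1 = torch.randn(2, 100, 3, device='cuda')  # [B, L1, D]
series2 = torch.randn(2, 110, 3, device='cuda')  # [B, L2, D]

# costs[b, i, j] = distance between series1[b, i] and series2[b, j]
costs = torch.cdist(series1, series2)  # [B, L1, L2]

result = d2p.soft_dtw(costs, temperature=1.0)

# With Sakoe-Chiba band constraint (limits warping)
result = d2p.soft_dtw(costs, temperature=1.0, bandwidth=10)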
```
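
For intuition about what `cost` measures, the sketch below is a plain-Python reference for one cost matrix using the soft-DTW recurrence of Cuturi and Blondel: each cell adds its cost to a smoothed minimum of its three predecessors. It assumes `temperature` plays the role of the smoothing parameter `gamma` and ignores `bandwidth`; it is meant for checking small cases (e.g. `costs[0].tolist()`), not as d2p's kernel.

```python
import math


def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), tolerating inf entries."""
    m = min(values)
    if math.isinf(m):
        return m
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw_reference(costs, gamma=1.0):
    """Soft-DTW value for a single [L1][L2] cost matrix given as nested lists.

    Assumption: gamma corresponds to d2p's `temperature`; no band constraint.
    """
    L1, L2 = len(costs), len(costs[0])
    inf = math.inf
    # R is padded by one row and column; R[0][0] = 0 is the virtual start cell.
    R = [[inf] * (L2 + 1) for _ in range(L1 + 1)]
    R[0][0] = 0.0
    for i in range(1, L1 + 1):
        for j in range(1, L2 + 1):
            # Diagonal, vertical and horizontal predecessors.
            R[i][j] = costs[i - 1][j - 1] + softmin(
                [R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]], gamma
            )
    return R[L1][L2]
```

Unlike classic DTW, the soft value can be negative (the all-zero 2 x 2 matrix gives -log 3), and it approaches the classic DTW distance as `gamma` goes to 0.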

---

### Parsing Algorithms

#### CKY (Context-Free Grammar Parsing)

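Computes a soft distribution over binary-branching parse trees of a sequence, as in CKY parsing with a context-free grammar, scored by the merge and leaf scores below. Returns span marginals.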

```python
d2p.soft_cky(
    merge_scores,              # [B, N, N, N] merge scores
    leaf_scores,               # [B, N] terminal scores
    temperature=1.0
) -> CKYResult
```

**Input shapes:**
- `merge_scores[b, i, k, j]`: score for creating span (i, j) by merging (i, k) and (k+1, j)
- `leaf_scores[b, i]`: score for position i as a terminal

**Returns** `CKYResult`:
- `score`: `[B]` - log partition function
- `marginals`: `[B, N, N]` - span marginal probabilities

```python
N = 8  # sequence length
merge_scores = torch.randn(2, N, N, N, device='cuda', requires_grad=True)
leaf_scores = torch.randn(2, N, device='cuda', requires_grad=True)

result = d2p.soft_cky(merge_scores, leaf_scores, temperature=1.0)
# result.marginals[b, i, j] = probability that span (i,j) is in the parse
```
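
Since `score` is documented as a log partition function, it can be read as the output of the inside algorithm over binary trees. The plain-Python sketch below writes that recurrence out for one sequence under two assumptions: the inclusive span convention described above, and temperature entering as `T * logsumexp(x / T)`. It is a reference for small inputs (e.g. `merge_scores[0].tolist()`), not d2p's implementation.

```python
import math


def softmax_t(values, temperature):
    """Smoothed maximum: temperature * log(sum(exp(v / temperature)))."""
    m = max(values)
    return m + temperature * math.log(sum(math.exp((v - m) / temperature) for v in values))


def cky_log_partition(merge_scores, leaf_scores, temperature=1.0):
    """Inside algorithm for one sequence (assumed conventions, see lead-in).

    merge_scores[i][k][j]: score for building span (i, j) from (i, k) and (k+1, j)
    leaf_scores[i]: score for position i as a terminal
    Spans are inclusive on both ends.
    """
    n = len(leaf_scores)
    # inside[(i, j)] holds the smoothed score of all subtrees covering span i..j.
    inside = {(i, i): leaf_scores[i] for i in range(n)}
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # One candidate per split point k with i <= k < j.
            candidates = [
                inside[(i, k)] + inside[(k + 1, j)] + merge_scores[i][k][j]
                for k in range(i, j)
            ]
            inside[(i, j)] = softmax_t(candidates, temperature)
    return inside[(0, n - 1)]
```

With all-zero scores the result is the log of the number of binary trees over the sequence (5 for four tokens), which makes a quick sanity check. Differentiating a log partition function like this with respect to the scores is the standard way to obtain marginals.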

#### Eisner (Dependency Parsing)

Computes a soft distribution over projective dependency trees, as used in syntactic dependency parsing.

```python
d2p.soft_eisner(
    arc_scores,                # [B, N, N] arc scores (head -> dependent)
    temperature=1.0,
    lengths=None               # optional [B] sentence lengths
) -> EisnerResult
```

**Input:**
- `arc_scores[b, h, d]`: score for arc from head h to dependent d
- Position 0 is typically ROOT

**Returns** `EisnerResult`:
- `score`: `[B]` - log partition function
- `marginals`: `[B, N, N]` - arc marginal probabilities

```python
# 10 words + ROOT = 11 positions
arc_scores = torch.randn(2, 11, 11, device='cuda', requires_grad=True)

result = d2p.soft_eisner(arc_scores, temperature=1.0)
# result.marginals[b, h, d] = probability of arc h -> d
```
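
In the same spirit, the plain-Python sketch below spells out the inside pass of Eisner's algorithm for one sentence, built from "complete" and "incomplete" spans. It assumes temperature enters as `T * logsumexp(x / T)` and lets ROOT take more than one dependent; it makes no claim about whether `d2p.soft_eisner` enforces a single root.

```python
import math


def softmax_t(values, temperature):
    """Smoothed maximum T * logsumexp(v / T) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return m
    return m + temperature * math.log(sum(math.exp((v - m) / temperature) for v in values))


def eisner_log_partition(arc_scores, temperature=1.0):
    """Inside pass of Eisner's algorithm for one sentence (multi-root allowed).

    arc_scores[h][d]: score of the arc h -> d; position 0 is ROOT.
    Last index of each span table: 0 = head at the right end, 1 = head at the left end.
    """
    n = len(arc_scores)
    neg = -math.inf
    complete = [[[neg, neg] for _ in range(n)] for _ in range(n)]
    incomplete = [[[neg, neg] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        complete[s][s] = [0.0, 0.0]
    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Join a right-headed and a left-headed complete span, then add an arc.
            joined = softmax_t(
                [complete[s][r][1] + complete[r + 1][t][0] for r in range(s, t)],
                temperature,
            )
            incomplete[s][t][0] = joined + arc_scores[t][s]  # arc t -> s
            incomplete[s][t][1] = joined + arc_scores[s][t]  # arc s -> t
            complete[s][t][0] = softmax_t(
                [complete[s][r][0] + incomplete[r][t][0] for r in range(s, t)],
                temperature,
            )
            complete[s][t][1] = softmax_t(
                [incomplete[s][r][1] + complete[r][t][1] for r in range(s + 1, t + 1)],
                temperature,
            )
    # ROOT (position 0) heads the complete span over the whole sentence.
    return complete[0][n - 1][1]
```

With all-zero scores the value is the log of the number of projective trees: 3 for two words and 12 for three.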

---

### Speech & TTS Alignment

#### Monotonic Alignment Search (MAS)

Computes soft monotonic alignments between audio frames and text tokens, as used in text-to-speech models.

```python
d2p.soft_mas(
    scores,                    # [B, T, S] alignment scores
    temperature=1.0,
    lengths=None               # optional [B, 2] for (T_len, S_len)
) -> MASResult
```

**Constraint:** T >= S (at least as many frames as text tokens)

**Returns** `MASResult`:
- `score`: `[B]` - log partition function
- `alignment`: `[B, T, S]` - soft monotonic alignment

```python
# TTS: align 10 text tokens to 50 mel frames
scores = torch.randn(2, 50, 10, device='cuda', requires_grad=True)

result = d2p.soft_mas(scores, temperature=1.0)
# result.alignment shows which frames map to which tokens
```
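
For intuition, the plain-Python sketch below writes out a soft version of the monotonic alignment recurrence popularized by Glow-TTS: every frame is assigned to exactly one token, the first frame to the first token and the last frame to the last token, and the token index never decreases and advances by at most one per frame. Whether `d2p.soft_mas` uses exactly these boundary conditions and this form of temperature scaling is an assumption; treat this as a reference for small inputs only.

```python
import math


def softmax_t(values, temperature):
    """Smoothed maximum T * logsumexp(v / T) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return m
    return m + temperature * math.log(sum(math.exp((v - m) / temperature) for v in values))


def mas_log_partition(scores, temperature=1.0):
    """Soft monotonic alignment over one [T][S] score matrix (frames x tokens)."""
    n_frames, n_tokens = len(scores), len(scores[0])
    if n_frames < n_tokens:
        raise ValueError("need at least as many frames as text tokens")
    neg = -math.inf
    q = [[neg] * n_tokens for _ in range(n_frames)]
    q[0][0] = scores[0][0]  # assumed: the first frame aligns to the first token
    for t in range(1, n_frames):
        for s in range(n_tokens):
            stay = q[t - 1][s]                           # same token as previous frame
            advance = q[t - 1][s - 1] if s > 0 else neg  # move to the next token
            q[t][s] = softmax_t([stay, advance], temperature) + scores[t][s]
    return q[n_frames - 1][n_tokens - 1]  # assumed: the last frame aligns to the last token
```

With all-zero scores the value is the log of the number of monotonic alignments, C(T - 1, S - 1): for example log 2 for 3 frames and 2 tokens.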

---

### Edit Distance Algorithms

#### Levenshtein Distance

Standard edit distance with insert, delete, and substitute operations.

```python
d2p.soft_levenshtein(
    sub_costs,                 # [B, L1, L2] substitution costs
    ins_cost=1.0,              # insertion cost
    del_cost=1.0,              # deletion cost
    temperature=1.0,
    lengths=None
) -> LevenshteinResult
```

**Returns** `LevenshteinResult`:
- `distance`: `[B]` - soft edit distance
- `alignment`: `[B, L1, L2]` - edit operation probabilities

```python
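# Token sequences (e.g. character IDs): seq1 [B, L1], seq2 [B, L2]
seq1 = torch.randint(0, 26, (2, 30), device='cuda')
seq2 = torch.randint(0, 26, (2, 40), device='cuda')

# sub_costs[b, i, j] = cost to substitute seq1[b, i] with seq2[b, j]
# Use 0 for matching characters, positive for mismatches
sub_costs = (seq1.unsqueeze(-1) != seq2.unsqueeze(-2)).float()  # [B, L1, L2]

result = d2p.soft_levenshtein(sub_costs, ins_cost=1.0, del_cost=1.0)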
```
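
A plain-Python reference for the soft edit distance may help when reasoning about the output. The sketch below assumes the usual Levenshtein recurrence with the hard `min` replaced by `-T * logsumexp(-x / T)` and builds 0/1 substitution costs from token equality, like the `sub_costs` example above; how d2p applies `temperature` internally is an assumption.

```python
import math


def softmin(values, temperature):
    """Smoothed minimum: -temperature * log(sum(exp(-v / temperature)))."""
    m = min(values)
    return m - temperature * math.log(sum(math.exp(-(v - m) / temperature) for v in values))


def soft_levenshtein_reference(a, b, ins_cost=1.0, del_cost=1.0, temperature=1.0):
    """Soft edit distance between two sequences, using 0/1 substitution costs."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost  # delete the first i items of a
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost  # insert the first j items of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0
            d[i][j] = softmin(
                [d[i - 1][j] + del_cost,   # delete a[i-1]
                 d[i][j - 1] + ins_cost,   # insert b[j-1]
                 d[i - 1][j - 1] + sub],   # match or substitute
                temperature,
            )
    return d[n][m]
```

At a low temperature such as 0.01, `soft_levenshtein_reference("kitten", "sitting", temperature=0.01)` comes out just below the hard distance of 3, since a smoothed minimum never exceeds the true minimum.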

#### Longest Common Subsequence (LCS)

Finds the longest subsequence common to both sequences (not necessarily contiguous).

```python
d2p.soft_lcs(
    match_scores,              # [B, L1, L2] match scores (higher = better)
    temperature=1.0,
    lengths=None
) -> LCSResult
```

```python
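seq1 = torch.randint(0, 26, (2, 30), device='cuda')  # [B, L1] token IDs
seq2 = torch.randint(0, 26, (2, 40), device='cuda')  # [B, L2] token IDs

# match_scores[b, i, j] = reward for matching seq1[b, i] with seq2[b, j]
match_scores = (seq1.unsqueeze(-1) == seq2.unsqueeze(-2)).float()  # [B, L1, L2]

result = d2p.soft_lcs(match_scores, temperature=1.0)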
```
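
The LCS recurrence has the same shape with a smoothed maximum instead of a minimum. The plain-Python sketch below assumes `temperature` enters as `T * logsumexp(x / T)` and uses 0/1 match rewards like `match_scores` above.

```python
import math


def softmax_t(values, temperature):
    """Smoothed maximum: temperature * log(sum(exp(v / temperature)))."""
    m = max(values)
    return m + temperature * math.log(sum(math.exp((v - m) / temperature) for v in values))


def soft_lcs_reference(a, b, temperature=1.0):
    """Soft LCS length using 0/1 match rewards (assumed temperature scaling)."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1.0 if a[i - 1] == b[j - 1] else 0.0
            # Skip a[i-1], skip b[j-1], or pair them (rewarded only if equal).
            dp[i][j] = softmax_t(
                [dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + match],
                temperature,
            )
    return dp[n][m]
```

Because a smoothed maximum is never smaller than the true maximum, the soft value is at least the hard LCS length (4 for `"ABCBDAB"` and `"BDCABA"`) and approaches it as the temperature goes to 0.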

#### Optimal String Alignment (OSA)

Levenshtein with adjacent transpositions (restricted Damerau-Levenshtein).

```python
d2p.soft_osa(
    sub_costs,                 # [B, L1, L2] substitution costs
    trans_mask,                # [B, L1-1, L2-1] where transposition is valid
    ins_cost=1.0,
    del_cost=1.0,
    trans_cost=1.0,
    temperature=1.0,
    lengths=None
) -> OSAResult
```
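
To show what a transposition case adds, the plain-Python sketch below extends the soft Levenshtein recurrence with the OSA swap of two adjacent characters, deciding where a swap is valid directly from the two strings. It assumes unit insertion and deletion costs, 0/1 substitution costs, and a smoothed minimum `-T * logsumexp(-x / T)`; it does not depend on the exact index convention of d2p's `trans_mask`.

```python
import math


def softmin(values, temperature):
    """Smoothed minimum: -temperature * log(sum(exp(-v / temperature)))."""
    m = min(values)
    return m - temperature * math.log(sum(math.exp(-(v - m) / temperature) for v in values))


def soft_osa_reference(a, b, trans_cost=1.0, temperature=1.0):
    """Soft optimal string alignment distance with unit insert/delete/substitute costs."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = float(i)
    for j in range(m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0
            options = [d[i - 1][j] + 1.0, d[i][j - 1] + 1.0, d[i - 1][j - 1] + sub]
            # Adjacent transposition: a[i-2], a[i-1] equals b[j-1], b[j-2].
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                options.append(d[i - 2][j - 2] + trans_cost)
            d[i][j] = softmin(options, temperature)
    return d[n][m]
```

The pair `"CA"` and `"ABC"` is the textbook case where OSA gives 3 but unrestricted Damerau-Levenshtein gives 2, because OSA cannot edit a transposed pair again.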

#### Damerau-Levenshtein

Unrestricted Damerau-Levenshtein distance: unlike OSA, it allows further edits to characters that have been transposed.

```python
d2p.soft_damerau(
    sub_costs,                 # [B, L1, L2] substitution costs
    trans_src,                 # [B, L1, L2] transposition source indices
    ins_cost=1.0,
    del_cost=1.0,
    trans_cost=1.0,
    temperature=1.0,
    lengths=None
) -> DamerauResult
```

#### Hamming Distance

Position-wise comparison for equal-length sequences.

```python
d2p.soft_hamming(
    costs,                     # [B, L] mismatch costs per position
    temperature=1.0,
    lengths=None
) -> HammingResult
```

```python
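seq1 = torch.randint(0, 4, (2, 50), device='cuda')  # [B, L]
seq2 = torch.randint(0, 4, (2, 50), device='cuda')  # [B, L]
costs = (seq1 != seq2).float()  # 1 for mismatch, 0 for match
result = d2p.soft_hamming(costs)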
```

---

## Features

### Full Differentiability

All operators support gradients through:
- Input scores/costs
- Gap penalties (gap, gap_open, gap_ext)
- Temperature parameter
- Edit operation costs (`ins_cost`, `del_cost`, `trans_cost`)

**Learnable parameters:**

```python
# Pass tensors instead of floats for learnable parameters
gap = torch.tensor([-1.0], device='cuda', requires_grad=True)
temp = torch.tensor([1.0], device='cuda', requires_grad=True)

result = d2p.soft_sw(scores, gap=gap, temperature=temp)
loss = result.value.sum()
loss.backward()

print(gap.grad)   # gradient w.r.t. gap penalty
print(temp.grad)  # gradient w.r.t. temperature
```

### torch.compile Support

All operators are compatible with PyTorch 2.0+ compilation:

```python
@torch.compile
def align(scores):
    return d2p.soft_sw(scores, gap=-1.0, temperature=1.0)

result = align(scores)  # Compiled execution
```

### Variable-Length Batching

Batch sequences of different lengths by passing either a `lengths` tensor or boolean masks:

```python
# Using lengths tensor
lengths = torch.tensor([
    [50, 60],   # batch 0: seq1=50, seq2=60
    [80, 100],  # batch 1: seq1=80, seq2=100
], dtype=torch.int32, device='cuda')

result = d2p.soft_sw(scores, gap=-1.0, lengths=lengths)

# Using boolean masks (True = valid position), built from the same lengths
mask1 = torch.arange(100, device='cuda') < lengths[:, 0:1]  # [B, L1]
mask2 = torch.arange(120, device='cuda') < lengths[:, 1:2]  # [B, L2]

result = d2p.soft_sw(scores, gap=-1.0, mask1=mask1, mask2=mask2)
```

### Module API

Use nn.Module wrappers for integration with neural networks:

```python
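import torch
import torch.nn as nn
import d2p

class AlignmentModel(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.sw = d2p.SoftSW(
            gap=-1.0,
            temperature=1.0,
            learn_gap=True,         # make gap learnable
            learn_temperature=True  # make temperature learnable
        )

    def forward(self, x, y):
        # x: [B, L1, dim], y: [B, L2, dim]
        # Pairwise similarity of encoded positions -> [B, L1, L2] score matrix
        scores = self.encoder(x) @ self.encoder(y).transpose(-1, -2)
        return self.sw(scores)

model = AlignmentModel().cuda()
x = torch.randn(2, 100, 64, device='cuda')
y = torch.randn(2, 120, 64, device='cuda')
result = model(x, y)
# model.sw.gap and model.sw.temperature are nn.Parameters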
```

**Available modules:**
- `SoftSW`, `SoftSWAffine`
- `SoftNW`, `SoftNWAffine`
- `SoftDTW`
- `SoftCKY`
- `SoftMAS`
- `SoftEisner`
- `SoftLevenshtein`, `SoftLCS`, `SoftOSA`, `SoftDamerau`, `SoftHamming`

### Mixed Precision (AMP)

Works seamlessly with automatic mixed precision:

```python
with torch.autocast('cuda', dtype=torch.float16):
    result = d2p.soft_sw(scores, gap=-1.0)
    # Operations automatically use FP32 for numerical stability
```

---

## Temperature Parameter

The temperature parameter controls the "softness" of the alignment:

| Temperature | Behavior |
|-------------|----------|
| T → 0 | Hard alignment (argmax) |
| T = 1 | Standard soft alignment |
| T → ∞ | Uniform distribution |

```python
# Low temperature: sharp, nearly deterministic
result_hard = d2p.soft_sw(scores, gap=-1.0, temperature=0.1)

# High temperature: smooth, more uniform
result_soft = d2p.soft_sw(scores, gap=-1.0, temperature=10.0)
```

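Lower temperatures track the hard (argmax) solution more closely and give sharper, more concentrated gradients, but can run into numerical issues. Higher temperatures give smoother optimization landscapes at the cost of a looser approximation to the hard result.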
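
The table follows from how a soft DP works: each hard `max` becomes `T * logsumexp(x / T)`, whose gradient is a softmax over the competing options. The plain-Python snippet below, independent of d2p, prints those softmax weights for three candidate scores at several temperatures; reading d2p's marginals this way is an interpretation of the table, not a description of its internals.

```python
import math


def soft_weights(values, temperature):
    """Softmax weights exp(v / T) / sum(exp(v / T)): the gradient of T * logsumexp(v / T)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


candidate_scores = [3.0, 2.0, 1.0]
for t in (0.1, 1.0, 10.0):
    print(t, [round(w, 3) for w in soft_weights(candidate_scores, t)])
```

At T = 0.1 almost all weight goes to the best candidate, while at T = 10 the weights are close to uniform, matching the two ends of the table.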

---

## API Levels

d2p provides three API levels:

### 1. High-Level API (Recommended)

Simple function calls with sensible defaults:

```python
import d2p

result = d2p.soft_sw(scores, gap=-1.0, temperature=1.0)
```

### 2. Module API

nn.Module wrappers for neural network integration:

```python
sw_layer = d2p.SoftSW(gap=-1.0, learn_gap=True)
result = sw_layer(scores)
```

### 3. Low-Level API

Direct access to underlying operators via `d2p.ops`:

```python
from d2p import ops

# Individual operations
score, alignment = ops.soft_sw_float(scores, gap, temp, lengths)

# Forward with parameter gradients
score, align, grad_gap, grad_temp = ops.soft_sw_with_grads(...)

# Hessian-vector product
hvp = ops.soft_sw_hvp(scores, tangent, gap, temp, lengths)
```

---

## Performance

### Complexity

| Algorithm | Time | Space |
|-----------|------|-------|
| SW, NW, DTW | O(L1 × L2) | O(L1 × L2) |
| Levenshtein, LCS | O(L1 × L2) | O(L1 × L2) |
| CKY | O(N³) | O(N²) |
| Eisner | O(N³) | O(N²) |
| Hamming | O(L) | O(L) |

---

## License

MIT License - see [LICENSE](LICENSE) for details.
