# Dedup Policy

## DEDUP

| Key | Value |
| --- | --- |
| dedup_mode | normalized |
| normalize_text_en | strip + collapse spaces (+ optional lowercase) |
| normalize_text_zh | strip + collapse spaces |
| provenance_fields_kept | source_sample_ids, source_groups, source_dataset |
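The table implies a two-step policy. First, normalize each text according to its language. Then merge samples whose normalized text matches, keeping the listed provenance fields. The sketch below is one possible reading of that policy. Several details are assumptions not stated in the table:

- "collapse spaces" is read as turning any whitespace run into a single space.
- Lowercasing is controlled by a hypothetical `lowercase_en` flag.
- Records are plain dicts with hypothetical keys `text`, `lang`, `sample_id`, `group`, `dataset`.
- Samples in different languages are never merged.
- `source_dataset` is collected as a list of distinct values.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: "collapse spaces" means any whitespace run (spaces, tabs,
# newlines, and Unicode whitespace such as U+3000) becomes one ASCII space.
_WS_RUN = re.compile(r"\s+")


def normalize_text(text: str, lang: str, lowercase_en: bool = False) -> str:
    """Hypothetical helper: normalize text per the DEDUP table."""
    normalized = _WS_RUN.sub(" ", text.strip())
    # Optional lowercasing applies to English only; Chinese is never lowercased.
    if lang == "en" and lowercase_en:
        normalized = normalized.lower()
    return normalized


@dataclass
class DedupGroup:
    text: str  # original text of the first sample seen for this key
    source_sample_ids: list[str] = field(default_factory=list)
    source_groups: list[str] = field(default_factory=list)
    source_dataset: list[str] = field(default_factory=list)


def dedup(records: list[dict], lowercase_en: bool = False) -> list[DedupGroup]:
    """Merge records with equal normalized text, keeping provenance fields."""
    groups: dict[tuple[str, str], DedupGroup] = {}
    for rec in records:
        # Assumption: the language is part of the key, so en/zh never merge.
        key = (rec["lang"], normalize_text(rec["text"], rec["lang"], lowercase_en))
        grp = groups.get(key)
        if grp is None:
            grp = groups[key] = DedupGroup(text=rec["text"])
        # Every merged sample id is kept; groups and datasets are kept once each.
        grp.source_sample_ids.append(rec["sample_id"])
        if rec["group"] not in grp.source_groups:
            grp.source_groups.append(rec["group"])
        if rec["dataset"] not in grp.source_dataset:
            grp.source_dataset.append(rec["dataset"])
    # dict preserves insertion order, so output follows first occurrence.
    return list(groups.values())


if __name__ == "__main__":
    recs = [
        {"text": "Hello   world ", "lang": "en", "sample_id": "a1", "group": "g1", "dataset": "ds1"},
        {"text": "hello world", "lang": "en", "sample_id": "b2", "group": "g2", "dataset": "ds2"},
        {"text": " 你好  世界", "lang": "zh", "sample_id": "c3", "group": "g1", "dataset": "ds1"},
    ]
    for g in dedup(recs, lowercase_en=True):
        print(g.source_sample_ids, g.source_groups, g.source_dataset)
```

With `lowercase_en=True` the two English samples collapse into one group. Without it they differ only by case and remain separate. Keeping the first sample's original text, rather than its normalized form, is a design choice here. It leaves the stored text unchanged while still deduplicating on the normalized key.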

