GenTraceBench: A Benchmark for Tracing Audio Deepfakes Across Pre- and Post-training Stages

ACL ARR 2026 January Submission 5319 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: audio deepfake, text-to-speech, detection, attribution, benchmark
Abstract: Modern text-to-speech (TTS) models increasingly rely on foundation pretraining followed by post-training adaptation, creating new challenges for audio deepfake detection and attribution in the wild. Prior benchmarks mainly test against fixed generators and thus underestimate the impact of adaptation-induced shifts. We present GenTrace, a benchmark that tracks TTS evolution from foundation pretraining to diverse adaptation strategies, with controlled prompts and speakers to isolate model-induced differences (16 variants, 49,728 synthesized utterances). Using GenTrace, we find that alignment-based adaptation typically preserves detection accuracy, while architecture and pretraining data have a substantially larger effect on attribution performance. GenTrace supports reproducible evaluation of detection and attribution robustness under realistic model adaptation scenarios. GenTrace will be publicly released upon acceptance.
Paper Type: Short
Research Area: Speech Processing and Spoken Language Understanding
Research Area Keywords: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Contribution Types: Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 5319