A Practical Guide to Logical Access Voice Presentation Attack Detection

Published: 01 Jan 2022, Last Modified: 19 Oct 2024CoRR 2022EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Voice-based human-machine interfaces with an automatic speaker verification (ASV) component are commonly used in the market. However, the threat from presentation attacks is also growing since attackers can use recent speech synthesis technology to produce a natural-sounding voice of a victim. Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable. Research on voice PAD has seen significant progress since the early 2010s, including the advancement in PAD models, benchmark datasets, and evaluation campaigns. This chapter presents a practical guide to the field of voice PAD, with a focus on logical access attacks using text-to-speech and voice conversion algorithms and spoofing countermeasures based on artifact detection. It introduces the basic concept of voice PAD, explains the common techniques, and provides an experimental study using recent methods on a benchmark dataset. Code for the experiments is open-sourced.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview