Reducing the Cost of Breaking Audio CAPTCHAs by Active and Semi-supervised Learning

Malte Darnstädt; Hendrik Meutzner; Dorothea Kolossa

Published: 01 Jan 2014, Last Modified: 19 Jun 2024. Venue: ICMLA 2014. License: CC BY-SA 4.0

Abstract: CAPTCHAs are challenge-response tests that are widely used on the Internet to distinguish human users from machines. In addition to the well-known visual CAPTCHAs, most Internet services also provide an audio-based scheme, e.g., to enable access for visually impaired users. Recent research has shown that most CAPTCHAs are vulnerable as they can be broken by machine learning techniques. However, such automated attacks come at a relatively high cost as they require human experts to create labels for the unlabeled CAPTCHA samples collected from a website in order to train an attacking system. In this work we utilize active and semi-supervised learning methods for breaking audio CAPTCHAs. We show that these methods can reduce the labeling costs considerably, resulting in an increased vulnerability of audio CAPTCHAs as automated attacks are rendered even more worthwhile. In addition, our findings give insight into improvements to the design of CAPTCHAs, helping to harden prospective audio CAPTCHA schemes against active learning attacks in the future.
