BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection

Published: 01 Jan 2023, Last Modified: 06 Nov 2023, CVPR Workshops 2023
Abstract: This work proposes a data-driven learning model for the synthesis of keystroke biometric data. The proposed method is compared with two statistical approaches based on Universal and User-dependent models. These approaches are validated on a bot detection task, where the synthetic keystroke data is used to improve the training of keystroke-based bot detection systems. Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects. We analyze the performance of the three synthesis approaches through qualitative and quantitative experiments. Several bot detectors are considered, based on different supervised classifiers (Support Vector Machine, Random Forest, Gaussian Naive Bayes and a Long Short-Term Memory network) and a learning framework that includes both human and synthetic samples. The experiments demonstrate the realism of the synthetic samples. The classification results suggest that in scenarios with large amounts of labeled data, these synthetic samples can be detected with high accuracy. However, if bot detectors do not properly model the proposed synthetic data using massive amounts of data, that data will be very difficult to detect, even for the most sophisticated bot detectors. Furthermore, these results show the great potential of the presented models for improving the training of bot detection technology.
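The difference between the two statistical baselines can be illustrated with a small sketch. The abstract does not state which timing features or distributions the Universal and User-dependent models use, so everything below is an assumption for illustration only: each keystroke is reduced to a (hold time, flight time) pair in milliseconds, each feature is modelled as an independent Gaussian, and the helper names (`fit_universal`, `fit_user_dependent`, `synthesize`) are hypothetical. A Universal model pools all subjects into one distribution, while a User-dependent model fits one distribution per subject. This is not the paper's data-driven method.

```python
import random
import statistics
from typing import Dict, List, Tuple

# Hypothetical representation: a sample is a list of (hold_ms, flight_ms) pairs.
Sample = List[Tuple[float, float]]
Gaussian = Tuple[float, float]          # (mean, standard deviation)
TimingModel = Tuple[Gaussian, Gaussian]  # (hold model, flight model)


def fit_gaussian(values: List[float]) -> Gaussian:
    """Estimate mean and sample standard deviation (needs >= 2 values)."""
    return statistics.mean(values), statistics.stdev(values)


def fit_universal(data: Dict[str, Sample]) -> TimingModel:
    """Universal model (assumed form): pool every subject's events together."""
    holds = [h for sample in data.values() for h, _ in sample]
    flights = [f for sample in data.values() for _, f in sample]
    return fit_gaussian(holds), fit_gaussian(flights)


def fit_user_dependent(data: Dict[str, Sample]) -> Dict[str, TimingModel]:
    """User-dependent model (assumed form): one timing model per subject."""
    return {
        user: (fit_gaussian([h for h, _ in s]), fit_gaussian([f for _, f in s]))
        for user, s in data.items()
    }


def synthesize(model: TimingModel, length: int, rng: random.Random) -> Sample:
    """Draw a synthetic keystroke sequence from independent Gaussians."""
    (h_mu, h_sd), (f_mu, f_sd) = model
    out: Sample = []
    for _ in range(length):
        # Hold times must be positive, so clamp at an assumed 1 ms floor.
        hold = max(1.0, rng.gauss(h_mu, h_sd))
        # Release-to-press flight times may be negative when keys overlap,
        # so they are left unclamped.
        flight = rng.gauss(f_mu, f_sd)
        out.append((hold, flight))
    return out


if __name__ == "__main__":
    toy = {
        "alice": [(100.0, 150.0), (110.0, 160.0), (90.0, 140.0)],
        "bob": [(80.0, 200.0), (85.0, 210.0), (75.0, 190.0)],
    }
    rng = random.Random(0)
    print(synthesize(fit_universal(toy), 5, rng))
    print(synthesize(fit_user_dependent(toy)["alice"], 5, rng))
```

In this sketch, the Universal model produces "average human" timings. The User-dependent model reproduces the rhythm of a specific subject, which is harder for a detector to separate from that subject's genuine data.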