Active Learning Image Spam Hunter

Yan Gao, Alok N. Choudhary

Published: 2009, Last Modified: 21 Jan 2026 · ISVC (2) 2009 · CC BY-SA 4.0

Abstract: Image spam is a nuisance to email users around the world. Most previous work on image spam detection focuses on supervised learning. However, obtaining enough trustworthy labels is costly, especially in an adversarial setting where spammers constantly modify their patterns to evade the classifier. To address this issue, we employ active learning, in which the learner guides the user to label as few images as possible while maximizing classification accuracy. Active learning is better suited to online image spam filtering because it dramatically reduces labeling cost with negligible overhead while maintaining high recognition performance. We present and compare two active learning algorithms, based on an SVM and a Gaussian process classifier, respectively. To the best of our knowledge, we are the first to apply active learning to image spam filtering. Experimental results demonstrate that our active-learning-based approaches quickly achieve a detection rate above 99% and a false positive rate below 0.5% with only a small number of labeled images.
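The loop described in the abstract can be illustrated with a minimal sketch of pool-based active learning around a linear SVM. This is not the authors' implementation. The abstract does not state the query rule, the image features, or the SVM kernel, so the sketch assumes margin-based uncertainty sampling (query the unlabeled item closest to the decision boundary), toy 2-D vectors standing in for image features, and a linear SVM trained by plain subgradient descent. All names (`train_linear_svm`, `decision`, `active_learn`, `make_toy_pool`) are hypothetical.

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent.
    Labels are +1 (spam) or -1 (ham). Illustrative only, not tuned."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 shrinkage on every step
            w = [wj * (1 - lr * lam) for wj in w]
            # hinge-loss step only when the point is inside the margin
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def decision(model, x):
    """Signed score; its absolute value grows with distance from the boundary."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def active_learn(pool, oracle, seed_idx, budget):
    """Assumed query rule: ask the oracle (the user) to label the unlabeled
    item whose score is closest to zero, then retrain. Repeat `budget` times."""
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(budget):
        idx = list(labeled)
        model = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        query = min(unlabeled, key=lambda i: abs(decision(model, pool[i])))
        labeled[query] = oracle(query)
    idx = list(labeled)
    model = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
    return model, labeled

def make_toy_pool(n_per_class, seed=0):
    """Hypothetical stand-in for image features: two 2-D Gaussian blobs."""
    rng = random.Random(seed)
    pool, truth = [], []
    for _ in range(n_per_class):
        pool.append([rng.gauss(2, 0.5), rng.gauss(2, 0.5)])
        truth.append(1)    # spam
        pool.append([rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)])
        truth.append(-1)   # ham
    return pool, truth
```

In use, `oracle` would be the person labeling an image on request. Here it can be simulated with `lambda i: truth[i]`, seeding the loop with one example of each class. Retraining from scratch each round keeps the sketch short; an online filter would more likely update the model incrementally.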

External IDs: dblp:conf/isvc/GaoC09