AdFlush: A Real-World Deployable Machine Learning Solution for Effective Advertisement and Web Tracker Prevention

Published: 23 Jan 2024, Last Modified: 23 May 2024, TheWebConf24
Keywords: Ad blocking, Web tracking, Machine learning, Deployability, Web security
TL;DR: We introduce AdFlush, a machine learning solution that efficiently blocks ads and web trackers. Unlike previous approaches that focused only on detection accuracy, AdFlush is designed to be practical and deployable in real-world web browsers.
Abstract: Ad blocking and web tracking prevention tools are widely used, but traditional filter list-based methods struggle to cope with web content manipulation. Machine learning-based approaches have been proposed to address these limitations, but they have focused primarily on improving detection accuracy at the expense of practical considerations such as deployment overhead. In this paper, we present AdFlush, a lightweight machine learning model for ad blocking and web tracking prevention that is designed for practical deployment in the Chrome browser. To develop AdFlush, we first evaluated the effectiveness of 883 features, comprising 350 existing and 533 new features, and ultimately identified 27 key features that yield the best detection performance. We then evaluated AdFlush on a dataset of 10,000 real-world websites, where it achieved an F1 score of 0.98, outperforming state-of-the-art models such as AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Importantly, AdFlush also has a significantly smaller computational footprint, requiring 56% less CPU and 80% less memory than AdGraph. We also evaluated the robustness of AdFlush against adversarial manipulation, such as URL manipulation and JavaScript obfuscation. Our experimental results show that AdFlush is more robust, achieving F1 scores of 0.89–0.98 against adversarial samples, whereas AdGraph and WebGraph achieved F1 scores of 0.81–0.87. To demonstrate the real-world applicability of AdFlush, we implemented it as a Chrome browser extension and made it publicly available. We also conducted a six-month longitudinal study, which showed that AdFlush maintained an F1 score above 0.97 without retraining, demonstrating its long-term effectiveness. Additionally, AdFlush detected 642 URLs across 108 domains that were missed by commercial filter lists; we reported these to the filter list providers.
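The F1 scores quoted in the abstract combine precision (P) and recall (R) into one number, F1 = 2PR / (P + R), treating ad/tracker requests as the positive class. The sketch below is a generic illustration of that metric, not AdFlush's own evaluation code (which is not reproduced here); the function name and the toy labels are hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 score: harmonic mean of precision and recall.

    Hypothetical helper; label 1 = ad/tracker request, 0 = benign request.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correctly flagged requests: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for eight requests.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(f1_score(y_true, y_pred))  # 0.75 (precision 3/4, recall 3/4)
```

In practice, scikit-learn's `sklearn.metrics.f1_score` computes the same quantity for binary labels.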
Track: Security
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: Yes
Submission Number: 2363