Abstract: Client-side cross-site scripting (DOM XSS) vulnerabilities in web applications are common, hard to identify, and difficult to prevent. Taint tracking is the most promising approach for detecting DOM XSS with high precision and recall, but is too computationally expensive for many practical uses. We investigate whether machine learning (ML) classifiers can replace or augment taint tracking when detecting DOM XSS vulnerabilities. Through a large-scale web crawl, we collect over 18 billion JavaScript functions and use taint tracking to label over 180,000 functions as potentially vulnerable. With this data, we train a deep neural network (DNN) to analyze a JavaScript function and predict if it is vulnerable to DOM XSS. We experiment with a range of hyperparameters and present a low-latency, high-recall classifier that could serve as a pre-filter to taint tracking, reducing the cost of stand-alone taint tracking by 3.43 × while detecting 94.5% of unique vulnerabilities. We argue that this combination of a DNN and taint tracking is efficient enough for a range of use cases for which taint tracking by itself is not, including in-browser run-time DOM XSS detection and analyzing large codebases.
0 Replies
Loading