ProvAudit: Enhance High-Level Privacy Inference Through System Provenance Data

Published: 2025, Last Modified: 21 Jan 2026 | IEEE Trans. Dependable Secur. Comput. 2025 | CC BY-SA 4.0
Abstract: Companies such as CrowdStrike now offer cloud-based provenance analysis services, which collect low-level system events from a customer's device and aggregate them on a centralized platform to detect APT attacks. Despite the effectiveness of such solutions, their privacy implications remain unclear. To assess these implications, we use Website Fingerprinting (WF) of The Onion Router (Tor) browsers as a real-world attack scenario. In contrast to conventional WF techniques based on network traffic, we design ProvAudit, a fully automated solution that audits the web browsing history of Tor browsers based on system provenance data. We conduct the first systematic case study to demonstrate the feasibility of inferring the websites visited by Tor browsers based solely on collected system provenance data, particularly system call traces. The evaluation results show that our approach achieves a precision of 0.74 in the open-world scenario, higher than that of the state-of-the-art robust WF technique. In practice, ProvAudit consumes approximately 23 MB of memory and 4% CPU to audit system provenance data. Our approach is more robust against simple adversarial methods, more accurate, and less expensive than existing solutions. Overall, our case study reveals that provenance data is susceptible to privacy breaches, potentially exposing more high-level information than anticipated.