Applying Machine Learning to use security oracles: a case study in virus and malware detection

Davy Preuveneers, Emma Lavens, Wouter Joosen

EuroS&P Workshops 2022 (modified: 02 Sept 2022)

Abstract: Machine Learning (ML) has significant potential to enhance the security posture of an organization by improving threat detection and discovery. The growing quality and quantity of measurement data creates opportunities in this context. However, when an organization does not have sufficient labeled data to make predictions, it can rely on third parties for expert advice. In this work, we present a real-world case study of a company offering security services to other businesses on top of its portfolio of internal security assets. The security provider adopts ML to learn when to cost-effectively invoke a third-party service or oracle (i.e., VirusTotal), first to boost its own detection rate in order to protect its customers, and second to obtain new ground truth to further optimize its future oracle use, all under fixed query budget constraints. While the decision making of the ML system was very successful in terms of avoiding false positives and false negatives, we nonetheless identified several security challenges when adopting ML for a cost-effective use of security oracles. We evaluate our ML solution on real data, elicit various lessons learned about the increased attack surface when using ML in security applications, and identify countermeasures to secure this ML pipeline.
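
The abstract does not state how samples are chosen for the oracle, so the following is only a minimal sketch of one plausible reading: spend a fixed query budget on the samples where a local model is least certain, and keep the oracle's answers as new ground truth. The uncertainty-based selection rule, the function names `select_for_oracle` and `triage`, the 0.5 threshold, and the `oracle` callable are all assumptions made for illustration. The callable is a stand-in and does not represent the VirusTotal API or the authors' actual pipeline.

```python
from typing import Callable, Dict, List, Tuple


def select_for_oracle(scores: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` sample ids whose local malware score is closest to 0.5.

    Assumption: a score near 0.5 means the local model is uncertain, so an
    oracle query is most informative there. Ties are broken by sample id.
    """
    ranked = sorted(scores, key=lambda sid: (abs(scores[sid] - 0.5), sid))
    return ranked[: max(budget, 0)]


def triage(
    scores: Dict[str, float],
    oracle: Callable[[str], bool],
    budget: int,
    threshold: float = 0.5,
) -> Tuple[Dict[str, bool], Dict[str, bool]]:
    """Combine local predictions with a budget-limited number of oracle calls.

    Returns (verdicts, new_labels): a verdict for every sample, and the subset
    of labels obtained from the oracle, usable later as training ground truth.
    """
    new_labels = {sid: oracle(sid) for sid in select_for_oracle(scores, budget)}
    verdicts = dict(new_labels)
    for sid, score in scores.items():
        if sid not in verdicts:
            # No oracle answer for this sample: fall back to the local model.
            verdicts[sid] = score >= threshold
    return verdicts, new_labels


if __name__ == "__main__":
    # Hypothetical local scores and a hypothetical oracle that flags only "d".
    local_scores = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45}
    verdicts, new_labels = triage(local_scores, lambda sid: sid == "d", budget=2)
    print(verdicts)
    print(new_labels)
```

In this toy run the two oracle queries go to the borderline samples, and the oracle overrides the local model on both of them, which is the kind of case where a paid query is worth its cost.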
