On LLM Embeddings for Vulnerability Management

Rustam Talibzade, Francesco Bergadano, Idilio Drago

Published: 2025, Last Modified: 22 Apr 2026, TMA 2025, CC BY-SA 4.0
Abstract: Effective vulnerability management is fundamental for cybersecurity, but it requires significant manual effort to identify, classify, and prioritize reports. Automating this process could reduce analyst workload and improve response to threats. We explore the use of Large Language Models (LLMs) for representing and analyzing Common Vulnerabilities and Exposures (CVEs), specifically their ability to generate semantic embeddings that capture the nature of vulnerabilities from textual descriptions. Using a dataset of 4,710 CVEs, we generate vector embeddings with multiple LLMs. We then apply both unsupervised clustering and supervised classification to evaluate the quality of the embeddings. Our preliminary results show that some LLMs, in particular Llama 3.2, map similar vulnerabilities together in the embedding space. Embeddings seem to position related vulnerabilities in nearby regions, with certain clusters showing strong correspondence to specific categories such as SQL Injection (CWE-89) and Path Traversal (CWE-22). A simple KNN classifier using only the embeddings achieves around 50% accuracy when categorizing CVEs. This is a remarkable result, considering the very high number of classes in the problem. Our initial findings show potential for the use of LLMs in vulnerability management processes, calling for deeper analysis of how to better learn representations from vulnerability reports.
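The KNN classification step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the value of k, or how the embeddings are extracted, so cosine similarity, k=3, the function names, and the toy 3-dimensional vectors below are all assumptions made purely for illustration; real LLM embeddings would have hundreds or thousands of dimensions.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, train_vectors, train_labels, k=3):
    """Return the majority label among the k training vectors most similar to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,  # most similar first
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical stand-ins for CVE description embeddings (not real data).
train_vectors = [
    [0.90, 0.10, 0.00], [0.80, 0.20, 0.10], [0.85, 0.05, 0.10],
    [0.10, 0.90, 0.00], [0.20, 0.80, 0.10], [0.05, 0.95, 0.10],
]
train_labels = ["CWE-89"] * 3 + ["CWE-22"] * 3

# Classify an unseen (also made-up) embedding by its nearest neighbours.
prediction = knn_predict([0.7, 0.3, 0.0], train_vectors, train_labels, k=3)
print(prediction)
```

In this toy setup the query vector lies close to the three vectors labelled CWE-89, so the majority vote returns that label. With many CWE classes and few examples per class, the choice of k and of the similarity measure would likely matter far more than it does here.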