# CVE Reproduction Value Assessment Guide

This document describes how to decide whether a CVE is worth the resources needed to reproduce it when building a CVE vulnerability reproduction library. The standards were developed through extensive CVE analysis practice. The core principle is to **evaluate the intrinsic value of the vulnerability itself, not its surface-level characteristics**.

## Core Philosophy

The most important factor in determining whether a CVE is worth reproducing is understanding what value it brings to the vulnerability library. A good vulnerability library should cover diverse vulnerability types, attack patterns, and product categories—not simply accumulate quantity. Therefore, our evaluation criteria always revolve around two dimensions: **uniqueness** and **reproducibility**.

**Uniqueness** refers to whether the vulnerability brings new value to the library—whether through a new CWE type, a new attack pattern, or a new product category. **Reproducibility** is more straightforward: can we actually set up the environment and trigger this vulnerability on our target environment (a single Linux server)?

A common misconception is to use CVSS scores or vulnerability sources as the primary decision criteria. A low CVSS score may simply reflect authentication or user interaction requirements, yet such vulnerabilities are equally important in real attack chains. Likewise, coming from a student project website does not mean a vulnerability lacks value; what matters is whether its pattern duplicates cases already in the library.

## When Reproduction is REQUIRED

Several CVE categories have high reproduction priority and should be handled first when encountered.

### 1. Actively Exploited Vulnerabilities
When CISA or another authoritative organization marks a vulnerability as actively exploited, attackers are already using it in the wild, and its practical value is unquestionable. These vulnerabilities typically have high CVSS scores and often carry an "Automatable: yes" marking (an SSVC decision point), indicating that attacks can be automated at scale.
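One way to flag this category during triage is to look the CVE up in a known-exploited list. The sketch below assumes the catalog has already been downloaded and parsed into a dict shaped like CISA's KEV JSON feed, with a top-level `vulnerabilities` list whose entries carry a `cveID` field. Treat those field names as assumptions to verify against the real feed; the sample data is made up.

```python
def exploited_cve_ids(catalog):
    """Collect CVE IDs from a KEV-style catalog (field names are assumed)."""
    return {
        entry["cveID"].upper()
        for entry in catalog.get("vulnerabilities", [])
        if "cveID" in entry
    }


def is_actively_exploited(cve_id, catalog):
    """True if the CVE ID appears in the catalog, ignoring letter case."""
    return cve_id.upper() in exploited_cve_ids(catalog)


# Hand-written placeholder entries, not real catalog contents.
sample_catalog = {
    "vulnerabilities": [
        {"cveID": "CVE-0000-0001"},
        {"cveID": "CVE-0000-0002"},
    ]
}

print(is_actively_exploited("cve-0000-0001", sample_catalog))
```

A lookup like this only answers the "actively exploited" question; reproducibility still has to be judged separately.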

### 2. Protocol-Level Vulnerabilities
These vulnerabilities stem from a protocol's design or from the way it is commonly implemented, rather than from one specific product. For example, HTTP/2 stream reset vulnerabilities can affect many servers that implement the protocol, including Varnish, H2O, and various other web servers. Protocol-level vulnerabilities have broad impact and often contain technically valuable details.

### 3. Supply Chain Attack Vulnerabilities
This includes npm/PyPI package poisoning, S3 bucket takeover, dependency confusion, and similar attack methods. Supply chain attacks represent one of the most threatening attack types in recent years, and related vulnerability cases are crucial for understanding these attack patterns. A typical example: a WordPress plugin loading JavaScript from an abandoned S3 bucket—an attacker taking over that bucket can inject arbitrary code into all websites using the plugin.
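As a rough illustration of the S3 example above, the following sketch lists the `<script>` sources on a page that point at S3-style hostnames, which a reviewer could then check for abandoned buckets. The hostname test is a heuristic assumption that only covers common `amazonaws.com` URL forms, and the page content and bucket name are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def looks_like_s3_host(host):
    # Heuristic only: matches hosts such as <bucket>.s3.amazonaws.com or
    # <bucket>.s3-<region>.amazonaws.com; other S3 URL forms may be missed.
    host = host.lower()
    labels = host.split(".")
    return host.endswith(".amazonaws.com") and any(
        label == "s3" or label.startswith("s3-") for label in labels
    )


def s3_hosted_scripts(html):
    """Return script URLs whose host looks like an S3 endpoint."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    return [
        src for src in collector.sources
        if looks_like_s3_host(urlparse(src).hostname or "")
    ]


# Hypothetical page; the bucket name is invented for illustration.
page = """
<script src="/js/app.js"></script>
<script src="https://example-plugin-assets.s3.amazonaws.com/widget.js"></script>
"""
print(s3_hosted_scripts(page))
```

The script only narrows down candidates. Whether a bucket can actually be taken over has to be confirmed manually.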

### 4. Security Product Vulnerabilities
This category includes WAF bypasses, antivirus vulnerabilities, XSS in security plugins, and similar flaws. Security products often run with elevated system privileges, so a compromise is highly damaging. The fact that a security product is itself insecure is also an important security topic in its own right, which gives these cases both symbolic and practical value.

### 5. AI and Machine Learning Tool Vulnerabilities
With the proliferation of AI tools, attacks targeting AI Agents, LLM applications, and code assistants are increasing. These vulnerabilities represent new frontiers in security—for example, symlink attacks against AI coding assistants enabling arbitrary file overwrites, or prompt injection manipulating AI Agents to execute malicious operations.

## When Reproduction is RECOMMENDED

Beyond required high-priority vulnerabilities, another large category warrants resource investment despite lower urgency.

### Well-Known Product Vulnerabilities
"Well-known" can be measured by GitHub stars, download counts, or industry adoption breadth. Well-known product vulnerabilities attract more attention, and related technical analysis has greater reference value. For example, widely-used tools like ImageMagick, Vim, or GStreamer—even with moderate CVSS scores, their case value remains significant.

### CWE Type Uniqueness
If the vulnerability library lacks cases for a certain CWE type, even vulnerabilities from lesser-known products are worth collecting. Priority CWE types include:
- Command Injection (CWE-78)
- Code Injection (CWE-94)
- Deserialization (CWE-502)
- Use-After-Free (CWE-416)
- Stack-based Buffer Overflow (CWE-121)
- Heap-based Buffer Overflow (CWE-122)
- Prototype Pollution (CWE-1321)
- SSRF (CWE-918)
- Path Traversal (CWE-22)
- File Upload (CWE-434)
- Missing Authentication (CWE-306)
- Missing Authorization (CWE-862)
- Log Injection (CWE-117)
- ReDoS (CWE-1333)
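A gap check against this list is easy to automate. The following is a minimal sketch, assuming the library can report the set of CWE IDs it already covers; the function name and the sample coverage set are hypothetical.

```python
# Priority CWE types from the list above.
PRIORITY_CWES = {
    "CWE-78": "Command Injection",
    "CWE-94": "Code Injection",
    "CWE-502": "Deserialization",
    "CWE-416": "Use-After-Free",
    "CWE-121": "Stack-based Buffer Overflow",
    "CWE-122": "Heap-based Buffer Overflow",
    "CWE-1321": "Prototype Pollution",
    "CWE-918": "SSRF",
    "CWE-22": "Path Traversal",
    "CWE-434": "File Upload",
    "CWE-306": "Missing Authentication",
    "CWE-862": "Missing Authorization",
    "CWE-117": "Log Injection",
    "CWE-1333": "ReDoS",
}


def missing_priority_cwes(library_cwes):
    """Return priority CWE IDs with no case in the library, in numeric order."""
    covered = {cwe.upper() for cwe in library_cwes}
    gaps = [cwe for cwe in PRIORITY_CWES if cwe not in covered]
    return sorted(gaps, key=lambda cwe: int(cwe.split("-")[1]))


# Hypothetical library that already covers everything except two types.
covered_so_far = set(PRIORITY_CWES) - {"CWE-502", "CWE-1321"}
for cwe in missing_priority_cwes(covered_so_far):
    print(cwe, PRIORITY_CWES[cwe])
```

A candidate CVE whose CWE appears in the returned list fills a gap, so it is worth collecting even if the product is obscure.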

### Complete Security Research Reports
When security teams like JFrog, Snyk, WPScan, Talos, or GitHub Security Lab publish detailed vulnerability analyses, the vulnerability has received professional scrutiny with typically more complete and reliable technical details. Reproduction is often smoother due to detailed PoCs and trigger condition descriptions.

### Clear Patches
Vulnerabilities with corresponding fix commits are easier to reproduce and learn from. Comparing pre- and post-fix code clearly reveals root causes and remediation approaches, offering high educational value for vulnerability research and secure development.
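To illustrate how a fix commit exposes the root cause, the sketch below diffs a hypothetical pre-fix and post-fix version of a function using Python's standard `difflib`. Both code fragments are invented for this example and do not come from any real project.

```python
import difflib

# Hypothetical vulnerable code: user input is concatenated into a shell command.
before_fix = [
    "def ping(host):\n",
    '    os.system("ping -c 1 " + host)\n',
]

# Hypothetical patched code: arguments are passed as a list, with no shell.
after_fix = [
    "def ping(host):\n",
    '    subprocess.run(["ping", "-c", "1", host], check=True)\n',
]


def patch_diff(before, after):
    """Return a unified diff between the two versions as one string."""
    return "".join(
        difflib.unified_diff(before, after, "before_fix.py", "after_fix.py")
    )


print(patch_diff(before_fix, after_fix))
```

In the diff, the removed line shows the root cause (string concatenation into a shell command) and the added line shows the remediation, which is the learning value a clear patch provides.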

## When Reproduction is OPTIONAL

Some vulnerabilities aren't high priority but may still be considered when resources permit.

### General Vulnerabilities from Different Products
General vulnerabilities (XSS, CSRF, SQLi) from products with reasonable recognition, complete PoCs, and source code can serve as supplementary cases. The key is ensuring these cases come from different codebases, demonstrating how similar vulnerabilities manifest in different contexts.

### Low CVSS but Unique Trigger Methods
Low CVSS typically indicates higher privilege or user interaction requirements, but this doesn't mean the vulnerability lacks technical value. For example, a stored XSS requiring admin privileges might be exploited through social engineering in actual attacks—its trigger mechanism and payload construction remain research-worthy.

### Representative Cases for Gap-Filling
Even if a vulnerability isn't particularly "impressive," it's worth keeping when serving as a representative case for a missing type. If the library lacks a specific type, an ordinary but complete case is better than nothing.

## When to SKIP

Not all CVEs are worth reproducing. Some situations should be explicitly skipped to conserve resources.

### Multiple Similar Vulnerabilities in the Same Project
This is the most common skip scenario. When a project has multiple reported SQLi or XSS vulnerabilities, they often represent the same code pattern appearing at different endpoints. For example, if a PHP project's login.php, admin.php, and profile.php all concatenate `$_GET['id']` directly into SQL, they are essentially the same vulnerability pattern. Keep the single most representative case and skip the rest.
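The "keep one, skip the rest" rule can be sketched as a grouping step. This is a minimal example under assumed record fields (`id`, `project`, `cwe`, `has_poc`), all of which are hypothetical. Preferring the entry with a public PoC is one possible definition of "most representative", not the only one.

```python
from collections import defaultdict


def pick_representatives(cves):
    """Keep one CVE per (project, CWE) pair; return (keep_ids, skip_ids)."""
    groups = defaultdict(list)
    for cve in cves:
        groups[(cve["project"], cve["cwe"])].append(cve)

    keep, skip = [], []
    for members in groups.values():
        # Prefer an entry with a public PoC; ties fall back to input order.
        best = max(members, key=lambda c: c.get("has_poc", False))
        keep.append(best["id"])
        skip.extend(c["id"] for c in members if c is not best)
    return keep, skip


# Hypothetical reports against one PHP project.
reports = [
    {"id": "CVE-0000-0001", "project": "shop-app", "cwe": "CWE-89"},  # login.php
    {"id": "CVE-0000-0002", "project": "shop-app", "cwe": "CWE-89", "has_poc": True},  # admin.php
    {"id": "CVE-0000-0003", "project": "shop-app", "cwe": "CWE-89"},  # profile.php
    {"id": "CVE-0000-0004", "project": "shop-app", "cwe": "CWE-79"},
]
keep, skip = pick_representatives(reports)
print("keep:", keep)
print("skip:", skip)
```

Grouping by project and CWE is deliberately coarse. Two reports with the same CWE but genuinely different trigger mechanisms should be separated by hand before applying it.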

### Identical Vulnerability Patterns
Vulnerabilities with identical patterns should be merged. Patterns count as identical when they share the same code structure, similar trigger parameters, and the same attack payload format. For example, reflected XSS caused by an unescaped `$_SERVER['REQUEST_URI']` in multiple projects is the same programming error in different instances. Keep one representative.

### Bulk Similar Vulnerabilities from the Same Source
These need careful handling. Sites like code-projects, SourceCodester, and PHPGurukul host numerous student practice projects with generally poor code quality and large numbers of duplicate vulnerabilities. This does not mean skipping them all. The correct approach is to select representative cases from each source, make sure different CWE types and trigger methods are covered, and then skip the duplicates.

### True Environmental Limitations
This is another skip category, but it requires an accurate judgment of what actually counts as an environmental limitation, as described in the next section.

## How to Assess Environmental Limitations

Environmental limitations are situations where reproduction is genuinely impossible on the target environment (a single Linux server), not situations where reproduction is merely troublesome.

### Actual Environmental Limitations:

**Windows-Only Software**: Cannot be reproduced on Linux. Examples: desktop applications dependent on Windows APIs, .NET Framework (not .NET Core) applications, Windows-specific system services.

**Hardware Device Vulnerabilities**: Usually require actual hardware for reproduction. This covers embedded device firmware, routers, cameras, dashcams, and similar devices. Unless QEMU emulation is available, they cannot be reproduced on a regular server.

**SaaS Cloud Service Vulnerabilities**: If not locally deployable, this is an environmental limitation. For example, a cloud service API vulnerability with no open-source or self-hosted version cannot be reproduced locally.

**Desktop GUI Requirements**: Difficult to reproduce on servers. Typical example: browser sandbox escape vulnerabilities requiring actual browser execution with graphical interaction.

**Commercial Closed-Source Software**: Requires careful judgment. Characteristics: requires paid licenses for installation packages, no public source code, no official free trials or Docker images. Examples include Cisco ASA/FTD firewalls, WatchGuard Fireware, Fortinet FortiGate, VMware vCenter, and similar enterprise products. Even with high CVSS scores or active exploitation markings, without access to the software itself, environment setup is impossible.

Note: Some commercial products have open-source or community versions, but this doesn't guarantee vulnerability reproducibility. The key is whether the vulnerable functionality exists in the free version. For example, if a vulnerability affects a commercial-only advanced authentication module or enterprise management feature absent from the community version, deploying the community version won't trigger the vulnerability. When encountering commercial product vulnerabilities with free versions, carefully read vulnerability descriptions and impact scope to confirm the vulnerable code path exists in the obtainable version.

**Core question for commercial closed-source software**: Can we obtain a software version containing this vulnerability and deploy it locally? If no, mark as Skip regardless of the vulnerability's importance.

### NOT Environmental Limitations:

**Local Attack Vectors (CVSS AV:L)**: A local attack vector means that triggering requires local access, such as processing a malicious file. The software can be installed on a server and the vulnerability triggered with a malicious file, which is completely different from requiring specific hardware. Many file-processing vulnerabilities (image libraries, PDF parsers, media players) have a local attack vector but can equally be triggered in server environments: a user uploads a malicious file, and the server processes it and triggers the vulnerability.
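As a small sketch of how this plays out in tooling, the code below reads the attack vector from a CVSS v3.x vector string and turns `AV:L` into a reproduction note rather than a skip reason. The vector shown and the wording of the note are illustrative assumptions.

```python
def parse_cvss_vector(vector):
    """Split a CVSS v3.x vector string into a {metric: value} dict."""
    metrics = {}
    for part in vector.split("/"):
        name, _, value = part.partition(":")
        metrics[name] = value
    return metrics


def reproduction_note(vector):
    """Return a planning note for local vectors; AV:L never implies Skip."""
    if parse_cvss_vector(vector).get("AV") == "L":
        return "install the target on the Linux server and feed it a crafted file"
    return None


# Hypothetical vector for a file-parsing bug that needs a crafted input file.
vector = "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"
print(reproduction_note(vector))
```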

**Additional Software Installation Required**: Whether through apt, pip, npm, or source compilation, if the software runs on Linux, reproduction is possible.

**Complex Deployment Processes**: Some vulnerability environment setups are indeed tedious, requiring multiple component configurations, but "difficult" and "impossible" are different. If technically feasible, don't skip due to complexity.
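The distinctions in this section can be condensed into a checklist. The sketch below is one possible encoding, with hypothetical field names. Attack vector, installation effort, and deployment complexity are deliberately absent because they are not environmental limitations.

```python
from dataclasses import dataclass


@dataclass
class Target:
    runs_on_linux: bool = True
    needs_physical_hardware: bool = False
    emulation_available: bool = False  # e.g. firmware that boots under QEMU
    self_hostable: bool = True  # False for SaaS-only services
    needs_desktop_gui: bool = False
    # A free, trial, or community build that contains the vulnerable code path.
    vulnerable_build_obtainable: bool = True


def environmental_limitation(target):
    """Return a reason string if reproduction is impossible, else None."""
    if not target.runs_on_linux:
        return "Windows-only software"
    if target.needs_physical_hardware and not target.emulation_available:
        return "hardware required and no emulation available"
    if not target.self_hostable:
        return "SaaS service with no self-hosted version"
    if target.needs_desktop_gui:
        return "desktop GUI required"
    if not target.vulnerable_build_obtainable:
        return "no obtainable build containing the vulnerable code"
    return None


# A router firmware bug that can be emulated is not an environmental limitation.
print(environmental_limitation(
    Target(needs_physical_hardware=True, emulation_available=True)
))
# A commercial appliance with no trial image is.
print(environmental_limitation(Target(vulnerable_build_obtainable=False)))
```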

## Common Misconceptions

Several common judgment errors should be avoided in practice.

### Misconception 1: Using CVSS Scores as Primary Criteria
CVSS is a useful reference, but it is designed to assess vulnerability severity, not research value. A CVSS 3.5 vulnerability might score low because it requires admin privileges, yet its attack technique could be highly unique. Conversely, a CVSS 7.3 SQLi might be yet another clichéd `$_GET` parameter concatenation.

### Misconception 2: Judging Value by Source
Automatically dismissing code-projects or PHPGurukul entries as "student project garbage" is incorrect. The correct approach is to check the specific CWE type and vulnerability pattern: a CWE-862 missing authorization vulnerability, even from a student project, is a different case type from common SQLi and is worth keeping.

### Misconception 3: Equating Local Attack Vectors with Low Value
Local privilege escalation is crucial in attack chains; file-processing vulnerabilities are equally dangerous in server scenarios. Attack vectors describe trigger methods only and shouldn't determine research value.

### Misconception 4: Prioritizing Quantity Over Coverage
A good vulnerability library should cover diverse vulnerability types, not accumulate massive duplicate cases of a few types. 100 different patterns are more valuable than 1,000 homogeneous vulnerabilities.

## Classification System

Based on these principles, CVEs are classified into the following tiers:

### Tier 1 (REQUIRED)
- Actively exploited vulnerabilities
- Protocol-level vulnerabilities
- Supply chain attacks
- Security product vulnerabilities
- AI tool vulnerabilities
- CVSS 9.0+ with automation capability

These have the highest practical and research value—prioritize resources here.

### Tier 2 (RECOMMENDED)
- Well-known product vulnerabilities
- Unique CWE type cases
- Complete security research report support
- Clear patches for learning

Lower priority than Tier 1 but still essential library components.

### Tier 3 (OPTIONAL)
- Supplementary general vulnerability cases
- Low CVSS with unique trigger methods
- Representative cases filling library gaps

Consider when resources are sufficient.

### Skip
- Duplicate vulnerabilities from the same project
- Identical pattern duplicates
- True environmental limitations preventing reproduction
- Incomplete information preventing reproduction
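The tiers above can be expressed as a small lookup. In this sketch the trait names are hypothetical labels for the bullet points. Two choices are assumptions drawn from the rest of this guide: skip conditions override everything else, and a CVE with no matching trait defaults to Tier 3 rather than Skip.

```python
TIER_1 = {"actively_exploited", "protocol_level", "supply_chain",
          "security_product", "ai_tool", "critical_and_automatable"}
TIER_2 = {"well_known_product", "unique_cwe", "research_report", "clear_patch"}
TIER_3 = {"supplementary_general", "low_cvss_unique_trigger", "gap_filler"}
SKIP = {"same_project_duplicate", "identical_pattern",
        "environmental_limitation", "incomplete_information"}


def classify(traits):
    """Map a set of trait labels to a tier name."""
    traits = set(traits)
    # Skip conditions win even over high-value traits: a CVE that cannot
    # be reproduced stays out of the library regardless of importance.
    if traits & SKIP:
        return "Skip"
    if traits & TIER_1:
        return "Tier 1 (REQUIRED)"
    if traits & TIER_2:
        return "Tier 2 (RECOMMENDED)"
    # Assumed default: with no strong signal, keep the case at the lowest
    # tier instead of skipping it.
    return "Tier 3 (OPTIONAL)"


print(classify({"unique_cwe", "supply_chain"}))
print(classify({"actively_exploited", "environmental_limitation"}))
```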

## Practical Recommendations

When analyzing CVEs, consider in this order:

1. **First, determine if true environmental limitations exist.** If the vulnerability depends on Windows, specific hardware, non-locally-deployable cloud services, or commercial software for which no vulnerable version can be obtained, mark it as Skip and record the reasoning.

2. **Second, check for duplication with existing cases.** If it's another endpoint in the same project, or completely matches an already-collected vulnerability pattern, consider Skip or replacement (if the new case is more representative).

3. **Third, assess the vulnerability's unique value.** Check CWE type, product recognition, security team analysis reports, special attack scenarios (supply chain, protocol-level, security products, etc.).

4. **Finally, determine classification based on analysis.** When uncertain about classification, prefer overestimating value to skipping too readily, because missing a unique case costs more than collecting an ordinary one.
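The ordering of these four steps can be sketched as a function with early returns. The record fields (`env_limitation`, `duplicate_of`, `more_representative`, `value_signals`) and the decision labels are hypothetical. Assigning the exact tier is left out to keep the example short.

```python
def triage(cve):
    """Apply the four checks in order; return (decision, reason)."""
    # 1. A true environmental limitation ends the analysis immediately.
    if cve.get("env_limitation"):
        return "Skip", cve["env_limitation"]

    # 2. Duplicates are skipped unless the new case is more representative.
    if cve.get("duplicate_of"):
        if cve.get("more_representative"):
            return "Replace", "supersedes " + cve["duplicate_of"]
        return "Skip", "duplicate of " + cve["duplicate_of"]

    # 3. Look for signals of unique value (CWE type, product, reports, ...).
    signals = cve.get("value_signals", [])

    # 4. Decide; when unsure, keep the case rather than skip it.
    if signals:
        return "Keep", "value signals: " + ", ".join(signals)
    return "Keep (low priority)", "no strong signal, kept to avoid losing a unique case"


print(triage({"env_limitation": "Windows-only software",
              "value_signals": ["actively_exploited"]}))
print(triage({"duplicate_of": "CVE-0000-0001", "more_representative": True}))
print(triage({"value_signals": ["unique_cwe", "clear_patch"]}))
```

Checking environmental limitations first avoids spending analysis time on a CVE that cannot be reproduced no matter how valuable it looks.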

Following these standards makes it possible to build a comprehensive, unique, and highly practical CVE vulnerability reproduction library with limited resources.
