Learning to Reason About Code Insecurity: Composite-Reinforcement Fine-Tuning for Cognitive Alignment
Keywords: Cognitive Alignment, Reinforcement Learning
Abstract: Automated vulnerability analysis increasingly relies on language models, yet even strong LLMs exhibit unstable security reasoning: they either over-flag benign code or miss critical flaws, particularly under cross-language shifts. We present ARGO (Composite-Reinforcement Fine-Tuning for Cognitive Alignment), a label-efficient training framework that explicitly optimizes a composite reward combining (i) label-based decision scoring via a strictly proper scoring rule on predicted probabilities, (ii) explanation grounding and consistency through structure- and code-referencing heuristics that use no CWE labels or definitions, and (iii) output-format coherence through a strict schema validator. This moves the objective from bare classification toward deliberative, auditable analysis while explicitly acknowledging and isolating the supervised component of the reward. We cast each example as a short two-phase episode: the policy first produces an explanation, then deterministically emits a calibrated probability through a regression head. The binary decision is derived deterministically from this probability at inference by thresholding rather than sampled as a separate action. Policy updates are stabilized via batch-level affinity-weighted neighborhood smoothing over deterministic encodings and a KL trust term against a reference policy. Across Big-Vul, DiverseVul, and CleanVul, ARGO consistently improves macro-F1 over strong baselines (e.g., up to 0.71 in-distribution, with substantial gains under cross-language transfer). Compared with standard supervised fine-tuning, ARGO reduces catastrophic bias toward predicting the vulnerable class and improves recognition of benign code without relying on CWE supervision.
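The composite reward and thresholded decision described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the negative Brier score as the proper scoring rule, the identifier-overlap grounding heuristic, the schema fields, and the weights are all assumptions introduced here.

```python
def brier_reward(p_vuln: float, label: int) -> float:
    """Strictly proper scoring rule (negative Brier score) on the predicted
    vulnerability probability; maximized in expectation by honest reporting."""
    return -((p_vuln - label) ** 2)

def grounding_reward(explanation: str, code: str) -> float:
    """Illustrative label-free heuristic (assumption): fraction of code
    identifiers that the explanation actually references. No CWE labels used."""
    tokens = {t for t in code.replace("(", " ").replace(")", " ").split()
              if t.isidentifier()}
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in explanation) / len(tokens)

def format_reward(output: dict) -> float:
    """Strict schema validator: 1.0 iff required fields exist and are well-typed."""
    ok = (isinstance(output.get("explanation"), str)
          and isinstance(output.get("p_vuln"), float)
          and 0.0 <= output["p_vuln"] <= 1.0)
    return 1.0 if ok else 0.0

def composite_reward(output: dict, code: str, label: int,
                     w=(1.0, 0.5, 0.5)) -> float:
    """Weighted sum of the three reward components; the weights are
    hypothetical. Malformed outputs earn no credit from the other terms."""
    r_fmt = format_reward(output)
    if r_fmt == 0.0:
        return 0.0
    return (w[0] * brier_reward(output["p_vuln"], label)
            + w[1] * grounding_reward(output["explanation"], code)
            + w[2] * r_fmt)

def decide(p_vuln: float, tau: float = 0.5) -> int:
    """Binary decision derived deterministically by thresholding the
    calibrated probability, rather than sampled as a separate action."""
    return int(p_vuln >= tau)
```

For example, an output that names the risky call (e.g., mentioning `strcpy` when it appears in the snippet) earns grounding credit, while a well-calibrated `p_vuln` is scored by the proper rule independently of any CWE taxonomy.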
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 9891