Keywords: Code Generation, Agentic AI, Code Models, Epistemic Uncertainty, Test-Time Adaptation, Active Probing
Abstract: The deployment of Large Language Models (LLMs) in enterprise environments is critically impeded by the Private Library Problem: models pre-trained on open-source corpora suffer catastrophic performance degradation when facing internal, undocumented APIs. We argue that existing paradigms—whether retrieval-augmented or search-based—fail because they treat this deficit as aleatoric noise to be overcome by sampling, rather than epistemic uncertainty requiring active inquiry. To bridge this gap, we introduce EVoC (Epistemic Value of Computation), a framework that shifts the paradigm of code generation from blind search to active probing. EVoC enables agents to autonomously calibrate their internal beliefs against opaque environments via two novel mechanisms: (1) a Net Value of Computation (NVoC) decision-theoretic criterion that authorizes execution only when expected information gain outweighs computational cost; and (2) an Adaptive Verifier, a lightweight LoRA module updated online to "sediment" execution feedback into parametric memory, thereby capturing latent runtime constraints (e.g., hidden state machines) that escape in-context learning. We introduce Private-SWE-Bench, a stratified benchmark simulating obfuscated environments, where EVoC achieves 98.2% Pass@1—outperforming state-of-the-art tool-using agents (OpenHands) by 12.4 points and search baselines (S*) by 24.4 points. Crucially, on tasks with latent constraints, EVoC nearly doubles the success rate of search methods (96.7% vs. 51.3%), demonstrating that when the map is missing, the ability to probe is more decisive than the ability to plan.
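The abstract's NVoC criterion gates execution on whether expected information gain outweighs computational cost. As a minimal illustration, the decision rule might look like the sketch below; the entropy-based gain estimate and the names `nvoc` and `should_probe` are assumptions for exposition, not the paper's actual formulation.

```python
import math

def expected_information_gain(outcome_probs):
    """Shannon entropy (in bits) of the agent's predicted probe outcomes.

    Illustrative proxy: if the agent is maximally uncertain about what a
    probe will return, running it is maximally informative; if the agent
    already predicts the outcome confidently, the probe adds little.
    """
    return -sum(p * math.log2(p) for p in outcome_probs if p > 0)

def nvoc(outcome_probs, execution_cost):
    """Net Value of Computation: expected information gain minus cost."""
    return expected_information_gain(outcome_probs) - execution_cost

def should_probe(outcome_probs, execution_cost):
    """Authorize an execution probe only when NVoC is positive."""
    return nvoc(outcome_probs, execution_cost) > 0
```

Under this toy rule, a probe with two equally likely outcomes and cost 0.2 is authorized (gain 1.0 bit), while a probe whose outcome is 99% predictable is not (gain ≈ 0.08 bits), matching the abstract's intuition that computation is spent only where it reduces epistemic uncertainty.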
Paper Type: Long
Research Area: Code Models
Research Area Keywords: AI / LLM Agents, Code Models
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 6082