Keywords: large language models, test-time compute, coverage
Abstract: Foundation-model reasoning is often scaled with fixed trace, branch, or token budgets, but fixed budgets waste inference compute when many rollouts collapse to the same answer class, proof route, evidence pattern, or fix strategy. We formulate test-time reasoning as resource-adaptive portfolio construction: under a compute budget, select unfinished rollout actions that are likely to add residual task-relevant support not already present in the current portfolio. We introduce semantic coverage, a portfolio objective over evaluator-induced support targets, and show that fixed semantic coverage is normalized, monotone, and submodular; under batch separability, the expected gain from bounded completion-level rollout actions inherits the same structure. A residual threshold identity shows that additive trace scoring overcounts support exactly by the expected excess multiplicity of duplicate residual hits, motivating Semantic-Coverage Portfolio Search (SCPS), a receding-horizon controller that scores actions by predicted residual support, predicted overlap, and cost. On a 256-question held-out MMLU-Pro set with Qwen3.5-9B, an exact-answer SCPS variant reaches 91.0% portfolio Pass@16, compared with 75.0% for tree-prefix and 74.2% for semantic-pruning baselines, while using 31.6% of their realized tokens. This is a portfolio-acquisition claim, not a terminal-selector claim: SCPS directs inference compute toward support the portfolio lacks.
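To make the overcounting claim concrete, the following is a minimal toy sketch, not the paper's SCPS controller. It assumes a deterministic setting in which each finished rollout hits a known set of support targets and a target counts as covered after a single hit. All names (`coverage`, `additive_score`, `excess_multiplicity`, `greedy_portfolio`, the rollouts `r1` to `r4`, and the targets `A` to `C`) are hypothetical and chosen only for illustration. In this simplified setting the additive score exceeds the coverage value by exactly the weighted excess multiplicity of duplicate hits, and a greedy rule that ranks candidates by residual gain per unit cost skips rollouts that only repeat existing support.

```python
from collections import Counter

def coverage(portfolio, weights):
    """Total weight of targets hit by at least one rollout in the portfolio."""
    covered = set().union(*portfolio)
    return sum(weights[t] for t in covered)

def additive_score(portfolio, weights):
    """Sum of per-rollout scores; duplicate hits are counted every time."""
    return sum(weights[t] for hits in portfolio for t in hits)

def excess_multiplicity(portfolio, weights):
    """Weighted number of hits beyond the first on each target."""
    counts = Counter(t for hits in portfolio for t in hits)
    return sum(weights[t] * (c - 1) for t, c in counts.items())

def greedy_portfolio(candidates, weights, budget):
    """Pick rollouts by residual gain per unit cost until the budget binds.

    candidates maps a rollout name to (set of targets hit, cost).
    Assumption: hits and costs are known; a real controller would predict them.
    """
    chosen, covered, spent = [], set(), 0
    remaining = dict(candidates)
    while remaining:
        best, best_ratio = None, 0.0
        for name, (hits, cost) in remaining.items():
            if spent + cost > budget:
                continue  # this rollout no longer fits the budget
            gain = sum(weights[t] for t in hits - covered)  # residual support only
            if gain / cost > best_ratio:
                best, best_ratio = name, gain / cost
        if best is None:
            break  # nothing affordable adds new support
        hits, cost = remaining.pop(best)
        chosen.append(best)
        covered |= hits
        spent += cost
    return chosen, covered, spent

# Hypothetical toy instance: three targets, four candidate rollouts.
weights = {"A": 1.0, "B": 1.0, "C": 1.0}
candidates = {
    "r1": ({"A"}, 2),
    "r2": ({"A"}, 1),
    "r3": ({"B", "C"}, 3),
    "r4": ({"A", "B"}, 2),
}
all_hits = [hits for hits, _ in candidates.values()]
chosen, covered, spent = greedy_portfolio(candidates, weights, budget=5)
```

On this instance, running all four rollouts gives an additive score of 6 for a coverage of only 3, with the gap equal to the excess multiplicity. The greedy rule covers all three targets with two rollouts at a cost of 4, because `r1` adds no residual support once `r2` is in the portfolio.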
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 151