Abstract: We investigate a failure mode of large language models (LLMs) in which benign, plain-text prompts elicit excessive outputs, a phenomenon we term Overflow. Unlike jailbreaks or prompt injection, Overflow arises under ordinary interaction settings and carries concrete risks for denial-of-wallet, latency, and cross-user performance degradation. We introduce BenchOverflow, a model-agnostic benchmark of nine plain-text prompting strategies that amplify output volume without adversarial suffixes or policy circumvention. Using a standardized protocol with a fixed budget of 5,000 new tokens, we evaluate BenchOverflow on nine open- and closed-source models. Across models, BenchOverflow produces pronounced rightward shifts and heavy tails in length distributions. Cap-saturation rates (CSR@1k/3k/5k) and empirical cumulative distribution functions (ECDFs) quantify tail risk; within-prompt variance and cross-model correlations show that Overflow is broadly reproducible yet heterogeneous across families and attack vectors. A lightweight mitigation—a fixed conciseness reminder—attenuates right tails and lowers CSR for several strategies. Our findings reframe verbosity as a measurable risk to reliability and cost, rather than a mere stylistic quirk. BenchOverflow provides a practical, reproducible protocol for benchmarking length-control robustness in deployed LLMs.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: (1) extended the defensive-mechanism evaluation to all models, confirming consistent effectiveness; (2) moderated the discussion of over-generation to emphasize practical risks; (3) added analysis and visualizations of refusal-driven over-generation behavior in Section 4.2; (4) included a new workflow diagram (Figure 1); (5) introduced a task-adequacy evaluation with an LLM-as-judge, along with a new results table, to assess the utility/cost tradeoff of the conciseness reminder; (6) expanded the description of BenchOverflow's curation (updated Figure 1, specified the model and temperature, added the full meta-prompt template in Appendix A.3); (7) added more examples for each attack vector (Appendix A.2); (8) updated the limitations section (Section 6).
Assigned Action Editor: ~Lingpeng_Kong1
Submission Number: 5892