Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length

ACL ARR 2026 January Submission 2710 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM, context length, batch processing, multiple instance processing, data science
Abstract: Users often rely on Large Language Models (LLMs) to process documents or perform analysis over a number of instances, such as sentences. For example, computing the average sentiment of a set of movie reviews requires an LLM to assess the sentiment of each review individually and then provide a final aggregated answer. While LLM performance on such tasks taken individually is well established, there has been little research on how LLMs perform when given multi-instance inputs. In this paper, we perform an exhaustive evaluation of the ability of LLMs to handle multi-instance inputs for tasks on which they excel individually. The results show that most LLMs follow a pattern of slight performance degradation at small instance counts ($\approx$20–100), followed by a performance collapse at larger instance counts. Crucially, our analysis shows that while context length is partially responsible for this degradation, the number of instances has a stronger effect on the final results. This finding suggests that when optimising LLM performance for multi-instance processing, attention should be paid to both context length and, in particular, instance count.
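The multi-instance setting described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code: the task wording, numbering scheme, delimiter, and the `build_multi_instance_prompt` / `aggregate_sentiment` helpers are all assumptions made for illustration.

```python
def build_multi_instance_prompt(instances, task):
    """Pack N instances into a single prompt, one numbered instance per line.

    Illustrative only: the prompt template is an assumption, not the
    paper's actual prompt.
    """
    lines = [task, ""]
    for i, text in enumerate(instances, start=1):
        lines.append(f"{i}. {text}")
    lines.append("")
    lines.append("Answer with one label per line, in order.")
    return "\n".join(lines)


def aggregate_sentiment(labels):
    """Aggregate per-instance labels into a final answer
    (here: fraction of positive reviews)."""
    return sum(1 for label in labels if label == "positive") / len(labels)


reviews = [
    "A stunning, heartfelt film.",
    "Dull plot and wooden acting.",
]
prompt = build_multi_instance_prompt(
    reviews,
    task="Classify the sentiment of each movie review as positive or negative.",
)
print(prompt)

# A hypothetical model response, parsed into per-instance labels:
labels = ["positive", "negative"]
print(aggregate_sentiment(labels))  # 0.5
```

The paper's finding is that as `len(instances)` grows past roughly 20–100, per-instance accuracy on prompts of this shape degrades and eventually collapses, with instance count mattering more than raw context length.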
Paper Type: Long
Research Area: Language Models
Research Area Keywords: prompting, scaling, robustness
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 2710