Keywords: Inference-Time Scaling, Large Language Models, Concurrent Reasoning, Edge Inference
TL;DR: We propose Group Think: a single LLM simulates multiple concurrent reasoning agents that collaborate at the token level by dynamically adapting to each other’s progress, leading to improved performance.
Abstract: Large language models (LLMs) increasingly rely on extended inference-time computation, where reasoning is typically realized as a single sequential trajectory. While longer reasoning improves performance, it also increases latency. We introduce the *Group Think* paradigm, a conceptual shift toward collaborative parallel reasoning in which multiple reasoning threads are generated concurrently and adapt dynamically to each other at the token level. We show that even existing LLMs exhibit preliminary Group Think behaviors when run with a modified inference scheme, and that these behaviors can be significantly enhanced through finetuning on a synthetic dataset of token-wise collaborative reasoning traces. These traces capture key dynamics such as high-level planning, adaptation to peers, redundancy avoidance, speculative fast-forwarding, error correction, and divide-and-conquer strategies. Our modeling framework further incorporates attention mask modifications and positional scheduling, paired with an inference engine implementing parallel decoding with shared key–value states. This concurrent nature also enables more efficient utilization of otherwise idle computational resources, making Group Think particularly well suited for edge inference, where small batch sizes often underutilize local GPUs. Evaluation shows that our approach yields models with improved reasoning accuracy and reduced latency compared to inference-only baselines, while exhibiting richer collaborative behaviors. For the benefit of the community, we will release the *GroupThink-4k* dataset and our training and inference frameworks.
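To make the abstract's mechanism concrete, here is a minimal sketch, in plain Python, of how token-level concurrency with a shared key–value cache could be scheduled. This is not the authors' released framework: the round-robin interleaving, the mask layout, and the per-thread position ids are all assumptions chosen for illustration of the three ingredients the abstract names (parallel decoding into a shared cache, attention mask modifications so threads see each other, and positional scheduling so each thread's trace remains a coherent sequence).

```python
def interleaved_schedule(num_threads, tokens_per_thread):
    """Round-robin order in which thread tokens are appended to the
    shared KV cache: thread 0, thread 1, ..., thread 0, thread 1, ...
    (an assumed schedule; any fair interleaving would do)."""
    return [t for _ in range(tokens_per_thread) for t in range(num_threads)]

def group_think_mask(schedule):
    """Attention mask over the interleaved sequence: entry (i, j) is True
    iff token i may attend to token j. Causal in generation order, so
    every new token sees all peers' earlier tokens -- the token-level
    collaboration described in the abstract."""
    n = len(schedule)
    return [[j <= i for j in range(n)] for i in range(n)]

def thread_positions(schedule):
    """Positional scheduling: each thread keeps its own contiguous
    position ids (0, 1, 2, ...), so its trace reads as a normal
    sequence even though tokens are interleaved in the shared cache."""
    counters, positions = {}, []
    for t in schedule:
        positions.append(counters.get(t, 0))
        counters[t] = counters.get(t, 0) + 1
    return positions
```

For two threads generating three tokens each, `interleaved_schedule(2, 3)` gives `[0, 1, 0, 1, 0, 1]` and `thread_positions` gives `[0, 0, 1, 1, 2, 2]`: the cache holds one interleaved stream, while each thread's own position ids stay contiguous.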
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17610