Keywords: LLM Collaboration, Efficient Reasoning, Small Language Models, Thinking Insights, Cost-aware Inference
Abstract: Recent advancements in large language models (LLMs) have popularized reasoning-intensive paradigms, which improve response quality but significantly increase computational costs due to extended reasoning chains. We propose Tandem, a collaborative framework between large and small language models designed to achieve high-quality reasoning with low computational overhead. In Tandem, the LLM acts as a mentor by generating four types of critical reasoning insights (Goal, Planning, Retrieval, Action), while a small language model (SLM) executes the reasoning to produce the final answer. A cost-aware judgment mechanism uses perplexity and entropy to adaptively determine when sufficient insights have been accumulated, allowing early termination of LLM generation. Experiments on MATH and GSM8K show that Tandem reduces computational costs by approximately 50% while outperforming the LLM alone in accuracy. Code: https://anonymous.4open.science/r/Ensemble-Hub-0FD8
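A minimal sketch of what a perplexity/entropy stopping rule of this kind could look like (the thresholds, function names, and example values below are illustrative assumptions, not taken from the paper):

```python
# Hedged sketch of a cost-aware early-stopping rule in the spirit of the
# abstract: stop mentor-LLM insight generation once its perplexity and
# predictive entropy suggest the SLM has enough to finish the reasoning.
# All names and thresholds here are illustrative assumptions.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def perplexity(token_logprobs: np.ndarray) -> float:
    # PPL = exp(-mean log p(token)); low PPL => confident generation.
    return float(np.exp(-token_logprobs.mean()))

def mean_entropy(step_logits: np.ndarray) -> float:
    # Average per-step predictive entropy over the vocabulary, in nats.
    probs = softmax(step_logits)
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    return float(ent.mean())

def enough_insights(token_logprobs: np.ndarray, step_logits: np.ndarray,
                    ppl_max: float = 3.0, ent_max: float = 1.5) -> bool:
    """Return True when the accumulated insights look sufficient,
    i.e. both confidence signals fall below assumed thresholds and
    LLM insight generation can be terminated early."""
    return (perplexity(token_logprobs) < ppl_max
            and mean_entropy(step_logits) < ent_max)

# Toy example: statistics gathered while the LLM streams insight tokens.
rng = np.random.default_rng(0)
fake_logprobs = np.log(rng.uniform(0.5, 0.95, size=32))   # confident tokens
fake_logits = rng.normal(0.0, 0.5, size=(32, 1000))
fake_logits[np.arange(32), rng.integers(0, 1000, 32)] += 10.0  # one dominant token per step
print(enough_insights(fake_logprobs, fake_logits))  # True under these assumptions
```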
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: LLM Efficiency, NLP in resource-constrained settings, agent coordination and negotiation
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low compute settings-efficiency
Languages Studied: English
Submission Number: 3440