Less is More: Compressed Reasoning with Large Language Models via Structured Prompting

ACL ARR 2025 May Submission 6119 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Recent breakthroughs in LLMs have significantly enhanced their ability to reason and solve problems across a variety of domains. Reinforcement learning (RL) and supervised fine-tuning (SFT) based post-training, together with high-quality curated data, have enabled models such as DeepSeek-R1, Qwen2.5-Math, and OpenAI o1 to outperform the state of the art even on challenging benchmarks such as AIME'24 and MATH-500. However, a significant drawback of these models is the long Chain-of-Thought (CoT) generation required to reach the final response, which increases resource requirements and response time. RL-based approaches that reward brevity alongside accuracy reduce verbosity, but they require custom training with multiple generations per problem, incurring significant resource usage that often limits practitioners to small LLMs; moreover, the performance gains obtained can be inconsistent. In this paper, we introduce TeleMathLang, a minimal syntax for reasoning and math that enables LLMs to generate complete chains of reasoning while reducing response length by 30-65% across GSM8K, AI2-ARC, and MATH-500. We show that LLMs condense their responses when TeleMathLang is used purely as a prompting strategy as well as for fine-tuning (even for small LLMs with 1.5B parameters). Further, we show that it outperforms other concise reasoning prompts in both accuracy and semantic entropy, preserving what makes CoT work while reducing verbosity.
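
To make the prompting setup concrete, below is a minimal sketch (not the authors' code) of how a compact-syntax prompting strategy could be applied and its length reduction measured. The instruction text, the model name, and the helper `answer` are illustrative assumptions; TeleMathLang's actual syntax is defined in the paper, not reproduced here. The example question is drawn from GSM8K, one of the benchmarks cited above.

```python
# Hypothetical sketch of compact-syntax prompting; the system prompt below
# stands in for TeleMathLang, whose real grammar is specified in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-Math-1.5B-Instruct"  # assumed choice; any chat model works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Placeholder for a minimal reasoning syntax (assumption, not TeleMathLang).
COMPACT_SYSTEM = (
    "Reason in a terse symbolic shorthand: one step per line, "
    "equations only, no filler words. End with 'ans: <value>'."
)

def answer(question: str, compact: bool = True) -> tuple[str, int]:
    """Generate a response and return it along with its token count."""
    messages = []
    if compact:
        messages.append({"role": "system", "content": COMPACT_SYSTEM})
    messages.append({"role": "user", "content": question})
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=512)
    new_tokens = out[0][inputs.shape[-1]:]  # keep only the generated tokens
    return tok.decode(new_tokens, skip_special_tokens=True), len(new_tokens)

# Compare plain CoT against the compact prompt on a GSM8K-style question.
q = ("Natalia sold clips to 48 of her friends in April, and then she sold "
     "half as many clips in May. How many clips did Natalia sell altogether?")
verbose, n_verbose = answer(q, compact=False)
terse, n_terse = answer(q, compact=True)
print(f"plain CoT: {n_verbose} tokens; compact prompt: {n_terse} tokens")
```

Because this is purely a prompting strategy, no additional training or per-problem sampling is needed, which is the contrast the abstract draws with brevity-rewarding RL approaches.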
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Language Modeling, Question Answering, Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 6119