Small Talk, Big Impact: The Energy Cost of Thanking AI

ICLR 2026 Conference Submission 12776 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLM inference efficiency, Sustainable AI, Latency modeling, GPU energy profiling, Model scaling
TL;DR: Saying “thank you” to an LLM has a measurable energy cost. We quantify it across models and show how prompt length, output verbosity, and model size affect inference energy.
Abstract: Being nice doesn't cost you anything - or does it? In this paper, we quantify the energy cost of seemingly innocuous messages such as "thank you" sent to large language models to convey politeness. Using real-world conversation traces and fine-grained energy measurements, we measure how input length, output length, and model size affect energy use. While politeness is our motivating example, it also serves as a controlled and reproducible proxy for measuring the energy footprint of a typical LLM interaction. Our findings provide actionable insights for building more sustainable and efficient LLM applications, especially in increasingly widespread real-world contexts like chat. As user adoption grows and billions of prompts are processed daily, understanding and mitigating this cost becomes crucial - not just for efficiency, but for sustainable AI deployment.
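For readers curious what "fine-grained energy measurement" of a single LLM request can look like in practice, here is a minimal sketch (not the authors' instrumentation) of per-request GPU energy accounting via NVML. It assumes a single Volta-or-newer NVIDIA GPU, and `generate` is a hypothetical stand-in for whatever inference call is being profiled.

```python
import time
import pynvml  # NVIDIA Management Library bindings (nvidia-ml-py)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes a single-GPU setup

def measure_gpu_energy(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, joules, seconds).

    Uses nvmlDeviceGetTotalEnergyConsumption, a cumulative counter in
    millijoules available on Volta and newer GPUs. Note this captures
    whatever the GPU does during the call, so fn should block until
    inference finishes (synchronize first if the work is asynchronous).
    """
    start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    start_t = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start_t
    end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    return result, (end_mj - start_mj) / 1e3, elapsed

# Usage (generate is a placeholder for any blocking LLM inference call):
# _, joules, secs = measure_gpu_energy(generate, "thank you")
# print(f"{joules:.1f} J over {secs:.2f} s")
```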
Supplementary Material: zip
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 12776