Training-Free Semantic Deferrals for Open-Ended LLM Cascades

Duncan Soiffer; Steven Kolawole; Virginia Smith

Published: 11 Jun 2025, Last Modified: 10 Jul 2025, Venue: ES-FoMo III, License: CC BY 4.0

Keywords: adaptive inference, efficient inference, model cascading, cascading for generative tasks

TL;DR: Semantic signals are reliable training-free deferral metrics for generative LLM cascades.

Abstract: Existing cascade systems struggle with open-ended text generation because outputs are hard to evaluate: many outputs can be valid, and no ground-truth reference is available. We propose using semantic agreement between multiple model outputs as a training-free deferral signal, and we evaluate semantic similarity metrics against token-level confidence across translation, summarization, question answering, and reading comprehension tasks. We show that semantic signals provide a stronger indication of when deferral is appropriate than token-level methods and are resilient to heterogeneous model quality.
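To make the deferral idea concrete, the following is a minimal sketch of an agreement-based cascade, not the authors' implementation. It assumes the cheaper tier yields several candidate outputs for a prompt, scores them by mean pairwise similarity, and defers to the larger model when that score falls below a threshold. All names (`token_jaccard`, `agreement`, `cascade`, `small_outputs`, `large_model`) and the threshold value are hypothetical, and the word-overlap similarity is only a dependency-free stand-in for a real semantic similarity metric.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Similarity = Callable[[str, str], float]


def token_jaccard(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of lowercased word sets.

    Assumption: a real system would swap in a semantic metric here;
    word overlap is used only to keep the sketch self-contained.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty outputs agree trivially
    return len(ta & tb) / len(ta | tb)


def agreement(outputs: List[str], sim: Similarity = token_jaccard) -> float:
    """Mean pairwise similarity across all candidate outputs."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # a single output has nothing to disagree with
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def cascade(
    prompt: str,
    small_outputs: Callable[[str], List[str]],  # hypothetical: cheap-tier candidates
    large_model: Callable[[str], str],          # hypothetical: expensive fallback
    threshold: float = 0.5,                     # hypothetical value, needs tuning
    sim: Similarity = token_jaccard,
) -> Tuple[str, bool, float]:
    """Return (answer, deferred, agreement_score)."""
    candidates = small_outputs(prompt)
    score = agreement(candidates, sim)
    if score >= threshold:
        # Candidates agree: keep the cheap answer (first candidate, by convention).
        return candidates[0], False, score
    # Candidates disagree: defer to the larger model.
    return large_model(prompt), True, score


if __name__ == "__main__":
    stub_small = lambda p: ["Paris is the capital", "The capital is Paris", "Paris"]
    stub_large = lambda p: "Paris is the capital of France."
    print(cascade("What is the capital of France?", stub_small, stub_large))
```

No training is involved in this sketch: the only free parameters are the similarity function and the threshold, which is what makes the signal training-free. How the candidate outputs are produced (repeated samples from one model or outputs from several models) is left open here.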

Submission Number: 124
