Improving Cascade Routing for Structured Attribute Generation with Heterogeneous Confidence

Fatemeh Mansoori; Andrea Scarinci; Aditya Aggarwal; Suleiman A. Khan; Ashwin Chandramouli

Published: 01 Jun 2026, Last Modified: 11 Jun 2026 | AdaptFM Poster | CC BY 4.0

Keywords: adaptive inference, model cascades, structured generation, abstention, confidence estimation, selective deferral, large language models

TL;DR: In structured generation, log-probability confidence behaves differently across output regimes and attribute families; cascade routing should account for this heterogeneity rather than rely on a single pooled threshold.

Abstract: Multi-model inference systems, whether based on routing, cascading, or unified strategies, often rely on confidence signals to decide when a small language model (SLM) output should be accepted or deferred. While such signals are commonly used in classification and short-form generation, their reliability in structured generation settings remains poorly understood. In this work, we study log-probability confidence in structured attribute value generation, where a model must produce either a schema-compliant VALUE or an ABSTAIN outcome. We show that confidence is prediction-type-conditioned: in our setting, average token log-probability is a stronger error-detection signal for VALUE outputs than for ABSTAIN outputs. As a result, global confidence thresholding yields imbalanced trade-offs, improving VALUE precision at the cost of recall while providing weaker control over abstention behavior. We therefore cast cascade routing as type-aware selective deferral, in which acceptance decisions depend on both the confidence score and the predicted output type, with VALUE thresholds specialized by attribute family. Experiments on a large-scale product attribute generation task show that a fine-tuned SLM combined with selective deferral improves quality–cost trade-offs relative to pooled thresholding. The strongest operating point routes low-confidence VALUE predictions while keeping ABSTAIN predictions from the first-stage model, highlighting the importance of modeling heterogeneous reliability in structured-generation cascades.
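
As an illustration of the routing rule described in the abstract, the following minimal Python sketch contrasts a pooled threshold with type-aware selective deferral. It is not the authors' implementation: the `Prediction` class, the attribute-family names (`numeric`, `categorical`), and every threshold value are hypothetical placeholders chosen only to make the example concrete. The sketch assumes confidence is the average token log-probability of the SLM output and that ABSTAIN predictions are always kept from the first-stage model, as in the operating point the abstract reports as strongest.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Prediction:
    """One first-stage (SLM) output. All field names are illustrative."""
    output_type: str                 # "VALUE" or "ABSTAIN"
    attribute_family: str            # hypothetical family label
    token_logprobs: Sequence[float]  # per-token log-probabilities


def avg_logprob(pred: Prediction) -> float:
    """Average token log-probability, used here as the confidence score."""
    if not pred.token_logprobs:
        raise ValueError("prediction has no token log-probabilities")
    return sum(pred.token_logprobs) / len(pred.token_logprobs)


# Hypothetical thresholds; real values would be tuned on held-out data.
POOLED_THRESHOLD = -0.35
VALUE_THRESHOLDS: Dict[str, float] = {"numeric": -0.20, "categorical": -0.50}
DEFAULT_VALUE_THRESHOLD = -0.35


def pooled_defer(pred: Prediction, threshold: float = POOLED_THRESHOLD) -> bool:
    """Baseline: one threshold for every output, regardless of its type."""
    return avg_logprob(pred) < threshold


def type_aware_defer(
    pred: Prediction,
    value_thresholds: Dict[str, float] = VALUE_THRESHOLDS,
    default_threshold: float = DEFAULT_VALUE_THRESHOLD,
) -> bool:
    """Defer only low-confidence VALUE outputs, with per-family thresholds.

    ABSTAIN outputs are always kept from the first-stage model.
    """
    if pred.output_type == "ABSTAIN":
        return False
    threshold = value_thresholds.get(pred.attribute_family, default_threshold)
    return avg_logprob(pred) < threshold


if __name__ == "__main__":
    examples = [
        Prediction("VALUE", "numeric", [-0.1, -0.5]),
        Prediction("VALUE", "categorical", [-0.4, -0.4]),
        Prediction("ABSTAIN", "numeric", [-2.0]),
    ]
    for p in examples:
        print(p.output_type, p.attribute_family,
              "pooled:", pooled_defer(p), "type-aware:", type_aware_defer(p))
```

With these placeholder thresholds the two rules disagree on all three examples, which shows how conditioning on output type and attribute family can change which predictions are sent to the larger model.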

Submission Number: 105
