Keywords: adaptive inference, model cascades, structured generation, abstention, confidence estimation, selective deferral, large language models
TL;DR: In structured generation, log-probability confidence behaves differently across output regimes and attribute families; cascade routing should account for this heterogeneity rather than rely on a single pooled threshold.
Abstract: Multi-model inference systems, whether based on routing, cascading, or unified strategies, often rely on confidence signals to decide when a small language model (SLM) output should be accepted or deferred. While such signals are commonly used in classification and short-form generation, their reliability in structured generation settings remains poorly understood.
In this work, we study log-probability confidence in structured attribute value generation, where a model must produce either a schema-compliant VALUE or an ABSTAIN outcome. We show that confidence is prediction-type-conditioned: in our setting, average token log-probability is a stronger error-detection signal for VALUE outputs than for ABSTAIN outputs. As a result, global confidence thresholding yields imbalanced trade-offs, improving VALUE precision at the cost of recall while providing weaker control over abstention behavior.
We therefore cast cascade routing as type-aware selective deferral, in which acceptance decisions depend on both the confidence score and the predicted output type, with VALUE thresholds specialized by attribute family. Experiments on a large-scale product attribute generation task show that a fine-tuned SLM combined with selective deferral improves quality–cost trade-offs relative to pooled thresholding. The strongest operating point routes low-confidence VALUE predictions while keeping ABSTAIN predictions from the first-stage model, highlighting the importance of modeling heterogeneous reliability in structured-generation cascades.
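The deferral rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Prediction` class, the function names, the attribute-family labels, and every threshold value are hypothetical and chosen only for illustration. The sketch assumes that confidence is the mean token log-probability, that VALUE predictions are deferred when their score falls below a per-family threshold, and that ABSTAIN predictions are kept from the first-stage model unless an optional abstain threshold is supplied.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Prediction:
    """Hypothetical container for one first-stage (SLM) output."""
    output_type: str            # "VALUE" or "ABSTAIN"
    attribute_family: str       # illustrative family label, e.g. "color"
    token_logprobs: List[float]  # log-probability of each generated token


def avg_logprob(token_logprobs: List[float]) -> float:
    """Mean token log-probability; an empty output gets the lowest confidence."""
    if not token_logprobs:
        return float("-inf")
    return sum(token_logprobs) / len(token_logprobs)


def should_defer(
    pred: Prediction,
    value_thresholds: Dict[str, float],
    default_value_threshold: float,
    abstain_threshold: Optional[float] = None,
) -> bool:
    """Return True if the prediction should be routed to the larger model.

    Assumed rule: VALUE outputs are compared against a threshold specific to
    their attribute family; ABSTAIN outputs are accepted as-is unless an
    abstain threshold is given.
    """
    score = avg_logprob(pred.token_logprobs)
    if pred.output_type == "ABSTAIN":
        if abstain_threshold is None:
            return False  # keep first-stage abstentions
        return score < abstain_threshold
    threshold = value_thresholds.get(pred.attribute_family, default_value_threshold)
    return score < threshold


# Illustrative thresholds only; real values would be tuned on held-out data.
thresholds = {"color": -0.5}
low_conf_value = Prediction("VALUE", "color", [-0.5, -1.5])
abstention = Prediction("ABSTAIN", "color", [-3.0])
print(should_defer(low_conf_value, thresholds, default_value_threshold=-1.0))
print(should_defer(abstention, thresholds, default_value_threshold=-1.0))
```

A pooled baseline corresponds to passing an empty `value_thresholds` mapping and setting `abstain_threshold` equal to `default_value_threshold`, so that every output is compared against the same cutoff.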
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 105