Abstract: Larger models often outperform smaller ones but come with high computational costs.
Cascading offers a potential solution: a smaller model handles each input by default, and only some instances are deferred to a larger, more powerful model.
However, designing effective deferral rules remains a challenge.
In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimation (QE) metrics as deferral rules. We show that QE-based deferral allows a cascaded system to match the performance of a larger model while invoking it for a small fraction ($30\%$ to $50\%$) of the examples, significantly reducing computational costs. We validate this approach through both automatic and human evaluation.
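As an illustration only, the deferral idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: `small_translate`, `large_translate`, and `qe_score` are hypothetical callables supplied by the caller, the score is assumed to be "higher is better", and the `threshold` value is an arbitrary placeholder that would need tuning to reach a given deferral rate.

```python
from typing import Callable, List, Tuple


def cascade_translate(
    sources: List[str],
    small_translate: Callable[[str], str],   # hypothetical: smaller, cheaper model
    large_translate: Callable[[str], str],   # hypothetical: larger, costlier model
    qe_score: Callable[[str, str], float],   # hypothetical: QE(source, hypothesis), higher is better
    threshold: float,                        # assumed cut-off; not a value from the paper
) -> Tuple[List[str], float]:
    """Translate with the small model and defer low-scoring outputs to the large one.

    Returns the final translations and the fraction of inputs that were deferred.
    """
    outputs: List[str] = []
    deferred = 0
    for src in sources:
        draft = small_translate(src)
        if qe_score(src, draft) >= threshold:
            # The QE score is high enough: keep the small model's output.
            outputs.append(draft)
        else:
            # The QE score is too low: defer this instance to the large model.
            outputs.append(large_translate(src))
            deferred += 1
    deferral_rate = deferred / len(sources) if sources else 0.0
    return outputs, deferral_rate
```

In this sketch the large model is invoked only for deferred inputs, so its cost scales with the returned deferral rate, while the small model and the QE metric run on every input.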
Paper Type: Short
Research Area: Machine Translation
Research Area Keywords: Machine translation, efficiency, quality estimation, cascaded systems
Languages Studied: English, Czech, German, Spanish, Hindi, Icelandic, Japanese, Russian, Ukrainian
Submission Number: 1361