The Green KNIGHT: Green Machine Translation with Knowledge-Distilled, Narrow, Inexpensive, Greedy, Hybrid Transformers
Abstract: State-of-the-art neural machine translation (NMT) models deliver high-quality translations at the expense of high inference latency and energy consumption, requiring vast GPU fleets and contributing significantly to carbon emissions. To democratize and ``green'' NMT, we introduce the Green KNIGHT, a hardware-agnostic collection of recipes for optimizing inference speed and energy consumption, with only a minor trade-off in quality.
On two high-resource benchmarks we show speedups of up to 91$\times$ on CPU with 94\% energy savings for En$\to$De, and a 65$\times$ speedup with 90\% energy savings for En$\to$Ko, while incurring only a minor relative BLEU loss of 9\%.
Our results demonstrate that efficient and environmentally conscious NMT can be realized through optimizations built on well-understood, off-the-shelf techniques, with no custom low-level code required, making our approach immediately deployable in real-world translation pipelines.
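One of the off-the-shelf speed levers named in the title is greedy decoding: at each step the decoder emits the single highest-scoring token instead of maintaining a beam, trading a small amount of quality for a large reduction in compute. A minimal sketch of the idea, with a toy scoring function standing in for a real translation model (all names and shapes here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def greedy_decode(step_logits_fn, bos_id, eos_id, max_len=10):
    """Greedy decoding: pick the argmax token at every step (no beam).

    step_logits_fn is a hypothetical callable mapping the current token
    prefix to a vector of vocabulary scores; a real NMT model would run
    its decoder here.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        logits = step_logits_fn(tokens)   # scores over the vocabulary
        next_id = int(np.argmax(logits))  # greedy choice: single best token
        tokens.append(next_id)
        if next_id == eos_id:             # stop once end-of-sequence is emitted
            break
    return tokens

# Toy "model": deterministically prefers token (len(prefix) % 4),
# so it reaches EOS (id 3) after three steps.
def toy_step(prefix):
    logits = np.zeros(4)
    logits[len(prefix) % 4] = 1.0
    return logits

print(greedy_decode(toy_step, bos_id=0, eos_id=3, max_len=8))
```

Because only one hypothesis is extended per step, the decoder's per-sentence cost drops roughly by the beam width relative to beam search, which is why it features prominently in CPU-oriented efficiency recipes.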
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: efficient inference for MT, MT deployment and maintenance, scaling, modeling
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English, German, Korean
Submission Number: 6665