Keywords: Machine Translation, On-device AI, Small Language Model
Abstract: Despite the efficiency of on-device AI models, there has been little research on the practical aspects required for their real-world deployment, such as the device's CPU utilization and thermal conditions. In this paper, through extensive experiments, we investigate two key issues that must be addressed to deploy on-device models in real-world services: (i) the selection of on-device models and the resource consumption of each model, and (ii) the capability and potential of on-device models for domain adaptation. To this end, we focus on the task of translating live-stream chat messages and manually construct LiveChatBench, a benchmark consisting of 1,000 Korean–English parallel sentence pairs. Experiments on five mobile devices demonstrate that, although serving a large and heterogeneous user base requires careful consideration of highly constrained deployment settings and model selection, the proposed approach nevertheless achieves performance comparable to that of commercial models such as GPT-5.1 on the well-targeted task. The code, trained models, and LiveChatBench will be made publicly available on our GitHub.
Paper Type: Short
Research Area: Machine Translation
Research Area Keywords: domain adaptation, MT deployment and maintenance
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency), Data resources
Languages Studied: English, Korean
Submission Number: 2628