Breaking the Data Barrier: LexiCore - A Lexicon-First Hybrid System for Extremely Low-Resource Translation
Keywords: machine translation, low-resource languages, hybrid systems, neural-symbolic integration, constructed languages
TL;DR: A lexicon-first hybrid translation system that avoids massive data requirements by combining dictionary lookup, grammar rules, and LLM grammar correction (not translation).
Abstract: We present LexiCore, a lexicon-first hybrid translation system that achieves 18.92 BLEU on Dothraki–English translation using only 2,254 parallel examples, a 4,438-fold data deficit relative to neural requirements (a ratio that implies a baseline of roughly 10 million examples). After systematically exploring 14 failed approaches that yielded 0.00 BLEU, we arrived at a design that combines dictionary lookup, grammar rules, and constrained LLM polishing to achieve genuine translation without memorization. LexiCore attains a 3.5% exact-match rate (7/200), and 28% of its translations are high quality (BLEU > 30). It requires no GPU training and incurs minimal API costs ($0.10 per 200 translations). The key insight: when data is scarce but linguistic documentation exists, explicit knowledge can substitute for statistical learning, providing the first scalable solution for extremely low-resource constructed languages.
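Illustrative sketch: The three stages named in the abstract (dictionary lookup, grammar rules, constrained LLM polishing) could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the LexiCore implementation: the lexicon entries are invented placeholder tokens rather than real Dothraki, the single reordering rule is hypothetical, and `polish` stands in for an LLM call. The constraint shown (reject any polished output that drops a word from the draft) is one possible reading of "constrained" polishing.

```python
from typing import Callable, Dict, List, Optional, Tuple

Entry = Tuple[str, str]  # (English gloss, coarse part of speech)

# Hypothetical toy lexicon: invented placeholder tokens, NOT real Dothraki.
TOY_LEXICON: Dict[str, Entry] = {
    "vel": ("warrior", "NOUN"),
    "dor": ("strong", "ADJ"),
    "tak": ("rides", "VERB"),
}


def lookup(tokens: List[str], lexicon: Dict[str, Entry]) -> List[Entry]:
    """Stage 1: word-by-word dictionary lookup; unknown tokens are kept, marked <...>."""
    return [lexicon.get(tok, (f"<{tok}>", "UNK")) for tok in tokens]


def apply_rules(entries: List[Entry]) -> List[Entry]:
    """Stage 2: one assumed grammar rule, moving a noun-adjective pair into English order."""
    out = list(entries)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def normalise(text: str) -> set:
    """Lower-case the text and strip sentence punctuation from each word."""
    return {word.strip(".,!?") for word in text.lower().split()}


def translate(
    sentence: str,
    lexicon: Dict[str, Entry],
    polish: Optional[Callable[[str], str]] = None,
) -> str:
    """Run lookup, rules, then optional polishing that may not drop draft words."""
    entries = apply_rules(lookup(sentence.lower().split(), lexicon))
    draft = " ".join(gloss for gloss, _ in entries)
    if polish is None:
        return draft
    polished = polish(draft)
    # Stage 3 constraint (an assumption): fall back to the draft if the
    # polisher removed or replaced any word that the lexicon produced.
    return polished if normalise(draft) <= normalise(polished) else draft
```

In this sketch the polisher can add function words and punctuation but cannot change the content words, so the lexicon and rules remain the source of the translation and the LLM only repairs grammar.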
Submission Number: 189