G2P on the Edge: Bridging Morphological Accuracy and Hardware Constraints via Quantized Hard-Monotonic Attention
Keywords: Grapheme-to-Phoneme (G2P), Edge Computing, TinyML, Monotonic Attention, Product Quantization, Neuro-Symbolic AI, Computational Phonology, Morphologically Rich Languages.
Abstract: State-of-the-art (SOTA) Grapheme-to-Phoneme (G2P) models rely on over-parameterized Transformers, rendering them computationally prohibitive for microcontrollers with extreme memory constraints ($<32$KB). We introduce the \textbf{Neuro-Symbolic Quantized Hard-Monotonic Transducer (NS-QHMT)}, a linear-complexity architecture designed for the extreme edge. By replacing soft attention with a latent hard-monotonic pointer and employing integer-only quantization, NS-QHMT reduces the memory footprint to just \textbf{16.0 KB}—a $16\times$ reduction compared to standard Transformers. Despite these strict hardware constraints, our model achieves a Phoneme Error Rate (PER) of \textbf{29.6\%} on English benchmarks. Crucially, empirical profiling on a simulated Cortex-M7 confirms that NS-QHMT is the \textit{only} neural architecture among our baselines that executes within a 32KB RAM envelope without Out-Of-Memory (OOM) failure, bridging the gap between neural performance and legacy hardware constraints.
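The abstract's core mechanism—replacing soft attention with a latent hard-monotonic pointer—can be illustrated with a minimal greedy decoding sketch. This is an assumption-laden illustration, not the paper's implementation: the function name, the `advance_probs` gate matrix, and the 0.5 threshold are all hypothetical stand-ins for the learned components described in the abstract.

```python
import numpy as np

def hard_monotonic_decode(enc_states, advance_probs):
    """Greedy hard-monotonic pointer (illustrative sketch only).

    At each output step the pointer either stays on the current encoder
    position or advances forward, never moving back, so the "attention"
    is a single index rather than a soft distribution. Total pointer
    movement is bounded by the input length, giving linear complexity.

    enc_states:    (T, d) array of encoder states.
    advance_probs: (steps, T) array; advance_probs[t, i] > 0.5 means
                   "advance past position i at output step t"
                   (a hypothetical learned gate).
    """
    T = len(enc_states)
    pointer, contexts = 0, []
    for t in range(advance_probs.shape[0]):
        # Advance monotonically while the gate fires; never move back.
        while pointer < T - 1 and advance_probs[t, pointer] > 0.5:
            pointer += 1
        # Hard context: exactly one encoder state, no weighted sum.
        contexts.append(enc_states[pointer])
    return np.stack(contexts), pointer
```

Because each step emits one pointer index instead of a length-$T$ weight vector, no attention matrix is ever materialized, which is what makes the mechanism compatible with a kilobyte-scale RAM budget.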
Paper Type: Long
Research Area: Phonology, Morphology and Word Segmentation
Research Area Keywords: Phonology, Morphology, Grapheme-to-phoneme conversion, Neuro-symbolic methods, Model compression, Quantization, Efficient neural architectures, Low-resource learning, Morphologically-rich languages
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings & efficiency
Languages Studied: English, Hungarian, Turkish, Inuit-Yupik, Adyghe
Submission Number: 7907