LM-Lexicon: Definition Modeling with Mixture-of-Experts

ACL ARR 2025 February Submission 5678 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: We introduce LM-Lexicon, a definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small language models are trained as domain experts, LM-Lexicon achieves substantial improvements (+7% BLEU score compared with the prior state-of-the-art model) over existing methods on five widely used benchmarks. Empirically, we demonstrate that 1) the clustering strategy enables fine-grained expert specialization with nearly 10% improvement in definition quality; 2) the semantic-aware domain-level routing mechanism achieves higher expert efficacy (+1%) than conventional token-level routing; and 3) further performance gains can be obtained through test-time compute and semantic expert scaling. Our work advances definition modeling while providing insights into the development of efficient and targeted language models for semantic-intensive applications.
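
The contrast the abstract draws between domain-level and token-level routing can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how routing is computed, so the mean-pooling step, the cosine-similarity criterion, the two-dimensional embeddings, and all function names below are assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_centroid(vec, centroids):
    """Index of the cluster centroid most similar to vec (hypothetical helper)."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))

def route_domain_level(token_vecs, centroids):
    """Assumed domain-level routing: pool the whole input, pick one expert for it."""
    dim = len(token_vecs[0])
    mean = [sum(v[d] for v in token_vecs) / len(token_vecs) for d in range(dim)]
    expert = nearest_centroid(mean, centroids)
    return [expert] * len(token_vecs)

def route_token_level(token_vecs, centroids):
    """Conventional token-level routing: every token picks its own expert."""
    return [nearest_centroid(v, centroids) for v in token_vecs]

# Toy data (invented): two semantic clusters and a three-token input.
centroids = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]]

domain_routes = route_domain_level(tokens, centroids)
token_routes = route_token_level(tokens, centroids)
print(domain_routes, token_routes)
```

In this toy setting, domain-level routing sends every token of an input to the same expert, while token-level routing may split a single input across experts.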

Paper Type: Long

Research Area: Semantics: Lexical and Sentence-Level

Research Area Keywords: paraphrasing, polysemy, sparse models

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 5678