LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts


ACL ARR 2025 July Submission468 Authors

28 Jul 2025 (modified: 20 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0

Abstract: We introduce LM-Lexicon, a definition modeling approach that combines data clustering, semantic expert learning, and model merging within a sparse mixture-of-experts architecture. LM-Lexicon decomposes definition modeling into specialized semantic domains, each handled by a small language model trained as a domain expert. On five widely used benchmarks it outperforms existing methods, including a +7% BLEU gain over the prior state-of-the-art model. Empirically, we show that 1) the clustering strategy enables fine-grained expert specialization, improving definition quality by nearly 10%; 2) semantic-aware, domain-level routing yields higher expert efficacy (+1%) than conventional token-level routing; and 3) further gains can be obtained by scaling test-time compute and the number of semantic experts. Our work advances definition modeling and offers insights into building efficient language models for semantics-intensive applications. The code, data, and models will be made publicly available after the review process.
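The abstract contrasts semantic-aware, domain-level routing with conventional token-level routing. The sketch below illustrates only that distinction, under assumptions that are ours and not the paper's: each training example and each input token is represented by an embedding vector, the semantic clusters come from plain k-means, and domain-level routing sends the whole input to the expert whose centroid is nearest to the mean-pooled input. The names `kmeans`, `route_sequence`, and `route_tokens` are hypothetical; this is not the authors' implementation.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def nearest(p: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means (assumed clustering step); one centroid per hypothetical expert."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: Dict[int, List[Vector]] = {i: [] for i in range(k)}
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in clusters.items()]
    return centroids


def route_sequence(token_embs: List[Vector], centroids: List[Vector]) -> int:
    """Domain-level routing (assumed form): pool the input, pick ONE expert for all tokens."""
    return nearest(mean(token_embs), centroids)


def route_tokens(token_embs: List[Vector], centroids: List[Vector]) -> List[int]:
    """Token-level routing, for contrast: every token picks its own expert."""
    return [nearest(t, centroids) for t in token_embs]


# Toy 2-D "embeddings" forming two well-separated semantic clusters.
train = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
centroids = kmeans(train, k=2)
query = [[0.2, 0.1], [4.0, 4.0], [0.0, 0.3]]
print(route_sequence(query, centroids))  # a single expert index for the whole input
print(route_tokens(query, centroids))    # per-token indices; the second token diverges
```

In this toy setting the pooled query goes to a single expert, while token-level routing splits the same input across two experts. That is the kind of fragmentation a domain-level router avoids, though how LM-Lexicon actually pools inputs and scores experts is described in the paper rather than here.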

Paper Type: Long

Research Area: Semantics: Lexical and Sentence-Level

Research Area Keywords: paraphrasing, definition modeling, polysemy, sparse models

Contribution Types: NLP engineering experiment

Languages Studied: English

Previous URL: https://openreview.net/forum?id=QJWrIrDrUe

Explanation Of Revisions PDF: pdf

Reassignment Request Area Chair: Yes, I want a different area chair for our submission

Reassignment Request Reviewers: Yes, I want a different set of reviewers

Justification For Not Keeping Action Editor Or Reviewers: The previous reviews dismissed the work without any concrete comments on the correctness of the results or the argumentation.

Software: zip

Data: zip

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: N/A

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Section 4

B2 Discuss The License For Artifacts: Yes

B2 Elaboration: Section 4

B3 Artifact Use Consistent With Intended Use: Yes

B3 Elaboration: Section 4

B4 Data Contains Personally Identifying Info Or Offensive Content: Yes

B4 Elaboration: Section 4

B5 Documentation Of Artifacts: Yes

B5 Elaboration: Section 4

B6 Statistics For Data: Yes

B6 Elaboration: Section 4

C Computational Experiments: Yes

C1 Model Size And Budget: Yes

C1 Elaboration: Section 4

C2 Experimental Setup And Hyperparameters: Yes

C2 Elaboration: Section 4

C3 Descriptive Statistics: Yes

C3 Elaboration: Section 4

C4 Parameters For Packages: Yes

C4 Elaboration: Section 4

D Human Subjects Including Annotators: Yes

D1 Instructions Given To Participants: Yes

D1 Elaboration: Section 4

D2 Recruitment And Payment: Yes

D2 Elaboration: Section 4

D3 Data Consent: Yes

D3 Elaboration: Section 4

D4 Ethics Review Board Approval: Yes

D4 Elaboration: Section 4

D5 Characteristics Of Annotators: Yes

D5 Elaboration: Section 4

E AI Assistants In Research Or Writing: No

E1 Information About Use Of Ai Assistants: N/A

Author Submission Checklist: yes

Submission Number: 468
