SrELEXIS-WSD: Hybrid Semi-Automated WSD for Serbian with Large Language Models—Results and Challenges

Published: 27 May 2026, Last Modified: 27 May 2026, Venue: UniDive 2026, Readers: Everyone, License: CC BY-SA 4.0
Keywords: Word Sense Disambiguation, Serbian, Sense inventory, Semantic annotation, Semi-automated annotation
Working Group: WG2: Lexicon-corpus interface
WG1 Tasks: Task 1.6: Identification and Annotation of MWEs in corpus languages
Abstract: The Serbian extension of the ELEXIS-WSD corpus was developed through translation, annotation, and validation, yielding a high-quality, multi-layered dataset annotated with multiword expressions (MWE), named entities (NER/NEL), and word senses. Word sense disambiguation (WSD) followed a semi-automated methodology that combines a curated sense inventory based on the Serbian WordNet with zero-shot disambiguation by large language models (LLMs) and manual validation in INCEpTION. The sense inventory was significantly expanded, converted into an RDF knowledge base, and integrated with the corpus data to support transparent, explainable, and iterative annotation. In the evaluation, LLMs outperform embedding-based baselines, and their performance improves as the sense inventory is refined; GPT-4.1 achieved the best result (accuracy 0.825). Overall, LLM-assisted WSD proves effective for Serbian but still requires expert validation, and the workflow supports continuous improvement of both the annotations and the lexical resources.
WG2 Tasks: Task 2.2: Design of a lexicon-corpus interface
Tracks For Type Of Contribution: Work in progress
Submission Number: 25
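
The zero-shot disambiguation step and the accuracy metric mentioned in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the toy sense inventory, the sense IDs and glosses, the prompt wording, and the `ask` callable (a stand-in for whatever LLM client is used) are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy sense inventory: lemma -> list of (sense_id, gloss).
# IDs and glosses are invented for illustration, not taken from Serbian WordNet.
SENSES: Dict[str, List[Tuple[str, str]]] = {
    "list": [
        ("list-1", "leaf of a plant"),
        ("list-2", "sheet of paper"),
        ("list-3", "newspaper"),
    ],
}


def build_prompt(sentence: str, lemma: str) -> str:
    """Build a zero-shot prompt that lists the candidate senses by number."""
    options = "\n".join(
        f"{i}. {gloss}" for i, (_, gloss) in enumerate(SENSES[lemma], start=1)
    )
    return (
        f"Sentence: {sentence}\n"
        f"Target word (lemma): {lemma}\n"
        f"Candidate senses:\n{options}\n"
        "Reply with the number of the sense that fits the sentence."
    )


def parse_choice(reply: str, lemma: str) -> Optional[str]:
    """Map the first integer in the reply to a sense ID; None if unusable."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    index = int(match.group()) - 1
    candidates = SENSES[lemma]
    if 0 <= index < len(candidates):
        return candidates[index][0]
    return None


def disambiguate(sentence: str, lemma: str, ask: Callable[[str], str]) -> Optional[str]:
    """Ask the model (any callable from prompt to reply) and parse its answer."""
    return parse_choice(ask(build_prompt(sentence, lemma)), lemma)


def accuracy(predicted: List[Optional[str]], gold: List[str]) -> float:
    """Share of instances where the predicted sense equals the validated one."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

In a workflow like the one described, the parsed sense would be treated only as a suggestion: an unparseable reply (`None`) or a wrong choice is left for the expert to correct during manual validation, and accuracy is then computed against those validated labels.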