SrELEXIS-WSD: Hybrid Semi-Automated WSD for Serbian with Large Language Models—Results and Challenges

Ranka Stanković; Cvetana Krstev; Saša Petalinkar; Milica Ikonić Nešić; Aleksandra Markovic; Marina Bagi; Marijana Đukić; Jelena Bogdanović

Published: 27 May 2026, Last Modified: 27 May 2026, Venue: UniDive 2026, Readers: Everyone, License: CC BY-SA 4.0

Keywords: Word Sense Disambiguation, Serbian, Sense inventory, Semantic annotation, Semi-automated annotation

Working Group: WG2: Lexicon-corpus interface

WG1 Tasks: Task 1.6: Identification and Annotation of MWEs in corpus languages

Abstract: The Serbian extension of the ELEXIS-WSD corpus was developed through translation, annotation, and validation, resulting in a high-quality, multi-layered dataset enriched with MWE, NER/NEL, and sense annotations. A semi-automated WSD methodology was applied, combining a curated Serbian WordNet–based sense inventory with LLM-based zero-shot disambiguation and manual validation in INCEpTION. The sense inventory was significantly expanded, converted into an RDF knowledge base, and integrated with corpus data to support transparent, explainable, and iterative annotation. Evaluation shows that LLMs outperform embedding-based baselines, with performance improving as the sense inventory is refined; GPT-4.1 achieved the best results (accuracy 0.825). Overall, the approach demonstrates that LLM-assisted WSD is effective for Serbian, but still requires expert validation, and the workflow supports continuous improvement of both annotations and lexical resources.

WG2 Tasks: Task 2.2: Design of a lexicon-corpus interface

Tracks For Type Of Contribution: Work in progress

Do You Need Visa To Attend The 4th UniDive General Meeting In Romania: No

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 25
