A Proposal for a Lexicon-Corpus Interface: The Interplay Between a Concordancer and a Collaborative Knowledge Base Management Tool

Published: 27 May 2026, Last Modified: 27 May 2026, Venue: UniDive 2026, License: CC BY-SA 4.0
Keywords: lexicon-corpus interface, parallel corpus, knowledge base management tool, concordancer
Working Group: WG2: Lexicon-corpus interface
Abstract: Within the UniDive COST Action, one of the tasks of Working Group 2 (WG2) is to design a lexicon-corpus interface, i.e. a solution that links lexicon entries to their occurrences in corpora. Digital lexicons complement corpora: they aim at holistic language modeling and can potentially describe a very wide range of linguistic objects, whereas many phenomena occur rarely or never in corpora. In this paper, we present one of the outcomes of Task 2.2 (T2.2) of WG2: a proposal for an infrastructure that not only links a corpus and a lexicon but also promotes a paradigm shift from lexeme-focused to sense-focused links to corpora. The infrastructure consists of a sense-annotated corpus (ELEXIS-WSD Parallel Sense-Annotated Corpus), a knowledge base management tool (Wikibase Cloud), and a concordancer (NoSketch Engine).
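The following minimal Python sketch illustrates the contrast between lexeme-focused and sense-focused links. Everything in it is hypothetical: the toy two-sentence corpus, the Wikibase-style lexeme and sense IDs (`L1`, `L1-S1`, `L1-S2`), and the assumption that the concordancer exposes the sense annotation as a positional attribute named `sense`. It does not call Wikibase Cloud or NoSketch Engine. It only shows how a lexeme link retrieves every occurrence of a lemma, whereas a sense link retrieves only the occurrences annotated with one sense.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class Token:
    form: str
    lemma: str
    sense: Optional[str]  # hypothetical Wikibase-style sense ID; None if unannotated

# Hypothetical lexicon entry, loosely modelled on Wikibase lexeme/sense IDs.
LEXICON = {
    "L1": {"lemma": "bank",
           "senses": {"L1-S1": "sloping land beside a river",
                      "L1-S2": "financial institution"}},
}

# Hypothetical sense-annotated corpus: a list of tokenized sentences.
CORPUS = [
    [Token("She", "she", None), Token("sat", "sit", None), Token("on", "on", None),
     Token("the", "the", None), Token("bank", "bank", "L1-S1"), Token(".", ".", None)],
    [Token("The", "the", None), Token("bank", "bank", "L1-S2"),
     Token("approved", "approve", None), Token("the", "the", None),
     Token("loan", "loan", None), Token(".", ".", None)],
]

def kwic(corpus: List[List[Token]], match: Callable[[Token], bool],
         window: int = 3) -> List[Tuple[str, str, str]]:
    """Return keyword-in-context lines (left, keyword, right) for matching tokens."""
    lines = []
    for sent in corpus:
        for i, tok in enumerate(sent):
            if match(tok):
                left = " ".join(t.form for t in sent[max(0, i - window):i])
                right = " ".join(t.form for t in sent[i + 1:i + 1 + window])
                lines.append((left, tok.form, right))
    return lines

def lexeme_concordance(corpus, lexeme_id):
    """Lexeme-focused link: every token whose lemma matches the lexeme's lemma."""
    lemma = LEXICON[lexeme_id]["lemma"]
    return kwic(corpus, lambda t: t.lemma == lemma)

def sense_concordance(corpus, sense_id):
    """Sense-focused link: only tokens annotated with the given sense ID."""
    return kwic(corpus, lambda t: t.sense == sense_id)

def cql_for_sense(sense_id, attribute="sense"):
    """Build a CQL token query, assuming a positional attribute named `attribute`."""
    return f'[{attribute}="{sense_id}"]'
```

Here `lexeme_concordance(CORPUS, "L1")` returns both occurrences of "bank", while `sense_concordance(CORPUS, "L1-S2")` returns only `("The", "bank", "approved the loan")`. A sense entry in the knowledge base could then point to the concordancer through a query such as `cql_for_sense("L1-S2")`, under the attribute-name assumption above.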
WG2 Tasks: Task 2.2: Design of a lexicon-corpus interface
Tracks For Type Of Contribution: Complete work (including previously published work)
Do You Need Visa To Attend The 4th UniDive General Meeting In Romania: No
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 38