DLU: Dictionary Look-Up Data and Prediction

David Strohmaier; Gladys Tyen; Hongyi Gu; Diane Nicholls; Zheng Yuan; Paula Buttery

Published: 24 May 2025, Last Modified: 17 Jun 2025, Venue: CoNLL 2025, License: CC BY 4.0

Keywords: word complexity, personalised learning, dictionary look-up

TL;DR: In this paper, we advocate for the novel task of "dictionary look-up prediction" as a means for evaluating the complexity of words and present results from a variety of models.

Abstract: Knowing which words language learners struggle with is crucial for developing personalised education technologies. In this paper, we advocate for the novel task of "dictionary look-up prediction" as a means for evaluating the complexity of words in reading tasks. We release the Dictionary Look-Up development dataset (DLU-dev) and the Dialogue Dictionary Look-Up dataset (D-DLU), which is based on chatbot dialogues. We demonstrate that dictionary look-up prediction is a challenging task for LLMs (results are presented for LLaMA, Gemma, and Longformer models). We explore finetuning with the ROC* loss function as a more appropriate loss for this task than the commonly used Binary Cross Entropy (BCE). We show that a feature-based model outperforms the LLMs. Finally, we investigate the transfer between DLU and the related tasks of Complex Word Identification (CWI) and Semantic Error Prediction (SEP), establishing new state-of-the-art results for SEP.

Submission Number: 193
