Investigating Zero-shot Topic Labelling of Scientific Papers Using LLMs

Jens Bruchertseifer, Patrick Neises, Maria Hinzmann, Ralf Schenkel, Christof Schöch

Published: 2025, Last Modified: 19 Dec 2025, BTW Workshops 2025, License: CC BY-SA 4.0

Abstract: In this paper, we address the problem of using LLMs to assign content labels from a given vocabulary to scientific publications. After a short overview of the current state of research, we present a first implementation of a zero-shot classification pipeline. The implementation is designed for extensibility and customizability, so that it can easily be adapted to other data sets and use cases in the future. We apply the pipeline to a subset of the DBLP Discovery Dataset. We then discuss the results, propose a comparison with a second data set, the humanities journal STTCL, and outline the challenges this comparison raises. Both data sets comply with the FAIR data principles. Finally, we outline our next steps.
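The abstract does not say how the pipeline is built. Purely as an illustration, a minimal zero-shot labelling step against a fixed vocabulary could look like the Python sketch below. The prompt wording, the `Paper` record, the helper names (`build_prompt`, `parse_labels`, `label_papers`) and the `fake_llm` stand-in are all hypothetical and are not taken from the paper. "Zero-shot" here means the model sees only the list of allowed labels, with no labelled examples. The model is injected as a plain function, which is one possible way to get the extensibility the abstract mentions: the model, prompt or vocabulary can be swapped without touching the rest of the pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Paper:
    # Hypothetical minimal record; real data sets such as the DBLP Discovery
    # Dataset carry many more fields.
    title: str
    abstract: str


def build_prompt(paper: Paper, vocabulary: Sequence[str]) -> str:
    """Zero-shot prompt: the model sees only the allowed labels, no examples."""
    labels = "\n".join(f"- {label}" for label in vocabulary)
    return (
        "Assign the most fitting topic labels to the following scientific paper.\n"
        "Choose only from this list and answer with a comma-separated list:\n"
        f"{labels}\n\n"
        f"Title: {paper.title}\n"
        f"Abstract: {paper.abstract}\n"
        "Labels:"
    )


def parse_labels(response: str, vocabulary: Sequence[str]) -> list[str]:
    """Keep only answers that match the vocabulary (case-insensitive), without duplicates."""
    allowed = {label.lower(): label for label in vocabulary}
    result: list[str] = []
    for item in response.split(","):
        key = item.strip().lower()
        if key in allowed and allowed[key] not in result:
            result.append(allowed[key])
    return result


def label_papers(
    papers: Sequence[Paper],
    vocabulary: Sequence[str],
    llm: Callable[[str], str],
) -> list[list[str]]:
    """Label each paper; the LLM is passed in so that models can be swapped freely."""
    return [parse_labels(llm(build_prompt(p, vocabulary)), vocabulary) for p in papers]


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; always returns the same answer,
    # including one label ("Astronomy") that is not in the vocabulary.
    return "Information Retrieval, databases, Astronomy"


vocab = ["Databases", "Information Retrieval", "Machine Learning"]
papers = [Paper("Zero-shot topic labelling", "We label papers with an LLM.")]
print(label_papers(papers, vocab, fake_llm))
# -> [['Information Retrieval', 'Databases']]
```

Filtering the model's answer against the vocabulary is a design choice worth noting: LLMs may return labels outside the given list, and this sketch simply discards them rather than mapping them to the closest allowed label.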

External IDs: dblp:conf/btw/BruchertseiferN25