Annotation and linguistic analysis of claim types for fact-checking

Oliver Deck, Z. Melce Hüsünbeyi, Leonie Uhling, Tatjana Scheffler

Published: 25 Mar 2025, Last Modified: 13 Oct 2025, Crossref, CC BY-SA 4.0
Abstract: Among the news items circulating on social media, only some contain factual statements, and factual claims can be differentiated by their check-worthiness. We describe the check-worthiness annotation of a novel corpus of claims obtained from real-world submissions to a German fact-checking organization: the German Crowd Claims (GCC) corpus. We iteratively adapted existing annotation guidelines, introducing the novel category of incident/event and a third level of annotation for statements. Exploratory analysis of 35 linguistic surface-level features highlights sentence length as the strongest predictor of check-worthiness, but remains inconclusive for more specific annotation. We therefore investigated the performance of transformer-based models for check-worthiness detection on the GCC corpus, where classification accuracy was increased by translating the dataset into English, augmenting it with data from a related task, and enriching the semantics with related ontology embeddings.