LLM-based Hierarchical Label Annotation for Foodborne Illness Detection on Social Media

Dongyu Zhang, Ruofan Hu, Dandan Tao, Hao Feng, Elke A. Rundensteiner

Published: 01 Jan 2024, Last Modified: 04 Oct 2025, IEEE Big Data 2024, CC BY-SA 4.0

Abstract: Foodborne illnesses pose a threat to public health, causing morbidity, mortality, and economic burden every year. Social media provides a rich, timely source of data for training AI surveillance models, but using it requires effective annotation tools. Large Language Models (LLMs) have shown promise for generating simple labels, but this task requires hierarchical labels: entity types such as food type and symptom at the individual word level, and the foodborne illness event at the level of the complete post. To this end, we introduce ICL2FID, the first LLM-based hierarchical labeling framework designed to annotate social media posts for foodborne illness detection at two levels using only a few demonstration examples. To exploit the interconnection between the post and word levels, ICL2FID instructs the LLM to leverage information from one level when predicting the other. To combat model hallucination and cyclic dependencies, a verification step improves evidence propagation between the interconnected word-level and post-level labeling tasks. Strategies for custom selection of demonstration examples are designed to reduce bias and increase representation. We compare ICL2FID against traditional supervised learning and other LLM methods, demonstrating that it not only achieves superior accuracy but does so at a fraction of the cost and time. These findings highlight ICL2FID’s potential as a viable alternative for hierarchical label generation in scenarios with limited resources and very large datasets. Code is available at https://github.com/zdy93/ICL2FID.
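
The two-level flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a hypothetical `llm` callable that maps a prompt string to a text reply, a hypothetical `type: span; ...` reply format, and a simple substring check standing in for the verification step. All function names, prompts, and the `fake_llm` stand-in are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]   # hypothetical interface: prompt in, text out
Demo = Tuple[str, str]       # (input text, expected output) demonstration pair


def build_prompt(instruction: str, demos: List[Demo], query: str) -> str:
    """Assemble a few-shot prompt from an instruction, demonstrations, and a query."""
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    return f"{instruction}\n{shots}\nInput: {query}\nOutput:"


def parse_entities(raw: str) -> List[Tuple[str, str]]:
    """Parse a reply in the assumed 'type: span; type: span' format."""
    pairs = []
    for item in raw.split(";"):
        if ":" in item:
            etype, span = item.split(":", 1)
            pairs.append((etype.strip().lower(), span.strip()))
    return pairs


def verify_entities(post: str, entities: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Assumed verification: keep only spans that literally occur in the post."""
    lowered = post.lower()
    return [(t, s) for t, s in entities if s and s.lower() in lowered]


def annotate(post: str, llm: LLM, word_demos: List[Demo],
             post_demos: List[Demo]) -> Dict[str, object]:
    # Word level: ask for food and symptom entities.
    raw = llm(build_prompt(
        "List food and symptom entities as 'type: span; ...'.", word_demos, post))
    entities = verify_entities(post, parse_entities(raw))
    # Post level: pass the verified entities along as evidence.
    evidence = "; ".join(f"{t}: {s}" for t, s in entities) or "none"
    answer = llm(build_prompt(
        "Does the post report a foodborne illness event? Answer yes or no.",
        post_demos, f"{post}\nEntities: {evidence}"))
    return {"entities": entities,
            "is_event": answer.strip().lower().startswith("yes")}


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for demonstration only."""
    if prompt.startswith("List"):
        return "food: salad; symptom: vomiting; food: pizza"  # 'pizza' is hallucinated
    return "Yes"


result = annotate("Ate a salad at lunch and now vomiting all night",
                  fake_llm, [], [])
```

In this sketch the hallucinated entity is dropped before it can influence the post-level prediction, which is one simple way to keep unsupported evidence from propagating between levels. The sketch covers only the word-to-post direction; the abstract describes information flowing both ways.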