GA-Tag: Data Enrichment with an Automatic Tagging System Utilizing Large Language Models

Genki Kusano

Published: 2024, Last Modified: 25 Jul 2025. Venue: ICDE 2024. License: CC BY-SA 4.0

Abstract: Data quality is widely recognized as being directly linked to the quality of analysis results. In this study, we introduce a tagging method that simplifies the handling of extensive data and facilitates the rapid search and extraction of relevant information. Traditional enrichment methods, which search external sources for related data and integrate it into the input, cannot guarantee that desirable information will be acquired for every data set. Recent advances in Large Language Models (LLMs), however, make it possible to predict characteristics of input data even when no relevant data is available. In this paper, we present the Generated and Aggregated Tag (GA-Tag), a system that employs LLMs to automatically assign appropriate tags to data and is equipped with an aggregation mechanism to manage tag diversity effectively. The adoption of GA-Tag is anticipated to improve the quality and efficiency of data analysis and management, reduce monetary and time costs, and potentially strengthen business intelligence and decision-making processes.
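The abstract does not specify how tags are generated or aggregated, so the following is only an illustrative sketch of a generate-then-aggregate flow, not the GA-Tag implementation. It assumes the LLM call can be abstracted as a function that returns a list of tags per record (`fake_llm` is a hypothetical stand-in with canned output), and it swaps in a simple technique for aggregation: greedy merging of near-duplicate tags by string similarity, using `difflib.SequenceMatcher` from the Python standard library. All function names and the `0.8` threshold are assumptions made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable


def normalize(tag: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(tag.lower().split())


def aggregate_tags(raw_tags: list[str], threshold: float = 0.8) -> dict[str, str]:
    """Map each normalized tag to a canonical representative.

    Assumed strategy (not from the paper): visit tags from most to least
    frequent; a tag joins the first canonical tag whose similarity ratio
    meets the threshold, otherwise it becomes a new canonical tag.
    """
    counts = Counter(normalize(t) for t in raw_tags)
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for tag, _ in counts.most_common():
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, tag, c).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(tag)
            match = tag
        mapping[tag] = match
    return mapping


def tag_records(
    records: list[str],
    generate_tags: Callable[[str], list[str]],
    threshold: float = 0.8,
) -> list[list[str]]:
    """Generate tags for every record, then replace each with its canonical form."""
    generated = [generate_tags(r) for r in records]
    mapping = aggregate_tags([t for tags in generated for t in tags], threshold)
    return [sorted({mapping[normalize(t)] for t in tags}) for tags in generated]


def fake_llm(text: str) -> list[str]:
    """Hypothetical stand-in for an LLM call; returns canned tags per record."""
    canned = {
        "record A": ["Headphones", "Audio"],
        "record B": ["headphone", "audio"],
        "record C": ["Headphones", "Wireless"],
    }
    return canned[text]


if __name__ == "__main__":
    for tags in tag_records(["record A", "record B", "record C"], fake_llm):
        print(tags)
```

In this toy run, the variant tag "headphone" generated for record B is folded into the more frequent "headphones", so records A and B end up with identical tag sets. A real system would likely need a semantic notion of similarity (for example, embeddings or a second LLM pass) rather than character overlap; that choice is left open here.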