Text Preprocessing and Annotation Tool for Time Information

Chae-Gyun Lim, Young-Seob Jeong, Woo-Jin Kim, Youngjin Kim, Ho-Jin Choi

Published: 2024, Last Modified: 14 Nov 2024, BigComp 2024, CC BY-SA 4.0

Abstract: Time information extraction is one of the most important tasks because time-related expressions appear in numerous documents and are often vital for understanding context. For English, enough documents are available as datasets to support research on natural language processing and understanding tasks, but this is rarely the case for non-English languages. In addition, only a limited number of annotation tools are available for constructing new time information datasets. In this paper, we present our own annotation tool for time information and describe the process of creating a new dataset by employing crowd workers and using the tool we implemented. We then discuss the difficulties we encountered during the annotation process, how to train the workers, and our efforts to maintain the quality of the dataset.