Korean Part-of-speech Tagging Based on Morpheme Generation

Hyun-Je Song, Seong-Bae Park

Published: 01 Jan 2020, Last Modified: 03 Oct 2023ACM Trans. Asian Low Resour. Lang. Inf. Process. 2020Readers: Everyone

Abstract: Two major problems of Korean part-of-speech (POS) tagging are that the word-spacing unit is not mapped one-to-one to a POS tag and that morphemes should be recovered during POS tagging. Therefore, this article proposes a novel two-step Korean POS tagger that solves the problems. This tagger first generates a sequence of lemmatized and recovered morphemes that can be mapped one-to-one to a POS tag using an encoder-decoder architecture derived from a POS-tagged corpus. Then, the POS tag of each morpheme in the generated sequence is finally determined by a standard sequence labeling method. Since the knowledge for segmenting and recovering morphemes is extracted automatically from a POS-tagged corpus by an encoder-decoder architecture, the POS tagger is constructed without a dictionary nor handcrafted linguistic rules. The experimental results on a standard dataset show that the proposed method outperforms existing POS taggers with its state-of-the-art performance.

0 Replies