MorphCon – A Software for Conversion of Czech Morphological Tagsets

Published: 30 Dec 2009, Last Modified: 24 Mar 2024
In: Levická, J. – Garabík, R. (eds.): NLP, Corpus Linguistics, Corpus Based Grammar Research. Brno, Tribun 2009, 292–301
License: CC BY-NC-ND 4.0
Abstract: This study reflects the current situation in Czech corpus linguistics, with particular attention to the morphological annotation of language corpora. Several morphological tagsets exist for Czech; they differ in their underlying conception and reflect morphological categories at different levels of complexity. Until now, there has been no way to convert between these tagsets. A new tool called MorphCon (Morphological Convertor) is being developed for this purpose. The first version (0.1alpha) converts between the two basic morphological tagsets of Czech: the Prague positional system and the Brno attributive system. Version 0.1alpha supports three basic input/output (I/O) data formats: SimpleTag-Conversion, KWIC/Tag-Format, and WPL-Format. The structure and basic functions of MorphCon are described. Tagsets are implemented in MorphCon as "drivers" with "encode" and "decode" functions, and a "universal library" called DZInterset (by Daniel Zeman), modified in our tool, plays a key role in the conversion process as a transcoder.
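The driver idea described in the abstract can be illustrated with a small sketch: one driver decodes a tag into a shared feature structure, and another driver encodes that structure into the target tagset. The code below is a toy illustration under stated assumptions, not MorphCon's or DZInterset's actual code. The function names (`prague_decode`, `brno_encode`, `convert`), the feature names, and the small value tables are hypothetical, and only nouns with gender, number, and case are handled.

```python
# Toy illustration only: function names, feature names and value tables
# are assumptions made for this sketch, not MorphCon's or DZInterset's API.

# Prague positional tag: 15 characters, position 1 = part of speech,
# position 3 = gender, position 4 = number, position 5 = case.
PRAGUE_GENDER = {"M": "masc_anim", "I": "masc_inan", "F": "fem", "N": "neut"}
PRAGUE_NUMBER = {"S": "sing", "P": "plur"}

# Brno attributive tag: attribute/value pairs such as k1 (noun), gM, nS, c1.
BRNO_GENDER = {"masc_anim": "M", "masc_inan": "I", "fem": "F", "neut": "N"}
BRNO_NUMBER = {"sing": "S", "plur": "P"}


def prague_decode(tag):
    """Decode a Prague positional noun tag into a shared feature dict."""
    if len(tag) != 15:
        raise ValueError("a Prague positional tag has 15 positions")
    if tag[0] != "N":
        raise ValueError("this sketch only handles nouns")
    return {
        "pos": "noun",
        "gender": PRAGUE_GENDER[tag[2]],
        "number": PRAGUE_NUMBER[tag[3]],
        "case": int(tag[4]),
    }


def brno_encode(features):
    """Encode a shared feature dict as a Brno attributive noun tag."""
    if features["pos"] != "noun":
        raise ValueError("this sketch only handles nouns")
    return "k1g{}n{}c{}".format(
        BRNO_GENDER[features["gender"]],
        BRNO_NUMBER[features["number"]],
        features["case"],
    )


def convert(tag):
    """Convert Prague -> Brno by decoding, then encoding."""
    return brno_encode(prague_decode(tag))


print(convert("NNMS1-----A----"))  # k1gMnSc1
```

The sketch ignores every position of the Prague tag other than part of speech, gender, number, and case, so it shows only the decode/encode round trip through a shared representation, not a complete or lossless conversion.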