A Linguistically-Based Segmentation of Complex Sentences

Published: 01 Jan 2007, Last Modified: 19 Feb 2025, Venue: FLAIRS 2007, License: CC BY-SA 4.0
Abstract: The paper describes a method of dividing complex sentences into segments: easily detectable, linguistically motivated units that may provide a basis for further processing of complex sentences. The method has been developed for Czech, taken as a representative of languages with a relatively high degree of word-order freedom. The paper introduces the important terms and describes the segmentation chart, a data structure used to describe the mutual relationships between individual segments and separators. It also presents a simple set of rules applied to the segmentation of a small set of Czech sentences. Issues of segment annotation based on an existing corpus are also discussed.