Does Rhetorical Structure Matter More Than Linguistic Proximity? A Study on Cross-Lingual Sequential Sentence Classification
Keywords: Sequential sentence classification, Cross-lingual transfer learning, Structure similarity, Linguistic proximity
Abstract: Sequential sentence classification (SSC) is an essential task for structuring scientific publications, and extending SSC research to languages other than English would improve access to scientific knowledge. Cross-lingual transfer is a promising approach to address the scarcity of training data in non-English languages. Although prior work on other natural language processing tasks has shown the benefits of identifying linguistic similarity between source and target languages, SSC inherently depends on discourse-level patterns, such as label sequences and positional regularities, which are consistent across languages regardless of linguistic differences. To examine the factors that determine transfer success in SSC, we construct a multilingual SSC dataset covering 13 non-English languages. Our cross-lingual transfer experiments, which use both encoder-based and generative models, reveal that structural similarity in rhetorical organization correlates more strongly with transfer performance than linguistic proximity does, and this pattern holds across model architectures. Based on this finding, we propose a novel framework that explicitly leverages structural information to improve SSC, demonstrating improvements over baselines in both in-language evaluation and transfer to languages unseen during training.
Paper Type: Long
Research Area: Multilinguality and Language Diversity
Research Area Keywords: Multilingualism and Cross-Lingual NLP
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English, French, Japanese, Spanish, Chinese, Russian, Portuguese, Italian, Indonesian, Turkish, Korean, Polish, Dutch, Estonian
Submission Number: 9034