mStyleDistance: Multilingual Style Embeddings and their Evaluation

ACL ARR 2025 February Submission1065 Authors

12 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Style embeddings are useful for stylistic analysis and style transfer, yet they only exist for English. We introduce Multilingual StyleDistance (mStyleDistance), a method that can generate style embeddings in new languages using synthetic data and a contrastive loss. We create style embeddings in nine languages and a multilingual STEL-or-Content benchmark (Wegmann et al., 2022) that serves to assess their quality. We also employ our embeddings in an authorship verification task involving different languages. Our results show that mStyleDistance embeddings outperform existing style embeddings on these benchmarks and generalize well to unseen features and languages. We make our models and datasets publicly available.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: multilingual benchmarks, multilingual evaluation, evaluation methodologies, benchmarking, representation learning, text style, prompting, style, embeddings, synthetic data
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, Arabic, German, Spanish, French, Hindi, Japanese, Korean, Russian, Chinese
Submission Number: 1065