Abstract: This research focuses on cross-lingual text summarization between Russian and Chinese, addressing the growing need for effective information exchange driven by expanding relations between China and Russia. It highlights the linguistic challenges posed by the two languages' complex grammatical structures and distinct writing systems. The study surveys existing methods and datasets, emphasizing the importance of advancing cross-lingual summarization technology to overcome language barriers. The key finding is that Direct Preference Optimization (DPO), a standard preference-optimization algorithm for LLM alignment, significantly improves summarization quality compared to traditional pipeline methods, particularly in many-to-one training scenarios. The study uses the WikiLingua and CrossSum datasets, alongside manually collected data, to ensure comprehensive evaluation. The best Russian-to-Chinese model achieved a ROUGE-2 score of 11.71 and a LaSE (Language-agnostic Summary Evaluation) score of 31.18; Language Confidence and Length Penalty serve as additional metrics. GPT-4-based assessments further confirm the improvements in the generated summaries.