Abstract: Health news profoundly shapes public understanding and healthcare decisions globally, yet linguistic barriers between English and Chinese—two of the world's most widely spoken languages—create significant health knowledge disparities. To address this challenge, we propose HealthBridge, a novel cross-lingual clustering framework that automatically identifies and aligns related health articles across these languages. We present a comprehensive bilingual health news corpus comprising 68,603 articles spanning 2018–2025, with a carefully constructed evaluation set featuring 470 high-quality health news clusters. The HealthBridge framework integrates advanced text representation strategies with temporal constraints. Experimental results demonstrate that, for the task of cross-lingual health news association, methods based on complex vector representations significantly outperform explicit entity matching techniques. HealthBridge advances both cross-lingual natural language processing and global health communication by enabling systematic analysis of health reporting across linguistic boundaries. This framework not only reveals cultural variations in health news framing but also promotes equitable access to health information in an increasingly interconnected world.
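To make the idea of combining text representations with temporal constraints concrete, the following is a minimal illustrative sketch, not the HealthBridge implementation. The abstract does not specify the embedding model, similarity threshold, time window, or clustering algorithm, so everything here is an assumption: the `Article` class, the `cluster` function, the toy three-dimensional vectors standing in for multilingual sentence embeddings, and the `min_sim` and `max_days` values are all hypothetical. Two articles are linked only if their vectors are similar and their publication dates are close, and linked articles are merged into clusters with a simple union-find.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Article:
    doc_id: str
    lang: str             # "en" or "zh"
    published: date
    vector: list[float]   # toy stand-in for a multilingual sentence embedding


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(articles, min_sim=0.8, max_days=3):
    """Link articles that are both similar and close in time (assumed thresholds)."""
    parent = list(range(len(articles)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            a, b = articles[i], articles[j]
            close_in_time = abs((a.published - b.published).days) <= max_days
            if close_in_time and cosine(a.vector, b.vector) >= min_sim:
                parent[find(j)] = find(i)  # merge the two clusters

    groups = {}
    for idx, art in enumerate(articles):
        groups.setdefault(find(idx), []).append(art.doc_id)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: en-1 and zh-1 cover the same story one day apart;
# en-2 has an identical vector but appears months later; zh-2 is unrelated.
articles = [
    Article("en-1", "en", date(2021, 3, 1), [0.90, 0.10, 0.00]),
    Article("zh-1", "zh", date(2021, 3, 2), [0.88, 0.12, 0.02]),
    Article("en-2", "en", date(2021, 9, 1), [0.90, 0.10, 0.00]),
    Article("zh-2", "zh", date(2021, 3, 1), [0.00, 0.20, 0.95]),
]
clusters = cluster(articles)
print(clusters)
```

In this toy run the temporal constraint keeps `en-2` out of the `en-1`/`zh-1` cluster even though its vector is identical, which is the kind of effect a time window is meant to have when similar health topics recur across years.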
External IDs: doi:10.1007/978-981-95-0020-8_36