CrossMath: Towards Cross-lingual Math Information Retrieval

Published: 01 Jan 2024, Last Modified: 27 Apr 2025, ICTIR 2024, CC BY-SA 4.0
Abstract: Current math search engines and test collections are primarily designed for the English language, limiting their accessibility and inclusivity. This paper introduces cross-lingual math information retrieval (CLMIR) to overcome this limitation, focusing on retrieving mathematical information across languages. The paper presents CrossMath, a novel CLMIR test collection comprising manually translated topics in four languages (Croatian, Czech, Persian, and Spanish). It also introduces a CLMIR system that combines state-of-the-art translation models (mBART and NLLB) with a formula masking approach for handling mathematical notation. Evaluation on the ARQMath test collections shows that, for all four languages, the proposed CLMIR system achieves effectiveness competitive with that obtained using English topics.
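The abstract does not spell out how formula masking works, so the following is only a minimal sketch of one plausible reading: formulas are swapped for placeholder tokens before translation and restored afterwards, so that the translation model never sees the mathematical notation. The `$...$` delimiter convention, the `[F0]`-style placeholders, and the names `mask_formulas`, `unmask_formulas`, and `translate_topic` are all assumptions made for illustration and are not taken from the paper. The `translate` argument stands in for any text-to-text translation function, such as a wrapper around mBART or NLLB.

```python
import re
from typing import Callable, List, Tuple

# Assumption: formulas are written inline between single dollar signs.
FORMULA_RE = re.compile(r"\$[^$]+\$")
# Hypothetical placeholder format, e.g. [F0], [F1], ...
PLACEHOLDER_RE = re.compile(r"\[F(\d+)\]")


def mask_formulas(text: str) -> Tuple[str, List[str]]:
    """Replace each formula with a numbered placeholder and collect the formulas."""
    formulas: List[str] = []

    def _mask(match: "re.Match[str]") -> str:
        formulas.append(match.group(0))
        return f"[F{len(formulas) - 1}]"

    return FORMULA_RE.sub(_mask, text), formulas


def unmask_formulas(text: str, formulas: List[str]) -> str:
    """Put the original formulas back in place of their placeholders."""

    def _unmask(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        # Leave unknown placeholders untouched rather than failing.
        return formulas[index] if index < len(formulas) else match.group(0)

    return PLACEHOLDER_RE.sub(_unmask, text)


def translate_topic(text: str, translate: Callable[[str], str]) -> str:
    """Translate the natural-language part of a topic while keeping formulas intact."""
    masked, formulas = mask_formulas(text)
    return unmask_formulas(translate(masked), formulas)
```

A real translation model may alter or drop placeholder tokens, so a production system would need to verify that every placeholder survives translation; this sketch does not handle that case.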