Research on cross-lingual multi-label patent classification based on pre-trained model

Yonghe Lu; Lehua Chen; Xinyu Tong; Yongxin Peng; Hou Zhu

Published: 01 Jan 2024, Last Modified: 04 Feb 2025, Scientometrics 2024, CC BY-SA 4.0

Abstract: Patent classification is an important part of the patent examination and management process, and efficient, accurate automatic classification can significantly improve patent retrieval performance. Current monolingual patent classification models, however, are insufficient for cross-lingual patent tasks, so research into cross-lingual patent classification is crucial. In this paper, we propose XLM-R–CNN, a cross-lingual patent classification model based on a pre-trained model. We also construct XLPatent, a large patent dataset covering Chinese, English, and German. We evaluate model performance with several metrics; the experimental results show that XLM-R–CNN achieves a classification accuracy of 73% and an average precision of 94%.
