Knowledge Based Multilingual Language Model

Linlin Liu; Xin Li; Ruidan He; Lidong Bing; Shafiq Joty; Luo Si

Published: 28 Jan 2022, Last Modified: 22 Jun 2025, ICLR 2022 Submitted, Readers: Everyone

Keywords: Language Model, Knowledge, Multilingual

Abstract: Knowledge-enriched language representation learning has shown promising performance across various knowledge-intensive NLP tasks. However, existing knowledge based language models are all trained with monolingual knowledge graph data, which limits their applicability to other languages. In this work, we present a novel framework to pretrain knowledge based multilingual language models (KMLMs). We first generate a large amount of code-switched synthetic sentences and reasoning-based multilingual training data using the Wikidata knowledge graphs. Then, based on the intra- and inter-sentence structures of the generated data, we design pretraining tasks to facilitate knowledge learning, which allow the language models not only to memorize the factual knowledge but also to learn useful logical patterns. Our pretrained KMLMs demonstrate significant performance improvements on a wide range of knowledge-intensive cross-lingual NLP tasks, including named entity recognition, factual knowledge retrieval, relation classification, and a new task designed by us, namely, logic reasoning. Our code and pretrained language models will be made publicly available.
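
As a rough illustration of what a code-switched synthetic sentence built from a knowledge graph triple might look like, the Python sketch below renders one (subject, relation, object) triple by sampling a language independently for each element. The label table, the `code_switch` helper, and the per-element sampling rule are all assumptions made for illustration; they are not taken from the authors' data generation pipeline, which the abstract does not specify.

```python
import random

# Toy multilingual label table (hypothetical, hand-written for illustration;
# not an actual Wikidata export).
LABELS = {
    "berlin": {"en": "Berlin", "de": "Berlin", "es": "Berlín"},
    "capital_of": {"en": "capital of", "de": "Hauptstadt von", "es": "capital de"},
    "germany": {"en": "Germany", "de": "Deutschland", "es": "Alemania"},
}


def code_switch(triple, rng):
    """Render a (subject, relation, object) triple as a synthetic sentence,
    choosing the language of each element independently (assumed rule)."""
    parts = []
    for key in triple:
        lang = rng.choice(sorted(LABELS[key]))  # sorted for reproducibility
        parts.append(LABELS[key][lang])
    return " ".join(parts)


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
for _ in range(3):
    print(code_switch(("berlin", "capital_of", "germany"), rng))
```

Each printed line mixes labels from up to three languages, which is the sense of "code-switched" assumed here; a real pipeline would draw labels from Wikidata rather than a hand-written table.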

One-sentence Summary: We present a novel framework to pretrain knowledge based multilingual language models (KMLMs).

Supplementary Material: zip

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/knowledge-based-multilingual-language-model/code)
