KAT5: Knowledge-Aware Transfer Learning with a Text-to-Text Transfer Transformer

Published: 2024, Last Modified: 09 Sept 2025, ECML/PKDD (9) 2024, CC BY-SA 4.0
Abstract: We introduce knowledge-aware transfer learning with a text-to-text transfer transformer (KAT5), which leverages a text-to-text transfer transformer (T5) in the Wikipedia domain. In standard transfer learning such as T5, a model is first pre-trained on an unsupervised task with a language-model objective and then fine-tuned on a downstream task. T5 explores several pre-training objectives, including masked language modeling (MLM), random span corruption, and deshuffling, but offers the model limited opportunity to integrate knowledge during pre-training. Here, we push the limits of this model by grafting in knowledge such as entity and co-reference information, mapping Wikipedia to Wikidata during pre-training. We construct large-scale alignments between Wikipedia abstracts and Wikidata triples to facilitate pre-training of our KAT5 model. Our approach can match or outperform task-specific models while using the same architecture and hyper-parameters, in particular on entity and relation extraction (the CoNLL04, ADE, and NYT datasets) and on language generation tasks, including abstractive summarization (XSum, CNNDM) and machine translation. Our code is publicly released on GitHub (https://github.com/aistairc/kat5) under the Apache 2.0 License.
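
To make the idea of pairing Wikipedia abstracts with Wikidata triples more concrete, the following is a minimal sketch of one plausible way to turn such an alignment into a T5-style text-to-text pre-training pair: entity mentions aligned to Wikidata are masked with sentinel tokens, the linearized triples are appended as extra context, and the target asks the model to recover the masked entities. The function name, the `[KB]` separator, the triple serialization, and the example sentence are illustrative assumptions, not the exact KAT5 data format; see the released code for the authors' actual pipeline.

```python
# Hypothetical sketch: build a knowledge-aware pre-training example by pairing a
# Wikipedia abstract with aligned Wikidata triples, then masking entity spans so
# the model must recover them (a span-corruption objective restricted to entities).
from typing import List, Tuple


def build_example(abstract: str,
                  entity_spans: List[Tuple[int, int]],
                  triples: List[Tuple[str, str, str]]) -> Tuple[str, str]:
    """Mask aligned entity mentions with T5 sentinel tokens and append the
    linearized Wikidata triples to the source; the target lists the masked spans."""
    source_parts, target_parts = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(entity_spans)):
        sentinel = f"<extra_id_{i}>"
        source_parts.append(abstract[cursor:start] + sentinel)
        target_parts.append(f"{sentinel} {abstract[start:end]}")
        cursor = end
    source_parts.append(abstract[cursor:])
    # Linearize knowledge triples as extra context: subject | relation | object.
    kb = ""
    if triples:
        kb = " [KB] " + " ; ".join(f"{s} | {r} | {o}" for s, r, o in triples)
    return "".join(source_parts) + kb, " ".join(target_parts)


if __name__ == "__main__":
    abstract = ("Marie Curie was a physicist and chemist who conducted "
                "research on radioactivity.")
    spans = [(0, 11)]  # character span of the mention "Marie Curie"
    triples = [("Marie Curie", "field of work", "radioactivity")]
    src, tgt = build_example(abstract, spans, triples)
    print("SOURCE:", src)
    print("TARGET:", tgt)
```

Under these assumptions, the source becomes `<extra_id_0> was a physicist ... [KB] Marie Curie | field of work | radioactivity` and the target is `<extra_id_0> Marie Curie`, so the knowledge context and the text-infilling objective share a single text-to-text format compatible with a standard T5 architecture.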