Keywords: Large Language Models, Knowledge Manipulation, Post-training
TL;DR: We propose KALE, a novel post-training framework that leverages knowledge graphs to generate high-quality relevant rationales and enhance the knowledge manipulation ability of large language models.
Abstract: Despite the impressive performance of large language models (LLMs) pretrained on vast knowledge corpora, advancing their knowledge manipulation ability—the capacity to effectively **recall, reason, and transfer relevant knowledge**—remains challenging.
Existing methods mainly leverage supervised fine-tuning (SFT) to enable LLMs to recall task-relevant knowledge by continuing training on labeled datasets. However, we observe that LLMs fine-tuned via SFT still occasionally exhibit the *known\&incorrect* phenomenon, where an LLM explicitly possesses the knowledge relevant to a given question but cannot effectively manipulate it to answer correctly. To address this challenge, we propose KALE—a novel post-training framework that leverages knowledge graphs (KGs) to generate high-quality relevant rationales and enhance knowledge manipulation ability via **K**nowledge-**A**ware **LE**arning. Specifically, KALE **first** proposes a **K**nowledge-**I**nduced (KI) data synthesis method to generate high-quality rationales, i.e., textual reasoning processes that connect each question to its correct answer through external KGs. **Then** KALE proposes a **K**nowledge-**A**ware (KA) fine-tuning paradigm to enhance the knowledge manipulation ability of LLMs. Extensive experiments on **eight** popular benchmarks across **six** different LLM backbones demonstrate the effectiveness of KALE, yielding accuracy improvements of up to 11.72\% and 4.18\% on average.
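To make the KI data synthesis step concrete, a minimal sketch of turning a KG path into an SFT training example might look as follows. The abstract does not specify the actual pipeline, so the function names, the verbalization scheme, and the example triples below are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of knowledge-induced (KI) data synthesis: a chain of
# KG triples linking a question to its answer is verbalized into a textual
# rationale and paired with the question/answer for fine-tuning.

def verbalize_path(path):
    """Turn a list of (head, relation, tail) triples into a textual rationale."""
    steps = [f"{h} {r.replace('_', ' ')} {t}." for h, r, t in path]
    return " ".join(steps)

def build_sft_example(question, answer, kg_path):
    """Pair a question with a KG-grounded rationale and its correct answer."""
    return {
        "question": question,
        "rationale": verbalize_path(kg_path),
        "answer": answer,
    }

example = build_sft_example(
    question="In which country is the Eiffel Tower located?",
    answer="France",
    kg_path=[("Eiffel Tower", "located_in", "Paris"),
             ("Paris", "capital_of", "France")],
)
# example["rationale"] → "Eiffel Tower located in Paris. Paris capital of France."
```

A real pipeline would retrieve question-relevant paths from an external KG and likely use an LLM to smooth the verbalized steps into fluent text; this sketch only illustrates the question-to-answer rationale structure.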
Supplementary Material: zip
Primary Area: generative models
Submission Number: 12173