Do KG-augmented Models Leverage Knowledge as Humans Do?

Knowledge Graphs (KGs) can help neural-symbolic models improve performance on various knowledge-intensive tasks, such as recommendation and question answering. Concretely, neural reasoning over KGs may “explain” which information is relevant for inference. However, as the old saying goes, “seeing is not believing,” so it is natural to ask: do KG-augmented models really behave as we expect? This post presents a historical perspective on KG-augmented models and discusses a recent work [1] that raises this question. Interestingly, the empirical results demonstrate that perturbed KGs can maintain downstream performance, which subverts our assumptions about KG-augmented models’ abilities. We believe this topic is important for neural-symbolic reasoning and can guide future work on designing KG-augmented models.

KG-augmented Models

Pre-training followed by fine-tuning has become the de facto standard for natural language processing. However, knowledge-intensive tasks (for example, question answering and relation extraction) depend on structured relational knowledge, so directly fine-tuning pre-trained LMs yields suboptimal results. To this end, external knowledge has come to be considered an important part of language understanding, inspiring KG-augmented works including ERNIE (Tsinghua) [2], ERNIE (Baidu) [3], KnowBERT [4], WKLM [5], LUKE [6], KEPLER [7], GLM [8], K-Adapter [9], and CoLAKE [10].
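To make “KG-augmented” concrete, below is a minimal, conceptual sketch (in PyTorch) of ERNIE-style knowledge infusion: contextual token representations are fused with pre-trained embeddings of KG entities aligned to those tokens. The class name, dimensions, and fusion function are illustrative assumptions, not the actual implementation of any of the models above.

```python
# Conceptual sketch of entity-knowledge fusion (ERNIE-style); all names
# and dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class EntityFusionLayer(nn.Module):
    def __init__(self, hidden_dim: int = 768, entity_dim: int = 100):
        super().__init__()
        self.token_proj = nn.Linear(hidden_dim, hidden_dim)
        self.entity_proj = nn.Linear(entity_dim, hidden_dim)

    def forward(self, token_states, entity_embs, alignment):
        # token_states: (seq_len, hidden_dim) contextual token vectors from an LM
        # entity_embs:  (num_entities, entity_dim) pre-trained KG embeddings (e.g., TransE)
        # alignment:    (seq_len,) index of the entity aligned to each token, -1 if none
        fused = self.token_proj(token_states)
        mask = alignment >= 0
        aligned = self.entity_proj(entity_embs[alignment.clamp(min=0)])
        # Only tokens that mention an entity receive the fused representation.
        return torch.where(mask.unsqueeze(-1), torch.tanh(fused + aligned), fused)

# Toy usage: 5 tokens, 3 KG entities; tokens 0, 2, 4 mention entities.
layer = EntityFusionLayer()
tokens = torch.randn(5, 768)
entities = torch.randn(3, 100)
align = torch.tensor([0, -1, 1, -1, 2])
print(layer(tokens, entities, align).shape)  # torch.Size([5, 768])
```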

To a certain extent, KG-augmented models can enhance representations and reduce the data requirements of downstream tasks. However, when and how much external knowledge should be infused remains poorly understood. Recent studies [11][12][13] observe that incorporating excessive or irrelevant knowledge may divert the context representation from its correct meaning and hurt performance. Moreover, [14] finds that pre-trained LMs are already partially equipped with some relational knowledge, which has even promoted the new “pre-train, prompt, predict” paradigm of prompt-oriented learning for NLP. Honestly, many unknowns remain; we do not yet understand the fundamental mechanisms of KG-augmented models.
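As a quick illustration of the finding in [14], one can probe a pre-trained LM with a LAMA-style cloze sentence and check whether it recalls a relational fact without any KG. Here is a minimal sketch using the HuggingFace fill-mask pipeline; the model choice is our assumption for illustration.

```python
# LAMA-style cloze probe in the spirit of [14]: does the LM recall the fact?
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("Dante was born in [MASK]."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
# A high-ranked "florence" would suggest the LM has stored this relational
# fact purely from pre-training, without any explicit KG.
```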

Do KG-augmented Models Actually Use KGs in a Human-like Manner?

[Figure: perturbing a KG’s structure and semantics via an RL policy or simple heuristics]

Since the process by which KG-augmented models reason about entities is still not well understood, a recent ICLR paper [1] studies this question empirically by measuring model performance after the KG’s structure and semantics have been perturbed to hinder human comprehension. Previous studies hypothesize that, like humans, KG-augmented models base their predictions on meaningful relational paths; however, the empirical results of this paper suggest otherwise. Specifically, the paper perturbs the KG with a reinforcement learning policy or even simple heuristics, as shown in the figure above (a minimal sketch of such a heuristic follows the tables below). Contrary to the common assumption, KG-augmented models can maintain the downstream performance obtained with the original KG even when the perturbed KG deviates significantly from the original KG’s structure and semantics. More detailed results on commonsense question answering and item recommendation are shown in the following tables:

[Table: results on commonsense question answering with original vs. perturbed KGs]
[Table: results on item recommendation with original vs. perturbed KGs]
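For intuition, here is a minimal sketch of one simple perturbation heuristic in the spirit of [1]: randomly swapping tails between triples scrambles the KG’s semantics while roughly preserving its size and degree statistics. The actual heuristics and RL policy in the paper are more involved; the function below is an illustrative assumption.

```python
# Sketch of a semantics-scrambling heuristic: swap tails between random
# pairs of triples. Not the paper's exact method; illustrative only.
import random

def rewire_kg(triples, ratio=0.5, seed=0):
    """triples: list of (head, relation, tail); swap tails between a random
    subset of triple pairs to break semantic coherence."""
    rng = random.Random(seed)
    triples = list(triples)
    idx = list(range(len(triples)))
    rng.shuffle(idx)
    n = int(len(idx) * ratio) // 2 * 2  # even count so triples pair up
    for a, b in zip(idx[:n:2], idx[1:n:2]):
        (h1, r1, t1), (h2, r2, t2) = triples[a], triples[b]
        triples[a], triples[b] = (h1, r1, t2), (h2, r2, t1)
    return triples

kg = [("bird", "capable_of", "fly"),
      ("plane", "made_of", "metal"),
      ("fish", "at_location", "water"),
      ("sun", "has_property", "hot")]
print(rewire_kg(kg, ratio=1.0))
# The experiment: retrain/evaluate the KG-augmented model on the rewired KG
# and compare downstream accuracy against the original KG.
```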

Briefly, we can draw two (partial) conclusions:

  • KG-augmented models are different from human intelligence: they process knowledge in a way that does not align with human priors.

  • KG-augmented models should not be assumed to be faithful. They can be robust to noisy KGs, so we cannot fully trust the prediction evidence derived from KGs.

KG-Augmented Models: Extrapolation? Interpolation?

Note that in cognitive science, when we memorize a type of concept and the corresponding relationships among perceived things, we store what we see abstractly (as a so-called prototype). When facing similar objects (few-shot/zero-shot), we can quickly reason through these prototypes and make a prediction.
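As a toy illustration of prototype-based reasoning (in the spirit of prototypical networks), the sketch below stores one abstract vector per concept as the mean of a few examples and classifies a new input by its nearest prototype. The data and names are synthetic assumptions, not a claim about how KG-augmented models are implemented.

```python
# Nearest-prototype classification: one mean vector per concept,
# then few-shot prediction by distance to the nearest prototype.
import numpy as np

def build_prototypes(support):  # support: {label: (n_examples, dim) array}
    return {label: examples.mean(axis=0) for label, examples in support.items()}

def classify(x, prototypes):
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

rng = np.random.default_rng(0)
support = {"cat": rng.normal(0.0, 0.1, (3, 4)),   # 3-shot "cat" concept
           "dog": rng.normal(1.0, 0.1, (3, 4))}   # 3-shot "dog" concept
prototypes = build_prototypes(support)
query = rng.normal(1.0, 0.1, 4)                   # unseen example near "dog"
print(classify(query, prototypes))                # -> "dog"
```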

However, data-driven approaches struggle to handle new concepts (or non-i.i.d. data). A recent study [15] has revealed the success of GNNs in extrapolating algorithmic tasks to new data. Intuitively, we think KG-augmented models have the potential to extend data-driven approaches toward extrapolation, and we hope they can improve generalization, thus making AI robust in real-world applications.

Open Questions

While reading the paper and writing this post, we also found some interesting open questions:

  1. The fundamental theory of knowledge-enhanced models. When, where, and how do we need knowledge for machine learning? With the recent development of pre-trained models (foundation models), how to combine the knowledge stored in PTMs as “modeledge” [16] with the symbolic knowledge formalized by human beings needs further investigation.

  2. Do we need to force KG-augmented models to work like humans? Unlike birds, which must fly with flapping or oscillating wings, airplanes fly based on aerodynamics. Can recent or more advanced unsupervised/self-supervised learning acquire meta patterns (“meta modeledge”) for intelligence? More work is needed to explore this.

References

[1] Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation. (ICLR 2021)
[2] ERNIE: Enhanced Language Representation with Informative Entities. (ACL 2019)
[3] ERNIE: Enhanced Representation through Knowledge Integration. (arXiv 2019)
[4] Knowledge Enhanced Contextual Word Representations. (EMNLP 2019)
[5] Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model. (ICLR 2020)
[6] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. (EMNLP 2020)
[7] KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. (TACL 2021)
[8] Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning. (EMNLP 2020)
[9] K-Adapter: Infusing Knowledge into Pre-trained Models with Adapters. (ACL Findings 2021)
[10] CoLAKE: Contextualized Language and Knowledge Embedding. (COLING 2020)
[11] K-BERT: Enabling Language Representation with Knowledge Graph. (AAAI 2020)
[12] Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining. (IJCAI 2021)
[13] DKPLM: Decomposable Knowledge-Enhanced Pre-Trained Language Model for Natural Language Understanding. (AAAI 2022)
[14] Language Models as Knowledge Bases? (EMNLP 2019)
[15] How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks. (ICLR 2021)
[16] Pre-Trained Models: Past, Present and Future. (AI Open 2021)