<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>The ICLR Blog Track</title>
 <link href="https://iclr.iro.umontreal.ca/0b55e502-57a5-4c24-824d-6868fff6c8b7_1642247662/atom.xml" rel="self"/>
 <link href="https://iclr.iro.umontreal.ca/0b55e502-57a5-4c24-824d-6868fff6c8b7_1642247662/"/>
 <updated>2022-01-15T05:54:28-06:00</updated>
 <id>https://iclr.iro.umontreal.ca/0b55e502-57a5-4c24-824d-6868fff6c8b7_1642247662</id>
 <author>
   <name>Mark Otto</name>
   <email>markdotto@gmail.com</email>
 </author>

 
 <entry>
   <title>Do KG-augmented Models Leverage Knowledge as Humans Do?</title>
   <link href="https://iclr.iro.umontreal.ca/0b55e502-57a5-4c24-824d-6868fff6c8b7_1642247662/2021/12/01/Do-KG-augmented-Models-Leverage-Knowledge-as-Humans-Do/"/>
   <updated>2021-12-01T00:00:00-06:00</updated>
   <id>https://iclr.iro.umontreal.ca/0b55e502-57a5-4c24-824d-6868fff6c8b7_1642247662/2021/12/01/Do-KG-augmented-Models-Leverage-Knowledge-as-Humans-Do</id>
   <content type="html">&lt;p&gt;Knowledge Graphs (KGs) can help neural-symbolic models improve performance on various knowledge-intensive tasks, such as recommendation and question answering. Moreover, neural reasoning over KGs may “explain” which information is relevant for inference. However, as the saying goes, “seeing is not believing,” so it is natural to ask: “Do KG-augmented models really behave as we expect?” This post reviews the history of KG-augmented models and discusses a recent work &lt;a href=&quot;#refer-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt; that raises this question. Interestingly, its empirical results show that KG-augmented models can maintain their downstream performance even when the KG is perturbed, which challenges common assumptions about how these models use knowledge. We believe this topic is important for neural-symbolic reasoning and can guide future work on designing KG-augmented models.&lt;/p&gt;

&lt;h2 id=&quot;kg-augmented-models&quot;&gt;KG-augmented Models&lt;/h2&gt;

&lt;p&gt;The pre-training and fine-tuning paradigm has become the de facto standard in natural language processing (NLP). However, performance on knowledge-intensive tasks (for example, question answering or relation extraction) depends on structured relational knowledge, so directly fine-tuning pre-trained language models (LMs) can yield suboptimal results.
To address this, external knowledge has been treated as an important part of language understanding, inspiring KG-augmented models including ERNIE (Tsinghua) &lt;a href=&quot;#refer-2&quot;&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/a&gt;, ERNIE (Baidu) &lt;a href=&quot;#refer-3&quot;&gt;&lt;sup&gt;[3]&lt;/sup&gt;&lt;/a&gt;, KnowBERT &lt;a href=&quot;#refer-4&quot;&gt;&lt;sup&gt;[4]&lt;/sup&gt;&lt;/a&gt;, WKLM &lt;a href=&quot;#refer-5&quot;&gt;&lt;sup&gt;[5]&lt;/sup&gt;&lt;/a&gt;, LUKE &lt;a href=&quot;#refer-6&quot;&gt;&lt;sup&gt;[6]&lt;/sup&gt;&lt;/a&gt;, KEPLER &lt;a href=&quot;#refer-7&quot;&gt;&lt;sup&gt;[7]&lt;/sup&gt;&lt;/a&gt;, GLM &lt;a href=&quot;#refer-8&quot;&gt;&lt;sup&gt;[8]&lt;/sup&gt;&lt;/a&gt;, K-Adapter &lt;a href=&quot;#refer-9&quot;&gt;&lt;sup&gt;[9]&lt;/sup&gt;&lt;/a&gt;, and CoLAKE &lt;a href=&quot;#refer-10&quot;&gt;&lt;sup&gt;[10]&lt;/sup&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To a certain extent, KG-augmented models can enrich representations and reduce the data requirements of downstream tasks. However, it remains poorly understood when external knowledge should be infused and how much of it is needed for the infusion to be effective. Recent studies &lt;a href=&quot;#refer-11&quot;&gt;&lt;sup&gt;[11]&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#refer-12&quot;&gt;&lt;sup&gt;[12]&lt;/sup&gt;&lt;/a&gt; &lt;a href=&quot;#refer-13&quot;&gt;&lt;sup&gt;[13]&lt;/sup&gt;&lt;/a&gt; observe that incorporating excessive or irrelevant knowledge can divert the context representation from its correct meaning and hurt performance. Moreover, &lt;a href=&quot;#refer-14&quot;&gt;&lt;sup&gt;[14]&lt;/sup&gt;&lt;/a&gt; finds that pre-trained LMs already capture some relational knowledge. This finding has even helped motivate the new “pre-train, prompt, predict” paradigm for NLP, namely prompt-based learning &lt;a href=&quot;#refer-15&quot;&gt;&lt;sup&gt;[15]&lt;/sup&gt;&lt;/a&gt;. Honestly, many questions remain open, and we do not yet understand the fundamental mechanism behind KG-augmented models.&lt;/p&gt;

&lt;h2 id=&quot;kg-augmented-models-actually-use-kgs-in-human-like-manner&quot;&gt;Do KG-augmented Models Actually Use KGs in a Human-like Manner?&lt;/h2&gt;

&lt;div align=&quot;center&quot;&gt;
    &lt;img src=&quot;https://iclr.iro.umontreal.ca/0b55e502-57a5-4c24-824d-6868fff6c8b7_1642247662/public/images/2021-12-01-Do-KG-augmented-Models-Leverage-Knowledge-as-Humans-Do/method.png&quot; alt=&quot;Overview of the KG perturbation method&quot; style=&quot;zoom:70%;&quot; /&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;How KG-augmented models reason over entities is still not well understood. The recent ICLR paper &lt;a href=&quot;#refer-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt; studies this problem empirically by measuring model performance after the KG’s structure and semantics have been perturbed, making the KG hard for humans to comprehend. Prior work hypothesizes that, like humans, KG-augmented models base their predictions on meaningful relational paths, but the empirical results of this paper tell a different story. Specifically, the paper perturbs the KG using a reinforcement learning (RL) policy or even simple heuristics, as shown in the figure above. Contrary to the common assumption, a KG-augmented model can maintain its original downstream performance with the perturbed KG, even though the perturbed KG deviates significantly from the original KG’s structure and semantics. Detailed results on commonsense question answering and item recommendation are shown in the following tables:&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
    &lt;img src=&quot;https://iclr.iro.umontreal.ca/0b55e502-57a5-4c24-824d-6868fff6c8b7_1642247662/public/images/2021-12-01-Do-KG-augmented-Models-Leverage-Knowledge-as-Humans-Do/qa.png&quot; alt=&quot;Commonsense question answering results&quot; style=&quot;zoom:70%;&quot; /&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;div align=&quot;center&quot;&gt;
    &lt;img src=&quot;https://iclr.iro.umontreal.ca/0b55e502-57a5-4c24-824d-6868fff6c8b7_1642247662/public/images/2021-12-01-Do-KG-augmented-Models-Leverage-Knowledge-as-Humans-Do/rec.png&quot; alt=&quot;Item recommendation results&quot; style=&quot;zoom:70%;&quot; /&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;In brief, we can draw the following (partial) conclusions:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;KG-augmented models are different from human intelligence&lt;/strong&gt;. KG-augmented models process knowledge in a way that does not align with human priors.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;KG-augmented models are not necessarily faithful&lt;/strong&gt;. They can be robust to noisy (perturbed) KGs, so we cannot fully trust the KG evidence they appear to use for their predictions.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
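
&lt;p&gt;To make the idea of heuristic KG perturbation more concrete, here is a minimal Python sketch. It assumes the KG is stored as a list of (head, relation, tail) triples. The functions &lt;code&gt;relation_swap&lt;/code&gt;, &lt;code&gt;edge_rewire&lt;/code&gt;, and &lt;code&gt;triple_overlap&lt;/code&gt; and the toy triples are hypothetical illustrations, not the exact procedures of &lt;a href=&quot;#refer-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt;. The first function scrambles relation semantics while keeping the graph structure. The second changes the structure by redirecting edges. The third measures how many original triples survive.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
import random
from typing import List, Set, Tuple

# Assumption: a KG is modeled as a list of (head, relation, tail) triples.
Triple = Tuple[str, str, str]


def relation_swap(kg: List[Triple], num_swaps: int, seed: int = 0) -> List[Triple]:
    """Hypothetical heuristic: exchange the relations of two random triples.

    Every edge keeps its endpoints, so the graph structure is unchanged,
    but the relational semantics of the swapped edges are scrambled.
    """
    rng = random.Random(seed)
    perturbed = list(kg)
    if len(perturbed) >= 2:
        for _ in range(num_swaps):
            i, j = rng.sample(range(len(perturbed)), 2)
            (h1, r1, t1), (h2, r2, t2) = perturbed[i], perturbed[j]
            perturbed[i], perturbed[j] = (h1, r2, t1), (h2, r1, t2)
    return perturbed


def edge_rewire(kg: List[Triple], num_rewires: int, seed: int = 0) -> List[Triple]:
    """Hypothetical heuristic: keep an edge's head and relation but redirect
    its tail to a different random entity, which changes the graph structure."""
    rng = random.Random(seed)
    entities = sorted({h for h, _, _ in kg} | {t for _, _, t in kg})
    perturbed = list(kg)
    if perturbed:
        for _ in range(num_rewires):
            i = rng.randrange(len(perturbed))
            h, r, t = perturbed[i]
            # Exclude the current tail (so the edge really moves) and the
            # head (so no self-loop is created).
            candidates = [e for e in entities if e not in (h, t)]
            if candidates:
                perturbed[i] = (h, r, rng.choice(candidates))
    return perturbed


def triple_overlap(original: List[Triple], perturbed: List[Triple]) -> float:
    """Hypothetical deviation measure: the fraction of distinct original
    triples that still appear in the perturbed KG (1.0 means unchanged)."""
    original_set: Set[Triple] = set(original)
    if not original_set:
        return 1.0
    return len(original_set.intersection(perturbed)) / len(original_set)


if __name__ == '__main__':
    # Toy commonsense-style triples, made up for illustration only.
    kg = [
        ('bird', 'CapableOf', 'fly'),
        ('bird', 'HasA', 'wing'),
        ('airplane', 'HasA', 'wing'),
        ('airplane', 'UsedFor', 'travel'),
        ('wing', 'UsedFor', 'fly'),
    ]
    swapped = relation_swap(kg, num_swaps=2)
    rewired = edge_rewire(kg, num_rewires=2)
    print('relation swap overlap:', triple_overlap(kg, swapped))
    print('edge rewire overlap:', triple_overlap(kg, rewired))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this sketch, relation swapping alters only semantics and edge rewiring alters structure. A lower &lt;code&gt;triple_overlap&lt;/code&gt; means a larger deviation from the original KG. The finding reported in &lt;a href=&quot;#refer-1&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt; is that a KG-augmented model can maintain its downstream performance even when the KG it reasons over deviates substantially from the original.&lt;/p&gt;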

&lt;h2 id=&quot;kg-augmented-models-extrapolation-interpolation&quot;&gt;KG-Augmented Models: Extrapolation? Interpolation?&lt;/h2&gt;

&lt;p&gt;In cognitive science, when we memorize a type of concept and the relationships among perceived things, we store what we see in an abstract form, called a prototype. When we encounter similar objects (the few-shot or zero-shot setting), we can quickly reason through these prototypes and make a prediction.&lt;/p&gt;

&lt;p&gt;However, data-driven approaches struggle to handle new concepts (or, more generally, non-i.i.d. data). A recent study &lt;a href=&quot;#refer-15&quot;&gt;&lt;sup&gt;[15]&lt;/sup&gt;&lt;/a&gt; has shown that GNNs can, under suitable conditions, extrapolate algorithmic tasks to new data. Intuitively, we think KG-augmented models have the potential to help data-driven approaches extrapolate. We also hope they can improve generalization and thus make AI more robust in real-world applications.&lt;/p&gt;

&lt;h3 id=&quot;open-questions&quot;&gt;Open Questions&lt;/h3&gt;

&lt;p&gt;While reading the paper and writing this post, we also came across some interesting open questions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;The fundamental theory of knowledge-enhanced models&lt;/em&gt;. When, where, and how do we need knowledge in machine learning? Pre-trained models (PTMs, also called foundation models) have developed rapidly. Combining the knowledge stored in PTMs, termed “modeledge” &lt;a href=&quot;#refer-16&quot;&gt;&lt;sup&gt;[16]&lt;/sup&gt;&lt;/a&gt;, with symbolic knowledge formalized by humans needs further investigation.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Do we need to force KG-augmented models to work like humans?&lt;/em&gt; Unlike birds, which need flapping or oscillating wings to fly, airplanes fly according to the principles of aerodynamics. Similarly, can recent or more advanced unsupervised/self-supervised learning methods acquire meta-patterns (“meta modeledge”) for intelligence? This deserves further exploration.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;div id=&quot;refer-1&quot;&gt;[1] Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation. (ICLR 2021)&lt;/div&gt;
&lt;div id=&quot;refer-2&quot;&gt;[2] ERNIE: Enhanced Language Representation with Informative Entities. (ACL 2019)&lt;/div&gt;
&lt;div id=&quot;refer-3&quot;&gt;[3] ERNIE: Enhanced Representation through Knowledge Integration. (arXiv 2019)&lt;/div&gt;
&lt;div id=&quot;refer-4&quot;&gt;[4] Knowledge Enhanced Contextual Word Representations. (EMNLP 2019)&lt;/div&gt;
&lt;div id=&quot;refer-5&quot;&gt;[5] Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model. (ICLR 2020)&lt;/div&gt;
&lt;div id=&quot;refer-6&quot;&gt;[6] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention. (EMNLP 2020)&lt;/div&gt;
&lt;div id=&quot;refer-7&quot;&gt;[7] KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. (TACL 2021)&lt;/div&gt;
&lt;div id=&quot;refer-8&quot;&gt;[8] Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning. (EMNLP 2020)&lt;/div&gt;
&lt;div id=&quot;refer-9&quot;&gt;[9] K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters. (Findings of ACL 2021)&lt;/div&gt;
&lt;div id=&quot;refer-10&quot;&gt;[10] CoLAKE: Contextualized Language and Knowledge Embedding. (COLING 2020)&lt;/div&gt;
&lt;div id=&quot;refer-11&quot;&gt;[11] K-BERT: Enabling Language Representation with Knowledge Graph. (AAAI 2020)&lt;/div&gt;
&lt;div id=&quot;refer-12&quot;&gt;[12] Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining. (IJCAI 2021)&lt;/div&gt;
&lt;div id=&quot;refer-13&quot;&gt;[13] DKPLM: Decomposable Knowledge-Enhanced Pre-Trained Language Model for Natural Language Understanding. (AAAI 2022)&lt;/div&gt;
&lt;div id=&quot;refer-14&quot;&gt;[14] Language Models as Knowledge Bases? (EMNLP 2019)&lt;/div&gt;
&lt;div id=&quot;refer-15&quot;&gt;[15] How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks. (ICLR 2021)&lt;/div&gt;
&lt;div id=&quot;refer-16&quot;&gt;[16] Pre-Trained Models: Past, Present and Future. (AI Open 2021)&lt;/div&gt;
</content>
 </entry>
 

</feed>
