<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      Prototypical Representation Learning for Low-resource Knowledge Extraction&#58; Summary and Perspective &middot; The ICLR Blog Track
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/2021/12/01/prototypical-representation-learning-for-low-resource-knowledge-extraction-summary-and-perspective/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- <input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox" > -->

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">

    

    
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/">ICLR 2022 Blog Track</a>
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/about/">About</a>
        
      
    
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/iclr-blog-track/iclr-blog-track.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="post">
  <h1 id="iclr-post-title" class="post-title">Prototypical Representation Learning for Low-resource Knowledge Extraction&#58; Summary and Perspective</h1>
  <span class="post-date">01 Dec 2021 | 
    <a class="content-tag" href="/tags/#prototype"> Prototype </a>
  
    <a class="content-tag" href="/tags/#low-resource"> Low-resource </a>
  
    <a class="content-tag" href="/tags/#knowledge-extraction"> Knowledge Extraction </a>
  </span>

  <span id="iclr-post-authors" class="post-date">Anonymous</span>
  <p>Prototypical representations have been applied successfully to a wide range of low-resource tasks since “Prototypical Networks for Few-shot Learning (NeurIPS 2017)<a href="#refer-anchor-1"><sup>[1]</sup></a>” proposed representing each class by a prototype, the mean of its instance embeddings, and learning a metric space in which classification is performed by computing distances to the prototypes. A recent ICLR 2021 paper in this growing family of prototypical networks, “<em>Prototypical Representation Learning for Relation Extraction</em><a href="#refer-anchor-2"><sup>[2]</sup></a>”, addresses <strong>prototypical representation learning for low-resource knowledge extraction</strong>. 
In this post, we briefly survey this topic with the ICLR paper as the focal point. Unlike vanilla prototypical networks, the paper tackles low-resource knowledge extraction by (1) considering both <em>compactness within each prototype</em> and <em>separability between prototypes</em>, and (2) leveraging <em>contrastive learning</em> and projecting prototypes into a <em>geometric space</em>. 
We also point out some shortcomings of the paper and suggest several promising directions. 
<br /></p>
<h1 id="content">Content</h1>
<ul>
  <li><a href="#Section-1">1. Low-resource Knowledge Extraction</a>
<br /></li>
  <li><a href="#Section-2">2. Prototypical Representation Learning</a>
    <ul>
      <li><a href="#Section-2.1">2.1 Intra-prototype Learning</a>
        <ul>
          <li><a href="#Section-2.1.1">intra-instance: on feature-level</a></li>
          <li><a href="#Section-2.1.2">inter-instance: on sentence-level</a></li>
          <li><a href="#Section-2.1.3">joint intra- and inter- instance</a></li>
        </ul>
      </li>
      <li><a href="#Section-2.2">2.2 Inter-prototype Learning</a>
        <ul>
          <li><a href="#Section-2.2.1">considering long-tail distribution of  prototypes</a></li>
          <li><a href="#Section-2.2.2">considering label dependency of prototypes</a></li>
          <li><a href="#Section-2.2.3">considering knowledge constraint of  prototypes</a></li>
        </ul>
      </li>
      <li><a href="#Section-2.3">2.3 Joint Intra- and Inter- Prototype Learning</a>
        <ul>
          <li><a href="#Section-2.3.1">learning on instance-level</a></li>
          <li><a href="#Section-2.3.2">learning on Instance-prototype level</a>
<br /></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="#Section-3">3. Promising Research Directions</a>
    <ul>
      <li><a href="#Section-3.1">3.1 Knowledge-enhanced Prototype Learning</a>
        <ul>
          <li><a href="#Section-3.1.1">injecting concept-level knowledge</a></li>
          <li><a href="#Section-3.1.2">injecting class-level knowledge</a></li>
        </ul>
      </li>
      <li><a href="#Section-3.2">3.2 Geometrical Prototype Learning</a>
        <ul>
          <li><a href="#Section-3.2.1">in hyperbolic space</a></li>
          <li><a href="#Section-3.2.2">in hyperspherical space</a>
<br /></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="#References">References</a>
<br /></li>
</ul>
<div id="Section-1"></div>
<h1 id="1-low-reource-knowledge-extraction">1. Low-resource Knowledge Extraction</h1>
<p><strong>Knowledge Extraction (KE)</strong> aims to extract structured information from unstructured text, and includes tasks such as <strong>Relation Extraction (RE)</strong> and <strong>Event Extraction (EE)</strong>.
For instance, as shown in <a href="#figure-anchor-1">Figure 1</a>, given the sentence “<em>Jack is married to the Iraqi microbiologist known as Dr. Germ.</em>”,
<br /></p>
<ul>
  <li>the RE task should identify the relation of the given entity pair &lt;Jack, Dr. Germ&gt; as ‘<em>husband_of</em>’;</li>
  <li>the EE task should identify the event type as ‘<em>Marry</em>’, where the word ‘married’ triggers the event and (Jack, Dr. Germ) participate in the event as husband and wife, respectively. 
<br /></li>
</ul>
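<p>To make the two target structures concrete, the example above can be written down as plain records. This is only an illustrative sketch: the field names (<code>head</code>, <code>tail</code>, <code>trigger</code>, <code>arguments</code>) are our own assumptions and do not come from any particular dataset or from the papers discussed here.</p>

```python
# Hypothetical record layout for the example sentence; the field names are
# illustrative assumptions, not a schema from any dataset.
sentence = "Jack is married to the Iraqi microbiologist known as Dr. Germ."

# Relation Extraction: classify the relation holding between a given entity pair.
relation_extraction_output = {
    "head": "Jack",
    "tail": "Dr. Germ",
    "relation": "husband_of",
}

# Event Extraction: find the trigger word, the event type, and the participants.
event_extraction_output = {
    "event_type": "Marry",
    "trigger": "married",
    "arguments": {"husband": "Jack", "wife": "Dr. Germ"},
}

# Both outputs are grounded in spans of the same sentence.
spans = [
    relation_extraction_output["head"],
    relation_extraction_output["tail"],
    event_extraction_output["trigger"],
]
all_spans_found = all(span in sentence for span in spans)
```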
<center>
<img src="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/images/2021-12-01-prototypical-representation-learning-for-low-resource-knowledge-extraction-summary-and-perspective/ke.png" width="65%" alt="Knowledge Extraction" title="Figure 1. Knowledge Extraction" />
<div id="figure-anchor-1">Figure 1. Knowledge Extraction</div>
</center>
<p><br />
Most KE models assume a sufficiently large training corpus, which is indispensable for learning versatile representations of relations and events. As a result, relations or events with extremely few instances rarely reach satisfactory performance, as <a href="#figure-anchor-2">Figure 2</a><a href="#refer-anchor-17"><sup>[17]</sup></a> shows. It is therefore crucial for KE models to be able to extract knowledge from <strong>low-resource</strong> training data, including in <em>long-tail</em>, <em>few-shot</em>, and <em>zero-shot</em> settings. 
<br /></p>
<center>
<img src="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/images/2021-12-01-prototypical-representation-learning-for-low-resource-knowledge-extraction-summary-and-perspective/low-resource-ke.png" width="75%" alt="Low-resource Scenarios in Knowledge Extraction" title="Figure 2. Low-resource Scenarios in Knowledge Extraction" />
<div id="figure-anchor-2">Figure 2. Low-resource Scenarios in Knowledge Extraction</div>
</center>
<p><br />
Previous studies have proposed several solutions for <strong><em>low-resource knowledge extraction</em></strong>. In this post, we focus on methods based on <strong>prototypical representation learning</strong>, which has been shown to be robust in low-resource scenarios, taking the ICLR paper “<em>Prototypical Representation Learning for Relation Extraction</em><a href="#refer-anchor-2"><sup>[2]</sup></a>” as our starting point. 
<br /></p>
<div id="Section-2"></div>
<h1 id="2-prototypical-representation-learning">2. Prototypical Representation Learning</h1>
<p>In vanilla prototypical networks<a href="#refer-anchor-1"><sup>[1]</sup></a>, a class is represented by the average of its instance embeddings, and this class embedding is taken as the class prototype (also called the centroid), as <a href="#figure-anchor-3">Figure 3(a)</a> shows. A query instance is then classified by computing the distance from its embedding to each prototype and assigning it to the class of the closest prototype. 
<br />
In recent years, many extensions of vanilla prototypical representation learning have emerged. 
We group these approaches into three categories: 
<br /></p>
<ul>
  <li>(1) <em>Intra-prototype Learning</em>;</li>
  <li>(2) <em>Inter-prototype Learning</em>;</li>
  <li>(3) <em>Joint Intra- and Inter- Prototype Learning</em>.</li>
</ul>
<div></div>
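<p>As a point of reference for the extensions discussed below, the vanilla scheme described at the start of this section (mean prototypes followed by nearest-prototype classification) can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and uses made-up 2-D embeddings; the function names are ours, and a real system would obtain the embeddings from a trained encoder.</p>

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_prototypes(support):
    """support maps each class label to a list of instance embeddings."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}

def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: euclidean(query, prototypes[label]))

# Toy 2-D embeddings, made up for illustration.
support = {
    "husband_of": [[1.0, 0.0], [0.8, 0.2]],
    "born_in": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
prediction = classify([0.9, 0.3], prototypes)
```

<p>Here each prototype is simply the centroid of its two support instances, and the query is assigned to whichever centroid is nearest.</p>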
<p>The ICLR 2021 paper “<em>Prototypical Representation Learning for Relation Extraction</em><a href="#refer-anchor-2"><sup>[2]</sup></a>”, the focus of this post, belongs to <em>Joint Intra- and Inter- Prototype Learning</em>. We outline the three categories in turn below.</p>
<div id="Section-2.1"></div>
<h2 id="21-intra-prototype-learning">2.1 Intra-prototype Learning</h2>
<p><a href="#figure-anchor-3">Figure 3</a> illustrates the core idea of intra-prototype learning in comparison with vanilla prototypical networks. 
Unlike vanilla prototype learning, which simply averages the instance embeddings of each class, intra-prototype learning aims to obtain more robust prototypes by 
<br /></p>
<ul>
  <li>(1) improving the representations of instance embeddings (intra-instance),</li>
  <li>(2) attentively aggregating the representations of instance embeddings (inter-instance),</li>
  <li>(3) highlighting both the crucial features and the crucial instances (joint intra- and inter- instance). 
<br /></li>
</ul>
<center>
<img src="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/images/2021-12-01-prototypical-representation-learning-for-low-resource-knowledge-extraction-summary-and-perspective/inra-prototype.png" width="100%" alt="Illustration of Intra-prototype Learning" title="Figure 3. Illustration of Intra-prototype Learning" />
<div id="figure-anchor-3">Figure 3. Illustration of Intra-prototype Learning</div>
</center>
<div id="Section-2.1.1"></div>
<h3 id="intra-instance-on-feature-level">intra-instance: on feature-level</h3>
<p>Intra-instance methods for intra-prototype learning aim to obtain more robust instance embeddings, as illustrated in <a href="#figure-anchor-3">Figure 3(b)</a>.
<br />
Fan et al.<a href="#refer-anchor-3"><sup>[3]</sup></a> consider recognized entities of interest to generate fine-grained features for instance embedding in few-shot relation classification, and adopt large-margin learning to increase the generalization ability of prototypical networks on recognizing long-tail relations. 
Wang et al.<a href="#refer-anchor-4"><sup>[4]</sup></a> focus on trigger biases (trigger overlapping and trigger separability) in few-shot event classification, and have proposed to tackle the context-bypassing problems with trigger-uniform sampling and confusion sampling.</p>
<div id="Section-2.1.2"></div>
<h3 id="inter-instance-on-sentence-level">inter-instance: on sentence-level</h3>
<p>Inter-instance methods for intra-prototype learning aggregate instance embeddings attentively rather than simply averaging them with equal weights, as illustrated in <a href="#figure-anchor-3">Figure 3(c)</a>.
<br />
Ye et al.<a href="#refer-anchor-5"><sup>[5]</sup></a> have proposed multi-level matching and aggregation strategies for few-shot relation classification, where the class prototypes are formed by attention-based instance matching and attentive aggregation. 
Lai et al.<a href="#refer-anchor-6"><sup>[6]</sup></a> have proposed to exploit the relationship between training tasks for few-shot event detection, where prototypes are computed based on cross-task modeling.</p>
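<p>The difference between equal averaging and attentive aggregation can be illustrated with a generic, query-conditioned dot-product attention. We stress that this is a simplified sketch of the general idea and not the specific matching or aggregation mechanism of any paper cited above; the function names and the toy vectors are our own assumptions.</p>

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

def attentive_prototype(query, support_vectors):
    """Weight each support instance by softmax(query . instance)."""
    weights = softmax([dot(query, vector) for vector in support_vectors])
    dimension = len(support_vectors[0])
    prototype = [
        sum(weight * vector[k] for weight, vector in zip(weights, support_vectors))
        for k in range(dimension)
    ]
    return prototype, weights

# The last support instance points away from the others (an outlier).
support_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prototype, weights = attentive_prototype([1.0, 0.0], support_vectors)
```

<p>With equal averaging, every instance would receive weight 1/3; here the outlier receives a smaller weight, so it pulls the prototype less.</p>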
<div id="Section-2.1.3"></div>
<h3 id="joint-intra--and-inter--instance">joint intra- and inter- instance</h3>
<p>Some extended prototypical networks integrate intra- and inter- instance learning for low-resource KE.
<br />
Gao et al.<a href="#refer-anchor-7"><sup>[7]</sup></a> have improved vanilla prototypical networks with hybrid attention for few-shot relation extraction, namely feature-level attention for instances and instance-level attention for prototypes.
Deng et al.<a href="#refer-anchor-8"><sup>[8]</sup></a> have utilized dynamic memory modules to enhance prototype learning for few-shot event detection, by implicitly highlighting crucial features for instances and refining instance embeddings for each prototype. 
<br /></p>
<div id="Section-2.2"></div>
<h2 id="22-inter-prototype-learning">2.2 Inter-prototype Learning</h2>
<p><a href="#figure-anchor-4">Figure 4</a> illustrates the core idea of inter-prototype learning in comparison with vanilla prototypical networks. 
<br /></p>
<center>
<img src="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/images/2021-12-01-prototypical-representation-learning-for-low-resource-knowledge-extraction-summary-and-perspective/inter-prototype.png" width="100%" alt="Illustration of Inter-prototype Learning" title="Figure 4. Illustration of Inter-prototype Learning" />
<div id="figure-anchor-4">Figure 4. Illustration of Inter-prototype Learning</div>
</center>
<div id="Section-2.2.1"></div>
<h3 id="considering-long-tail-distribution-of--prototypes">considering long-tail distribution of  prototypes</h3>
<p>Cao et al.<a href="#refer-anchor-9"><sup>[9]</sup></a> have proposed to facilitate long-tail relation extraction by transferring knowledge from the relation prototypes with sufficient training instances, where relation prototypes reflect the meanings of relations as well as their proximities for transfer learning. 
<br /></p>
<div id="Section-2.2.2"></div>
<h3 id="considering-label-dependency-of-prototypes">considering label dependency of prototypes</h3>
<p>Cong et al.<a href="#refer-anchor-10"><sup>[10]</sup></a> have proposed a prototypical amortized conditional random field to model the label dependency in few-shot event detection, by generating the transition scores to achieve adaptation ability for novel event types based on the label prototypes. 
<br /></p>
<div id="Section-2.2.3"></div>
<h3 id="considering-knowledge-constraint-of--prototypes">considering knowledge constraint of  prototypes</h3>
<p>Yu et al.<a href="#refer-anchor-11"><sup>[11]</sup></a> have studied the few-shot relational triple extraction problem and proposed a multi-prototype embedding network that implicitly injects correlations between entities and relations, so that relations linked with the same entity type can be jointly learned. For example, the head entity must be of type PERSON for both the <em>born_in</em> and <em>live_in</em> relations. 
<br /></p>
<div id="Section-2.3"></div>
<h2 id="23-joint-intra--and-inter--prototype-learning">2.3 Joint Intra- and Inter- Prototype Learning</h2>
<p>Ding et al.<a href="#refer-anchor-2"><sup>[2]</sup></a> have proposed to learn prototypes that combine the advantages of intra- and inter- prototype learning. 
Specifically, they learn a prototype for each relation in few-shot RE, enforcing <strong>intra-prototype compactness</strong> and <strong>inter-prototype separability</strong> through contrastive learning in a <strong>geometric space</strong>. 
<br />
Let $s$ denote an instance embedding generated by an instance encoder; a prototype $z$ for relation $r$ is an embedding in the same metric space as $s$. Given the batch $\mathcal{B} = [(s_1, r_1), \dots, (s_N, r_N)]$, the set $\mathcal{S} = [s_1, \dots, s_N]$ of all instance embeddings in it, and a fixed prototype $z^r$ for relation $r$, Ding et al. denote by $\mathcal{S}^r$ the subset of all instances $s_i \in \mathcal{S}$ with relation $r$, by $\mathcal{S}^{-r}$ the set of the remaining instances, and by $\mathcal{Z}^{-r}$ the set of prototypes $z'$ for all relations other than $r$. 
<br />
<strong>Intra-prototype compactness</strong> means that, for a specific relation $r$, the “distance” between $z^r$ and any instance with relation $r$ should be smaller than the “distance” between $z^r$ and any instance with a relation $r' \neq r$. 
<br />
<strong>Inter-prototype separability</strong> means that the “distance” between $z^r$ and any instance with relation $r$ should be smaller than the “distance” between any prototype $z' \in {\mathcal{Z}^{-r}}$ and that instance. 
<br />
<strong>Geometric space</strong>: Unlike vanilla prototypical networks, Ding et al. interpret prototypes geometrically: a prototype is a unit vector starting at the origin and ending on the surface of a unit ball, and the instances of that prototype are unit vectors with approximately the same direction, clustered around the prototype. In the optimal case, the prototype vectors are uniformly dispersed, with the angles between them as large as possible, as illustrated in <a href="#figure-anchor-5">Figure 5</a>. 
<br /></p>
<center>
<img src="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/images/2021-12-01-prototypical-representation-learning-for-low-resource-knowledge-extraction-summary-and-perspective/ICLR-geo.pdf" width="40%" alt="Geometrical Contrastive Representation of Prototypical Learning" title="Figure 5. Geometrical Contrastive Representation of Prototypical Learning" />
<div id="figure-anchor-5">Figure 5. Geometrical Contrastive Representation of Prototypical Learning</div>
</center>
<p><br />
<em>The geometrical contrastive representation of prototypical learning, which considers intra-prototype compactness and inter-prototype separability,</em> is learned at two levels: 
<br /></p>
<ul>
  <li><strong>instance-level</strong>: instance-instance contrastive learning, and</li>
  <li><strong>instance-prototype level</strong>: instance-prototype contrastive learning. 
<br /></li>
</ul>
<div id="Section-2.3.1"></div>
<h3 id="learning-on-instance-level">Learning on instance-level</h3>
<p>Given a batch $\mathcal{B} = [(s_1, r_1), \dots, (s_N, r_N)]$ of instance-relation pairs, the similarity metric $d(s_i, s_j)$ between two instance embeddings is defined as:</p>
<center>
<div id="eq:dist_s2s"></div>
$$
d(s_i, s_j) = 1 / (1 + \exp(\frac{s_i}{||s_i||} \cdot \frac{s_j}{||s_j||})). 
$$
</center>
<p>Geometrically, as illustrated in <a href="#figure-anchor-6">Figure 6</a>, this metric is based on the <em>angles between the normalized embeddings restricted to a unit ball</em>, and the similarity metric between an instance embedding and a prototype <a href="#eq:dist_s2z">$d(z, s)$</a> follows the same principle.</p>
<center>
<img src="https://iclr.iro.umontreal.ca/b985c766-a078-4a46-898d-0d9bae9707e8_1642247411/public/images/2021-12-01-prototypical-representation-learning-for-low-resource-knowledge-extraction-summary-and-perspective/ICLR-model.pdf" width="50%" alt="Similarity Metric for Geometrical Prototypical Representation Learning" title="Figure 6. Similarity Metric for Geometrical Prototypical Representation Learning" />
<div id="figure-anchor-6">Figure 6. Similarity Metric for Geometrical Prototypical Representation Learning</div>
</center>
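<p>The metric $d$ can be evaluated directly from its definition. The sketch below transcribes the formula as it is written in this post, using only the Python standard library; the helper name <code>unit</code> and the toy vectors are our own. Note that, with the formula taken literally, $d$ depends only on the angle between the two vectors and decreases as their cosine increases.</p>

```python
import math

def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def d(u, v):
    """d(u, v) = 1 / (1 + exp(cos(u, v))), transcribed from the formula above."""
    cosine = sum(a * b for a, b in zip(unit(u), unit(v)))
    return 1.0 / (1.0 + math.exp(cosine))

# The vector lengths differ on purpose: only the angle matters.
same_direction = d([2.0, 0.0], [5.0, 0.0])   # cosine = 1
orthogonal = d([1.0, 0.0], [0.0, 3.0])       # cosine = 0
opposite = d([1.0, 0.0], [-4.0, 0.0])        # cosine = -1
```

<p>The same function applies unchanged to an instance embedding and a prototype, since $d(z, s)$ has the same form.</p>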
<p><br />
In order to ensure <em>instance-level</em> <em>intra-prototype compactness</em> and <em>inter-prototype separability</em> in the representation space, Ding et al. have defined a contrastive objective function $\mathcal{L}_{\text{S2S}}$ between instance embeddings, denoted by:</p>
<center>
<div id="eq:ls2s"></div>
$$
   \mathcal{L}_{\text{S2S}} = -\frac{1}{N^2} \sum_{i, j} \frac{\exp (\delta(s_i, s_j) d(s_i, s_j))}{\sum_{j'} \exp((1 - \delta(s_i, s_{j'}))d(s_i, s_{j'}))},
$$
</center>
<p>where $\delta(s_i, s_j)$ indicates whether $s_i$ and $s_j$ correspond to the same relation, i.e., given $(s_i, r_i), (s_j, r_j)$, $\delta(s_i, s_j) = 1 \; \text{if}\; r_i = r_j \; \text{else} \; 0$. 
<br /></p>
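<p>To make the indices in $\mathcal{L}_{\text{S2S}}$ explicit, the following sketch evaluates the objective term by term exactly as it is written above. It is an illustration rather than the authors' implementation: we assume that both sums run over all instances in the batch (including $i = j$, since the formula does not exclude it), and the function names are our own.</p>

```python
import math

def d(u, v):
    """d(u, v) = 1 / (1 + exp(cos(u, v))), as defined above."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    return 1.0 / (1.0 + math.exp(cosine))

def delta(r_i, r_j):
    """1 if the two instances share a relation, else 0."""
    return 1.0 if r_i == r_j else 0.0

def s2s_loss(batch):
    """batch is a list of (embedding, relation) pairs."""
    n = len(batch)
    total = 0.0
    for s_i, r_i in batch:
        # Denominator: sum over j' of exp((1 - delta) * d).
        denominator = sum(
            math.exp((1.0 - delta(r_i, r_k)) * d(s_i, s_k)) for s_k, r_k in batch
        )
        # Numerator terms: exp(delta * d) for every j.
        for s_j, r_j in batch:
            total += math.exp(delta(r_i, r_j) * d(s_i, s_j)) / denominator
    return -total / (n * n)

batch = [
    ([1.0, 0.0], "husband_of"),
    ([0.9, 0.1], "husband_of"),
    ([0.0, 1.0], "born_in"),
]
loss = s2s_loss(batch)
```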
<div id="Section-2.3.2"></div>
<h3 id="learning-on-instance-prototype-level">Learning on Instance-prototype level</h3>
<p>Recall that $\mathcal{S}^r$ is the subset of all instances $s_i$ in $\mathcal{S}$ with relation $r$, $\mathcal{S}^{-r}$ is the set of the remaining instances, and $\mathcal{Z}^{-r}$ is the set of prototypes $z'$ for all relations other than $r$. The similarity metric $d(z, s)$ between an instance embedding and a prototype (illustrated in <a href="#figure-anchor-5">Figure 5</a>) is defined as:</p>
<center>
<div id="eq:dist_s2z"></div>
$$
d(z, s) = 1/ (1 + \exp(\frac{s }{||s||} \cdot \frac{z}{||z||})). 
$$ 
</center>
<p>To realize <em>intra-prototype compactness</em> between instances and prototypes, Ding et al. have defined an objective function $\mathcal{L}_{\text{S2Z}}$:</p>
<center>
<div id="eq:ls2z"></div>
$$
\mathcal{L}_{\text{S2Z}} = -\frac{1}{N^2} \sum_{s_i \in \mathcal{S}^{r}, s_j \in \mathcal{S}^{-r}} \big[\log d(z^r, s_i) + \log(1 - d(z^r, s_j))\big]. 
$$
</center>
<p>To realize <em>inter-prototype separability</em> between instances and prototypes, Ding et al. have defined an objective function $\mathcal{L}_{\text{S2Z'}}$:</p>
<center>
<div id="eq:ls2z_"></div>
$$
\mathcal{L}_{\text{S2Z'}} = -\frac{1}{N^2} \sum_{s_i \in \mathcal{S}^{r}, z' \in \mathcal{Z}^{-r}} \big[\log d(z^r, s_i) + \log(1 - d(z', s_i))\big]. 
$$
</center>
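<p>The two instance-prototype objectives above differ only in what the second term contrasts: $\mathcal{L}_{\text{S2Z}}$ pairs the prototype $z^r$ with instances of other relations, whereas $\mathcal{L}_{\text{S2Z'}}$ pairs instances of relation $r$ with the other prototypes. The sketch below evaluates both for one fixed relation $r$, following the formulas as written. The data layout (a list of pairs and a dictionary of prototypes) and all names are our own assumptions.</p>

```python
import math

def d(u, v):
    """d(u, v) = 1 / (1 + exp(cos(u, v))), as defined above."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    return 1.0 / (1.0 + math.exp(cosine))

def s2z_losses(batch, prototypes, r):
    """Return (L_S2Z, L_S2Z') for one fixed relation r.

    batch: list of (embedding, relation) pairs.
    prototypes: dict mapping each relation to its prototype vector.
    """
    n = len(batch)
    z_r = prototypes[r]
    same = [s for s, rel in batch if rel == r]                  # S^r
    rest = [s for s, rel in batch if rel != r]                  # S^{-r}
    others = [z for rel, z in prototypes.items() if rel != r]   # Z^{-r}

    s2z = sum(
        math.log(d(z_r, s_i)) + math.log(1.0 - d(z_r, s_j))
        for s_i in same for s_j in rest
    )
    s2z_prime = sum(
        math.log(d(z_r, s_i)) + math.log(1.0 - d(z_other, s_i))
        for s_i in same for z_other in others
    )
    return -s2z / (n * n), -s2z_prime / (n * n)

batch = [([1.0, 0.0], "a"), ([0.0, 1.0], "b")]
prototypes = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
loss_s2z, loss_s2z_prime = s2z_losses(batch, prototypes, "a")
```

<p>Because $d$ always lies strictly between 0 and 1, both logarithms are finite for any non-zero vectors.</p>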
<p>These objectives can effectively <em>split the data representations into $K$ disjoint manifolds centering at different prototypes</em>. 
<br />
<strong>Comparative Analysis of Loss Functions:</strong> 
Compared with the conventional cross-entropy loss in prototypical learning, $\mathcal{L}_{\text{S2Z}}$ and $\mathcal{L}_{\text{S2Z'}}$ offer a clear advantage:</p>
<ul>
  <li>Cross-entropy loss: relies solely on <em>instance-level</em> supervision, with no interaction between different instances, and this supervision is particularly noisy under a noisy-label setting,</li>
  <li>$\mathcal{L}_{\text{S2Z}}$ and $\mathcal{L}_{\text{S2Z'}}$: consider distances between different instances and prototypes, and thus <em>exploit the interactions between instances</em>. 
This type of interaction effectively serves as a regularizer on the decision boundary.</li>
</ul>
<div></div>
<p>To further regularize the semantics of the prototypes, Ding et al. also use a prototype-level classification objective:</p>
<center>
<div id="eq:cls"></div>
$$
    \mathcal{L}_{\text{CLS}} = \frac{1}{K} \sum_k \log p_\gamma(r^k | z^k),
$$
</center>
<p>where $\gamma$ denotes the parameters of an auxiliary classifier. Finally, with hyper-parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$, the full loss is defined as:</p>
<center>
<div id="eq:loss_all"></div>
$$
    \mathcal{L} = \lambda_1 \mathcal{L}_{\text{S2S}} + \lambda_2 (\mathcal{L}_{\text{S2Z}} + \mathcal{L}_{\text{S2Z'}})  + \lambda_3 \mathcal{L}_{\text{CLS}}. 
$$
</center>
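<p>The prototype-level term and the final weighted sum can be sketched as follows. This is a hedged illustration: we assume the auxiliary classifier $p_\gamma$ is a linear layer followed by a softmax, that the $k$-th prototype belongs to the $k$-th class, and that all three weights default to 1; none of these choices is specified in this post, and the names are hypothetical.</p>

```python
import math

def log_softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    log_total = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_total for x in logits]

def cls_objective(prototypes, weights):
    """(1/K) * sum_k log p(r^k | z^k), as written in the formula above.

    prototypes[k] is z^k; weights[c] is the (hypothetical) weight row of class c.
    """
    k_total = len(prototypes)
    total = 0.0
    for k, z in enumerate(prototypes):
        logits = [sum(w * x for w, x in zip(row, z)) for row in weights]
        total += log_softmax(logits)[k]
    return total / k_total

def full_loss(l_s2s, l_s2z, l_s2z_prime, l_cls, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; the default weights are an assumption."""
    lambda_1, lambda_2, lambda_3 = lambdas
    return lambda_1 * l_s2s + lambda_2 * (l_s2z + l_s2z_prime) + lambda_3 * l_cls

# Two orthogonal prototypes and an identity weight matrix as a toy classifier.
cls_value = cls_objective([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
combined = full_loss(1.0, 2.0, 3.0, 4.0, lambdas=(0.5, 1.0, 2.0))
```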
<div id="Section-3"></div>
<h1 id="3-promising-research-directions">3. Promising Research Directions</h1>
<p>Although Ding et al. have produced predictive and robust prototype representations through joint intra- and inter- prototype learning, they focus only on the contrast among instances and prototypes and may consequently ignore <strong>the inherent semantic correlation among prototypes</strong>, such as hierarchy and entailment. In addition, a <strong>task-specific modeling space</strong> may further enhance prototypical representation learning for low-resource KE. We therefore discuss some promising research directions. 
<br /></p>
<div id="Section-3.1"></div>
<h2 id="31-knowledge-enhanced-prototype-learning">3.1 Knowledge-enhanced Prototype Learning</h2>
<p><strong>We argue that injecting the inherent semantics among instances and classes, such as concept-level and class-level knowledge, may also benefit prototypical learning.</strong></p>
<div id="Section-3.1.1"></div>
<h3 id="injecting-concept-level-knowledge">injecting concept-level knowledge</h3>
<p>Gong et al.<a href="#refer-anchor-12"><sup>[12]</sup></a> have improved prototypical networks with <em>side information</em>, which is built from keywords, hypernyms of named entities, and labels and their synonyms. Zhang et al.<a href="#refer-anchor-13"><sup>[13]</sup></a> have likewise incorporated <em>concept-level KGs</em> to better capture the semantics of low-resource relation types, demonstrating effectiveness and robustness in zero-shot and few-shot relation classification. 
<br /></p>
<div id="Section-3.1.2"></div>
<h3 id="injecting-class-level-knowledge">injecting class-level knowledge</h3>
<p>Zheng et al.<a href="#refer-anchor-14"><sup>[14]</sup></a> have proposed a taxonomy-aware prototypical learning framework to model the <em>hierarchy of event types</em> in few-shot event detection, considering the problems of class centroids distribution and taxonomy-aware distribution in vanilla prototypical networks. In addition to hierarchy, Deng et al.<a href="#refer-anchor-16"><sup>[16]</sup></a> have injected <em>temporality and causality of event types</em>. 
<br /></p>
<div id="Section-3.2"></div>
<h2 id="32--geometrical-prototype-learning">3.2  Geometrical Prototype Learning</h2>
<p><strong>We also argue that modeling prototypes in non-Euclidean space may help capture complex semantics, such as taxonomy in hyperbolic space and class correlation in hyperspherical space.</strong></p>
<div id="Section-3.2.1"></div>
<h3 id="in-hyperbolic-space">in hyperbolic space</h3>
<p>Zheng et al.<a href="#refer-anchor-14"><sup>[14]</sup></a> have projected the event label taxonomy into hyperbolic space based on the Poincaré model<a href="#refer-anchor-15"><sup>[15]</sup></a>, which significantly outperforms Euclidean embeddings on data with latent hierarchies, to obtain a label hierarchy embedding for each event type, and have integrated each prototype vector with its taxonomy-aware label embedding.</p>
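<p>To give a feel for why hyperbolic space suits hierarchies, the sketch below computes the standard distance in the Poincaré ball model used by Poincaré embeddings<a href="#refer-anchor-15"><sup>[15]</sup></a>. It only illustrates the geometry and is not the taxonomy-aware model of Zheng et al.; the function name and the sample points are our own.</p>

```python
import math

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball:
    arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

# Two pairs of points with the same Euclidean gap of 0.1.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

<p>The same Euclidean gap corresponds to a much larger hyperbolic distance near the boundary of the ball, so there is exponentially more room for the leaves of a hierarchy far from the origin.</p>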
<div id="Section-3.2.2"></div>
<h3 id="in-hyperspherical-space">in hyperspherical space</h3>
<p>Deng et al.<a href="#refer-anchor-17"><sup>[17]</sup></a> have leveraged a knowledge-aware hyperspherical prototype network to model entailment correlation among relations and causality among events, as hyperspherical prototype networks<a href="#refer-anchor-18"><sup>[18]</sup></a> have demonstrated the effectiveness of imposing class semantics. We think that more class semantics can be further explored, such as temporal and inverse correlations.
<br /></p>
<div id="References"></div>
<h1 id="references">References</h1>
<div id="refer-anchor-1"></div>
<p>[1] <a href="https://proceedings.neurips.cc/paper/2017/hash/cb8da6767461f2812ae4290eac7cbc42-Abstract.html">Prototypical Networks for Few-shot Learning</a> (NeurIPS 2017)
<br /></p>
<div id="refer-anchor-2"></div>
<p>[2] <a href="https://openreview.net/forum?id=aCgLmfhIy_f">Prototypical Representation Learning for Relation Extraction</a> (ICLR 2021)
<br /></p>
<div id="refer-anchor-3"></div>
<p>[3] <a href="https://dl.acm.org/doi/10.1145/3357384.3358100">Large Margin Prototypical Network for Few-shot Relation Classification with Fine-grained Features</a> (CIKM 2019)
<br /></p>
<div id="refer-anchor-4"></div>
<p>[4] <a href="https://dl.acm.org/doi/10.1145/3459637.3482236">Behind the Scenes: An Exploration of Trigger Biases Problem in Few-Shot Event Classification</a> (CIKM 2021)
<br /></p>
<div id="refer-anchor-5"></div>
<p>[5] <a href="https://aclanthology.org/P19-1277.pdf">Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification</a> (ACL 2019)
<br /></p>
<div id="refer-anchor-6"></div>
<p>[6] <a href="https://aclanthology.org/2021.emnlp-main.427.pdf">Learning Prototype Representations Across Few-Shot Tasks for Event Detection</a> (EMNLP 2021)
<br /></p>
<div id="refer-anchor-7"></div>
<p>[7] <a href="https://ojs.aaai.org//index.php/AAAI/article/view/4604">Hybrid Attention-Based Prototypical Networks for Noisy Few-Shot Relation Classification</a> (AAAI 2019)
<br /></p>
<div id="refer-anchor-8"></div>
<p>[8] <a href="https://doi.org/10.1145/3336191.3371796">Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection</a> (WSDM 2020)
<br /></p>
<div id="refer-anchor-9"></div>
<p>[9] <a href="https://www.computer.org/csdl/journal/tk/5555/01/09483677/1vcJo4x5s6Q">Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction</a> (TKDE 2021)
<br /></p>
<div id="refer-anchor-10"></div>
<p>[10] <a href="https://aclanthology.org/2021.findings-acl.3.pdf">Few-Shot Event Detection with Prototypical Amortized Conditional Random Field</a> (ACL 2021)
<br /></p>
<div id="refer-anchor-11"></div>
<p>[11] <a href="https://aclanthology.org/2020.coling-main.563/">Bridging Text and Knowledge with Multi-Prototype Embedding for Few-Shot Relational Triple Extraction</a> (COLING 2020)
<br /></p>
<div id="refer-anchor-12"></div>
<p>[12] <a href="https://dl.acm.org/doi/abs/10.1145/3459637.3482403">Zero-shot Relation Classification from Side Information</a> (CIKM 2021)
<br /></p>
<div id="refer-anchor-13"></div>
<p>[13] <a href="https://dl.acm.org/doi/abs/10.1145/3447548.3467438">Knowledge-Enhanced Domain Adaptation in Few-Shot Relation Classification</a> (KDD 2021)
<br /></p>
<div id="refer-anchor-14"></div>
<p>[14] <a href="https://dl.acm.org/doi/10.1145/3442381.3449949">Taxonomy-aware Learning for Few-Shot Event Detection</a> (WWW 2021)
<br /></p>
<div id="refer-anchor-15"></div>
<p>[15] <a href="https://proceedings.neurips.cc/paper/2017/file/59dfa2df42d9e3d41f5b02bfc32229dd-Paper.pdf">Poincaré Embeddings for Learning Hierarchical Representations</a> (NeurIPS 2017)
<br /></p>
<div id="refer-anchor-16"></div>
<p>[16] <a href="https://doi.org/10.18653/v1/2021.acl-long.220">OntoED: Low-resource Event Detection with Ontology Embedding</a> (ACL 2021)
<br /></p>
<div id="refer-anchor-17"></div>
<p>[17] <a href="https://www.sciencedirect.com/science/article/pii/S0950705121008467">Low-resource Extraction with Knowledge-aware Pairwise Prototype Learning</a> (Knowledge-Based Systems 2022)
<br /></p>
<div id="refer-anchor-18"></div>
<p>[18] <a href="https://proceedings.neurips.cc/paper/2019/file/02a32ad2669e6fe298e607fe7cc0e1a0-Paper.pdf">Hyperspherical Prototype Networks</a> (NeurIPS 2019)</p>

</div>

<div id="bibtex-container" class="related">
  For attribution in academic contexts, please cite this work as
  <pre id="bibtex-academic-attribution">

  </pre>

  BibTeX citation
  <pre id="bibtex-box">

  </pre>
</div>
<script>
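  // Author entries are separated by ";"; each entry holds comma-separated
  // fields (last name, first name, institution, going by the variable names).
  let authorsSpan = document.getElementById("iclr-post-authors");
  let authorsText = authorsSpan.textContent;
  let lnameFnameInstitution = authorsText.split(";");
  let lfiList = lnameFnameInstitution.map(lfi => lfi.split(",").map(item => item.trim()));
  // BibTeX author field: "Last, First and Last, First ..."
  let bibtexLFI = lfiList.map(lfi => lfi[0] + ", " + lfi[1]).join(" and ");
  // Short attribution: first field of each author, collapsed just below.
  let academicLFI = lfiList.map(lfi => lfi[0]);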
  {
    if(academicLFI.length > 2) academicLFI = academicLFI[0] + ", et al.";
    else if(academicLFI.length == 2) academicLFI = academicLFI[0] + " & " + academicLFI[1];
    else academicLFI = academicLFI[0];
  }

  let titleSpan = document.getElementById("iclr-post-title");
  let titleText = titleSpan.textContent.trim();
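  // Citation key: first author's second field + year + first three title words,
  // with all whitespace and punctuation removed, lowercased.
  let bibtexTitleShorthand = (lfiList[0][1]+
    "2022"+
    titleText.split(" ").slice(0, 3).join("")
  ).replace(/\s+/g, "").replace(/[\p{P}$+<=>^`|~]/gu, '').toLowerCase().trim();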

  let bibtexTemplate = `
@inproceedings{${bibtexTitleShorthand},
  author = {${bibtexLFI}},
  title = {${titleText}},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {${window.location.href}},
  url  = {${window.location.href}}
}
  `.trim();
  document.getElementById("bibtex-box").innerText = bibtexTemplate;

  let academicTemplate = `
${academicLFI}, "${titleText}", ICLR Blog Track, 2022.
`.trim();
  document.getElementById("bibtex-academic-attribution").innerText = academicTemplate;

</script>






      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
