<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      Prior Knowledge for Few-shot Learning—Inductive Reasoning and Distribution Calibration &middot; The ICLR Blog Track
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/2021/12/01/Prior-Knowledge-for-Few-shot-Learning/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- <input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox" > -->

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">

    

    
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/">ICLR 2022 Blog Track</a>
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/about/">About</a>
        
      
    
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/iclr-blog-track/iclr-blog-track.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="post">
  <h1 id="iclr-post-title" class="post-title">Prior Knowledge for Few-shot Learning—Inductive Reasoning and Distribution Calibration</h1>
  <span class="post-date">01 Dec 2021 | 
    <a class="content-tag" href="/tags/#few-shot-learning"> few-shot learning </a>
  
    <a class="content-tag" href="/tags/#prior-knowledge"> prior knowledge </a>
  
    <a class="content-tag" href="/tags/#distribution-estimation"> distribution estimation </a>
  
    <a class="content-tag" href="/tags/#bias-correction"> bias correction </a>
  </span>

  <span id="iclr-post-authors" class="post-date">Anonymous</span>
  <p>Few-shot learning is an important technique for improving the learning capabilities of machine intelligence and for enabling practical adaptive applications. Previous work either applies meta-learning to give a model the ability to adapt quickly, or leverages transfer learning to reduce the need for large amounts of data. Moreover, prior knowledge such as knowledge graphs can also be modeled under the few-shot setting. This post gives an overview of recent work on how prior knowledge can address the problem of few-shot learning, and discusses a simple and efficient few-shot learning approach that estimates novel-class distributions inductively from the base classes.</p>

<h2 id="introduction">Introduction</h2>

<p>Humans can adapt to a novel task from only a few observations, because our brains are remarkably good at learning to learn. In contrast, modern artificial intelligence (AI) systems generally require a large number of annotated samples to adapt. Few-shot learning (FSL) has therefore become an important and widely studied problem. Unlike conventional machine learning, FSL aims to learn prior knowledge on base classes with large amounts of labeled data and to use that knowledge to recognize few-shot classes with scarce labeled data.</p>

<p>Existing studies on FSL roughly fall into two categories: metric-based methods and optimization-based methods. Metric-based algorithms typically classify test samples by matching them to the nearest class prototype. However, training on only a few labeled samples may yield a <strong>biased distribution</strong> (or biased prototype), especially in the one-shot scenario. As shown in Figure 1, the few labeled samples of a novel class may lie far from its ground-truth center when the class has a large variance. This raises a meaningful question: <strong>how can we estimate representative prototypes from the few labeled samples?</strong></p>

<div align="center">
    <img src="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/images/2021-12-01-Prior-Knowledge-for-Few-shot-Learning/distribution.png" alt="Distribution" style="zoom:30%;" />
    <br />
    <div>Figure 1. The distribution of base and novel class samples in the pretrained feature space.</div>
</div>

<h2 id="review-the-past">Review the Past</h2>

<p>Prior knowledge is an essential element for alleviating the problem of an <strong>unreliable empirical risk minimizer</strong> in supervised FSL.</p>

<h3 id="few-shot-learning">Few-Shot Learning</h3>

<p>Existing FSL works can be categorized from three perspectives: <em>Data</em>, <em>Model</em> and <em>Algorithm</em>. Moreover, recent works have shown that combining these three perspectives can solve the FSL problem more robustly.</p>

<h4 id="data">Data</h4>

<p><strong>Augmenting the Training Dataset</strong> Data augmentation of the original training dataset via hand-crafted rules is commonly used as a pre-processing step in FSL methods. For example, in the NLP domain one can use <em>back-translation</em>, <em>word replacement</em>, <em>cutoff</em>, and <em>adversarial training</em>. In addition, several strategies that combine different augmentation methods have been proposed, such as applying multiple transformations sequentially <a href="#refer-7"><sup>[7]</sup></a>. However, <strong>data augmentation alone has limited capacity to solve the FSL problem.</strong> For example, existing approaches may be specific to one domain, which makes them hard to apply to other domains.</p>

<div align="center">
    <img src="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/images/2021-12-01-Prior-Knowledge-for-Few-shot-Learning/core.png" alt="result" style="zoom:75%;" />
    <br />
    <div>Figure 3. A taxonomy of FSL methods.</div>
</div>
<p><strong>Utilizing Weakly Labeled or Unlabeled Data</strong> How to make full use of weakly labeled or unlabeled data along with limited labeled data (i.e., semi-supervised learning, SSL) <a href="#refer-8"><sup>[8]</sup></a> is a long-standing hot topic in machine learning. Existing methods can be mainly categorized as <em>self-training</em> and <em>consistency regularization</em>. However, a necessary condition, namely that <em>the labeled data and the pseudo-labeled unlabeled data come from the same distribution during training</em>, has been shown to be hard to satisfy in real applications. Thus, a popular idea proposed in recent works <a href="#refer-9"><sup>[9]</sup></a> <a href="#refer-10"><sup>[10]</sup></a> is to select only a subset of the unlabeled examples for SSL training.</p>
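<p>The selection idea can be illustrated with a minimal sketch. It assumes a fixed confidence threshold on the model's predicted class probabilities, in the spirit of FixMatch; Dash instead adjusts its threshold dynamically during training. The function name, the threshold value, and the toy probabilities are illustrative assumptions, not code from the cited papers.</p>

```python
import numpy as np

def select_confident(probs, threshold=0.95):
    """Return indices and pseudo-labels of unlabeled examples whose
    highest predicted class probability reaches the threshold."""
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)        # top class probability per example
    pseudo_labels = probs.argmax(axis=1)  # predicted class per example
    keep = np.flatnonzero(confidence >= threshold)
    return keep, pseudo_labels[keep]

# Hypothetical model outputs for four unlabeled examples and three classes.
probs = np.array([
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.03, 0.96, 0.01],
    [0.10, 0.10, 0.80],
])
indices, labels = select_confident(probs)
print(indices, labels)
```

<p>Only the first and third examples pass the threshold, so only they would contribute a pseudo-label to the training loss.</p>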

<p><strong>Transforming Samples from Similar Datasets</strong> This strategy augments the training data by aggregating and adapting input-output pairs from a similar but larger dataset. The aggregation weight is usually based on some similarity measure between samples. Moreover, pre-defined knowledge can take the place of the similarity measure to build reliable correlations among samples. For example, OntoED <a href="#refer-16"><sup>[16]</sup></a> leverages an entity ontology to establish linkages between new unseen event types and existing ones, which is reported to be more robust than previous approaches to event detection (ED), especially in few-shot scenarios.</p>

<h4 id="model">Model</h4>

<p><strong>Multitask Learning</strong> In the presence of multiple related tasks, multitask learning learns these tasks simultaneously by exploiting both task-generic and task-specific information. The core issue in applying multitask learning to FSL is how to compose the set of related tasks, which depends heavily on domain knowledge.</p>

<p><strong>Metric Learning</strong> For the FSL problem, researchers proposed simple but effective algorithms based on metric learning. For example, MatchingNet and ProtoNet learned to classify samples by comparing the distance to the representatives of each class.</p>
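<p>A minimal sketch of the nearest-prototype rule used by ProtoNet-style methods is given below. It assumes the features are already embedded (the embedding network is omitted), uses the class mean as the representative, and measures squared Euclidean distance; the toy arrays and function names are placeholders.</p>

```python
import numpy as np

def class_prototypes(support_x, support_y):
    """Average the support embeddings of each class into one prototype."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    return classes, protos

def nearest_prototype(query_x, classes, protos):
    """Assign each query to the class of its closest prototype."""
    diffs = query_x[:, None, :] - protos[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)     # squared Euclidean distances
    return classes[dists.argmin(axis=1)]

# Toy 2-way 2-shot support set in a 2-D embedding space.
support_x = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.0, 4.2]])
support_y = np.array([0, 0, 1, 1])
query_x = np.array([[0.1, 0.3], [3.5, 4.1]])

classes, protos = class_prototypes(support_x, support_y)
pred = nearest_prototype(query_x, classes, protos)
print(pred)
```

<p>With one shot per class, each prototype is a single sample, which is exactly where the biased-prototype problem discussed in the introduction arises.</p>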

<p><strong>Generative Model</strong> Generative models are often designed to compensate for the insufficient number of available samples by generating new ones. Most methods use Generative Adversarial Networks (GANs) or autoencoders to generate samples or features that augment the training set. In NLP, Yoo et al. propose a data augmentation technique that leverages large-scale language models (e.g., GPT-3) to generate realistic text samples from a mixture of real samples.</p>

<p><strong>Learning with External Memory</strong> Recently, constructing a key-value memory from the training dataset and making predictions by retrieving from this memory, based on the similarity between the query and the keys, has shown promising results. Importantly, the retrieval process is non-parametric and requires no parameter updates. Such a cache model has been adopted to improve language generation in kNN-LMs <a href="#refer-13"><sup>[13]</sup></a> <a href="#refer-15"><sup>[15]</sup></a>. Moreover, Zhang et al. <a href="#refer-14"><sup>[14]</sup></a> explore it with CLIP in the few-shot setting.</p>
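<p>The retrieval step can be sketched as follows. This is a generic illustration, not the implementation of kNN-LM or Tip-Adapter: it assumes L2-normalized features as keys, one-hot labels as values, cosine similarity, and a softmax over the top-k neighbors. The temperature, k, and the toy data are assumptions.</p>

```python
import numpy as np

def build_memory(features, labels, num_classes):
    """Keys are L2-normalized training features, values are one-hot labels."""
    keys = features / np.linalg.norm(features, axis=1, keepdims=True)
    values = np.eye(num_classes)[labels]
    return keys, values

def retrieve(query, keys, values, k=2, temperature=0.1):
    """Predict a class distribution from the k most similar memory entries."""
    q = query / np.linalg.norm(query)
    sims = keys @ q                        # cosine similarity to every key
    top = np.argsort(-sims)[:k]            # indices of the k nearest keys
    weights = np.exp(sims[top] / temperature)
    weights /= weights.sum()               # softmax over the retrieved entries
    return weights @ values[top]

features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
keys, values = build_memory(features, labels, num_classes=2)
class_probs = retrieve(np.array([0.8, 0.2]), keys, values)
print(class_probs)
```

<p>No parameter is updated anywhere in this sketch: adding a new labeled example only appends one row to the keys and one to the values.</p>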

<h4 id="algorithm">Algorithm</h4>

<p><strong>Refining Pretrained Model</strong> This strategy takes a model pre-trained on related tasks as a good initialization and adapts it to a new task. The assumption is that the pre-trained model captures some general structure of the large-scale data. Therefore, it can be adapted to a new task with limited labeled data in a few iterations.</p>

<p><strong>Learning Optimizer</strong>  One of the most general algorithms for meta-learning is the optimization-based algorithm. Finn et al.  <a href="#refer-11"><sup>[11]</sup></a> and Li et al.  <a href="#refer-12"><sup>[12]</sup></a> proposed to learn how to optimize the gradient descent procedure so that the learner can have a good initialization, update direction, and learning rate.</p>
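<p>As a concrete instance, the MAML update of Finn et al. <a href="#refer-11"><sup>[11]</sup></a> can be written in its usual notation, which is assumed here rather than taken from this post: $\theta$ is the shared initialization, $\mathcal{L}_{\mathcal{T}_i}$ is the loss on task $\mathcal{T}_i$, and $\alpha$ and $\beta$ are the inner and outer step sizes (this $\alpha$ is unrelated to the $\alpha$ in the calibration formula later in the post). With a single inner gradient step:</p>

<p>$$\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(\theta), \qquad \theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{\mathcal{T}_i} \mathcal{L}_{\mathcal{T}_i}(\theta_i')$$</p>

<p>Meta-SGD <a href="#refer-12"><sup>[12]</sup></a> additionally treats $\alpha$ as a learnable vector with the same shape as $\theta$, applied element-wise to the gradient, so that the update direction and learning rate are learned together with the initialization.</p>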

<p>Recently, Yang et al. <a href="#refer-1"><sup>[1]</sup></a> calibrated the distribution of these few-sample classes by transferring statistics from the classes with sufficient examples. An adequate number of examples can then be sampled from the calibrated distribution as a data augmentation technique. Their approach is reported to achieve state-of-the-art accuracy on three datasets (a 5% improvement on miniImageNet).</p>

<h2 id="distribution-calibration">Distribution Calibration</h2>

<h3 id="motivation">Motivation</h3>

<p>As pointed out above, having only a few samples may lead to a biased distribution estimate, which can damage the generalization ability of the model. The authors observed that the feature distributions of similar classes usually have similar statistics (e.g., mean and variance), as shown in Table 1. Meanwhile, the statistics of a class can be estimated more accurately when it has adequate samples. Based on these observations, the authors proposed to <strong>transfer the statistics of many-shot classes to estimate the distribution of the few-shot classes</strong> according to the similarity between classes.</p>

<div align="center">
  	<table style="text-align:center">
    <thead>
      <tr>
        <th>Arctic fox</th>
        <th>Mean sim</th>
        <th>Var sim</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>white wolf</td>
        <td>97%</td>
        <td>97%</td>
      </tr>
      <tr>
        <td>malamute</td>
        <td>85%</td>
        <td>78%</td>
      </tr>
      <tr>
        <td>lion</td>
        <td>81%</td>
        <td>70%</td>
      </tr>
      <tr>
        <td>meerkat</td>
        <td>78%</td>
        <td>70%</td>
      </tr>
    </tbody>
  </table>
  <div>Table 1. The class statistics similarity between Arctic fox and different classes.</div>
</div>

<h3 id="method">Method</h3>

<p>This work follows the typical few-shot classification setting, formulated as N-way K-shot tasks: each task consists of N few-shot classes with K labeled samples per class (the support set) and some unlabeled samples (the query set) for testing. The training procedure for an N-way K-shot task is shown in Algorithm 1 below.</p>
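<p>To make the N-way K-shot setting concrete, the following sketch draws one episode from a label vector. It is an illustration under assumptions: the function name, the 15 query samples per class, and the toy labels are placeholders rather than the paper's data pipeline.</p>

```python
import numpy as np

def sample_episode(labels, n_way=5, k_shot=1, n_query=15, seed=0):
    """Return (support_idx, query_idx, classes) for one N-way K-shot task."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_idx, query_idx = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_idx.extend(idx[:k_shot])                  # K labeled samples
        query_idx.extend(idx[k_shot:k_shot + n_query])    # held out for testing
    return np.array(support_idx), np.array(query_idx), classes

# Toy label vector: 10 classes with 20 samples each (stand-in for a real dataset).
labels = np.repeat(np.arange(10), 20)
support_idx, query_idx, classes = sample_episode(labels, n_way=5, k_shot=1, n_query=15)
print(len(support_idx), len(query_idx))
```

<p>Here a 5-way 1-shot episode yields 5 support samples and 75 query samples, and the two index sets never overlap.</p>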

<div align="center">
    <img src="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/images/2021-12-01-Prior-Knowledge-for-Few-shot-Learning/algorithm.png" alt="algorithm" style="zoom:30%;" />
</div>

<p>First, the authors assume that the feature distribution of each base class is Gaussian. For a base class $i$, the mean $\boldsymbol{\mu}_i$ is computed as the per-dimension average of its feature vectors, and the covariance matrix $\boldsymbol{\Sigma}_i$ is computed from the same features. Second, to make the feature distribution more Gaussian-like, a key step is to transform the features of the support set and query set in the target task using Tukey’s Ladder of Powers transformation.</p>
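<p>Both steps can be sketched in a few lines of NumPy. Tukey's transformation maps a feature $x$ to $x^{\lambda}$ when $\lambda \neq 0$ and to $\log x$ when $\lambda = 0$. The sketch assumes non-negative features (for example, activations after a ReLU) and uses $\lambda = 0.5$ purely as an illustrative value; the random base features are placeholders.</p>

```python
import numpy as np

def tukey_transform(x, lam=0.5):
    """Tukey's Ladder of Powers for non-negative features."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)       # requires strictly positive inputs
    return np.power(x, lam)

def class_statistics(features):
    """Mean vector and covariance matrix of one base class."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)   # unbiased estimate (divides by n - 1)
    return mean, cov

rng = np.random.default_rng(0)
base_features = rng.random((50, 4))        # placeholder: 50 samples, 4 dimensions
mu, sigma = class_statistics(base_features)

# Transform one support feature from the target task.
support_feature = tukey_transform(np.array([0.0, 0.25, 1.0, 4.0]), lam=0.5)
print(mu.shape, sigma.shape, support_feature)
```

<p>With $\lambda = 0.5$ the transformation is a square root, which compresses large activations and reduces the skew of the feature distribution.</p>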

<p>During the distribution calibration step, statistics are transferred based on the Euclidean distance between the transformed feature $\tilde{\boldsymbol{x}}$ of a novel-class support sample and the feature means $\boldsymbol{\mu}_i$ of the base classes. Specifically, the $k$ nearest base classes, denoted $S_N$, are selected to construct the calibrated distribution:
\(\boldsymbol{\mu}'=\frac{\sum_{i \in S_{N}}\boldsymbol{\mu}_i+\tilde{\boldsymbol{x}}}{k+1}, \quad \boldsymbol{\Sigma}'=\frac{\sum_{i \in S_{N}}\boldsymbol{\Sigma}_i}{k}+\alpha\)
where $\alpha$ is a hyperparameter that controls the dispersion of the calibrated distribution.</p>
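<p>The calibration formula translates almost directly into code. The sketch below is one reading of it, not the authors' implementation: it assumes the scalar $\alpha$ is added to every entry of the averaged covariance matrix, and the values of $k$, $\alpha$, and the toy base statistics are illustrative.</p>

```python
import numpy as np

def calibrate(x_tilde, base_means, base_covs, k=2, alpha=0.2):
    """Calibrated mean and covariance for one transformed support feature."""
    dists = np.linalg.norm(base_means - x_tilde, axis=1)   # distance to each base mean
    nearest = np.argsort(dists)[:k]                        # S_N: the k closest base classes
    mu = (base_means[nearest].sum(axis=0) + x_tilde) / (k + 1)
    sigma = base_covs[nearest].mean(axis=0) + alpha        # assumption: alpha added entry-wise
    return mu, sigma, nearest

# Toy statistics of three base classes in a 2-D feature space.
base_means = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
base_covs = np.stack([np.eye(2), 2 * np.eye(2), 5 * np.eye(2)])
x_tilde = np.array([0.5, 0.2])

mu, sigma, nearest = calibrate(x_tilde, base_means, base_covs, k=2, alpha=0.2)
print(nearest, mu, sigma)
```

<p>The distant third base class is ignored, and the support sample itself counts as one of the $k+1$ terms in the calibrated mean.</p>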

<p>After obtaining the calibrated distribution, a sufficient number of labeled feature vectors can be generated by sampling from the calibrated Gaussian distributions and then used to train a classifier.</p>
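<p>The sampling and training step might look like the sketch below. To keep it dependency-free, a small softmax (multinomial logistic regression) classifier written in NumPy stands in for whatever linear classifier is used in practice. The two calibrated distributions, the sample count, and the learning rate are hypothetical values chosen for illustration.</p>

```python
import numpy as np

def sample_features(mu, sigma, label, n, rng):
    """Draw n labeled feature vectors from the Gaussian N(mu, sigma)."""
    x = rng.multivariate_normal(mu, sigma, size=n)
    return x, np.full(n, label)

def fit_softmax(x, y, num_classes, lr=0.05, steps=500):
    """Multinomial logistic regression trained by plain gradient descent."""
    w = np.zeros((x.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(x)                  # gradient of mean cross-entropy
        w -= lr * x.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b

rng = np.random.default_rng(0)
# Two hypothetical calibrated distributions, one per novel class.
calibrated = [(np.array([0.0, 0.0]), np.eye(2)), (np.array([6.0, 6.0]), np.eye(2))]
parts = [sample_features(mu, sigma, c, 100, rng) for c, (mu, sigma) in enumerate(calibrated)]
train_x = np.concatenate([p[0] for p in parts])
train_y = np.concatenate([p[1] for p in parts])

w, b = fit_softmax(train_x, train_y, num_classes=2)
queries = np.array([[0.2, -0.1], [5.8, 6.3]])
pred = (queries @ w + b).argmax(axis=1)
print(pred)
```

<p>The classifier never sees more than the generated features here, which is the sense in which sampling from the calibrated distribution acts as data augmentation.</p>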

<h3 id="analysis">Analysis</h3>

<p><strong>Strengths</strong> As shown in Table 2, a simple linear classifier equipped with the distribution calibration (DC) method performs better than state-of-the-art few-shot classification methods and achieves the best performance in the 1-shot and 5-shot settings of miniImageNet, tieredImageNet and CUB. Specifically, DC surpasses the previous state-of-the-art method by 10% in the 5-way 1-shot setting, which suggests that distribution calibration can <strong>handle extremely low-shot classification tasks better</strong> by modeling the association between base classes and novel classes.</p>

<div align="center">
    <img src="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/images/2021-12-01-Prior-Knowledge-for-Few-shot-Learning/result.png" alt="result" style="zoom:30%;" />
    <br />
    <div>Table 2. Performance on miniImageNet and CUB.</div>
</div>

<p>Compared with generative models that produce extra samples or features for training, this distribution calibration strategy is <strong>simple and does not need extra learnable parameters</strong>. Note that in Figure 2, the features sampled from the calibrated distribution overlap the region occupied by the query set, which suggests that training with these generated features as data augmentation can improve generalization.</p>

<div align="center">
    <img src="https://iclr.iro.umontreal.ca/ab6d1418-3e05-408c-9932-59f06aec75e0_1642247335/public/images/2021-12-01-Prior-Knowledge-for-Few-shot-Learning/TSNE.png" alt="t-SNE" style="zoom:15%;" />
    <br />
    <div>Figure 2. t-SNE visualization of distribution estimation.</div>
</div>

<p><strong>Limitations</strong> The approach rests on some limiting assumptions. First, it relies on an assumed feature distribution (Gaussian in this work), which may not generalize to other tasks. Second, it implicitly assumes that every novel class is associated with certain base classes (the top-$k$ base classes), and it does not consider the strength of the similarity between base and novel classes when estimating novel-class statistics. Recent work <a href="#refer-2"><sup>[2]</sup></a> also argues that the method implicitly assumes the base classes are semantically independent of each other when constructing covariance estimates.</p>

<h3 id="bias-correction">Bias Correction</h3>

<p>Bias correction is an idea worth considering to address the problems mentioned above. Distribution calibration based on many-shot samples is one representative way to rectify the distribution. The main methodologies of bias correction for few-shot learning can be summarized as <strong>reconstruction-based</strong> methods, methods <strong>utilizing primitive knowledge</strong>, and methods <strong>utilizing extra data</strong>.</p>

<ul>
  <li><strong>Reconstruction-based</strong>: This class of methods <a href="#refer-3"><sup>[3]</sup></a><a href="#refer-6"><sup>[6]</sup></a> constructs pairs of noisy (biased) prototypes and target (representative) prototypes and trains a regression model to restore the target from the noisy one. Although it requires designing a relatively complex model, it does not depend on extra data, which may be unavailable or unsuitable in some scenarios.</li>
  <li><strong>Utilizing primitive knowledge</strong>: Recent works have demonstrated that rectifying prototypes with primitive knowledge can bring notable improvements in few-shot learning. Zhang et al. <a href="#refer-4"><sup>[4]</sup></a> design a framework that introduces WordNet (i.e., attribute annotations) as extra knowledge, extracts representative attribute features as priors, and completes the prototype with these priors. In addition, to estimate more representative prototypes, Yu et al. <a href="#refer-17"><sup>[17]</sup></a> regularize prototypes with explicit prior knowledge constraints on entity pairs and relations.</li>
  <li><strong>Utilizing extra data</strong>: Because it relies on many-shot data, the distribution calibration approach above can be viewed as utilizing extra data. Another line of approaches leverages unlabeled samples, which overlaps with semi-supervised learning.</li>
</ul>
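<p>The reconstruction-based idea can be illustrated on synthetic data. In the sketch below, a closed-form ridge regression stands in for the learned restoration networks of the cited works, a single noisy sample plays the role of the biased prototype, and the true class center is the target. All data, dimensions, and names are assumptions made for illustration.</p>

```python
import numpy as np

def make_classes(num_classes, dim, noise, rng):
    """Synthetic classes: a true center and one noisy sample per class."""
    centers = rng.normal(size=(num_classes, dim))
    one_shot = centers + noise * rng.normal(size=(num_classes, dim))
    return one_shot, centers

def fit_restorer(noisy, target, ridge=1e-3):
    """Closed-form ridge regression mapping noisy prototypes to targets."""
    a = noisy.T @ noisy + ridge * np.eye(noisy.shape[1])
    return np.linalg.solve(a, noisy.T @ target)

rng = np.random.default_rng(0)
train_noisy, train_true = make_classes(500, 8, noise=1.0, rng=rng)   # base classes
test_noisy, test_true = make_classes(200, 8, noise=1.0, rng=rng)     # novel classes

w = fit_restorer(train_noisy, train_true)
restored = test_noisy @ w

err_before = np.mean((test_noisy - test_true) ** 2)
err_after = np.mean((restored - test_true) ** 2)
print(err_before, err_after)
```

<p>The regressor is fitted only on base classes, where true centers are available, and is then applied unchanged to one-shot prototypes of unseen classes.</p>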

<h2 id="look-forward-to-the-future">Look Forward To the Future</h2>

<p><strong>Future = Large Model ?</strong> The recent GPT-3 model achieves remarkable few-shot performance solely by taking a natural-language prompt and a few task demonstrations as input context. Gao et al. proposed LM-BFF, a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. However, a common problem is that these approaches to FSL have been applied mainly to specific tasks (e.g., simple text classification). Large models pretrained on large data may have strong perceptual ability, but they perform poorly on tasks that involve inductive reasoning.</p>
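<p>Prompt-based few-shot learning amounts to formatting the demonstrations and the query into one input string. The sketch below shows a hypothetical template for sentiment classification; the field names and wording are assumptions and do not reproduce the exact format used by GPT-3 or LM-BFF.</p>

```python
def build_prompt(instruction, demonstrations, query):
    """Format an instruction, labeled demonstrations, and a query as one prompt."""
    lines = [instruction, ""]
    for text, label in demonstrations:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")   # the language model is expected to continue from here
    return "\n".join(lines)

demos = [
    ("A delightful film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_prompt("Classify the sentiment of each review.", demos, "I loved every minute.")
print(prompt)
```

<p>No weights change in this setup: the few labeled examples influence the prediction only through the context the model reads.</p>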

<p><strong>Future = Large Model + Knowledge ?</strong> From the perspective of bias correction, it is still difficult to design an efficient model that avoids overfitting the base tasks without any extra constraint (or inductive bias). A possible direction is to combine extra data (or knowledge) with model design to break the dilemma of few-shot learning. More and more researchers are beginning to explore incorporating prior knowledge into large models, which may boost their ability to adapt quickly from limited experience.</p>

<h2 id="references">References</h2>

<div id="refer-1">[1] <a href="https://arxiv.org/abs/2101.06395">Free Lunch for Few-shot Learning: Distribution Calibration.</a> (ICLR 2021)</div>

<div id="refer-2">[2] <a href="https://openreview.net/pdf/7a51b475cd19009b69a48682906d91d6b3e8f146.pdf">Generalized Distribution Calibration for Few-Shot Learning.</a> (Preprint)</div>

<div id="refer-3">[3] <a href="https://arxiv.org/abs/2005.01234">One-Shot Image Classification by Learning to Restore Prototypes.</a> (AAAI 2020)</div>

<div id="refer-4">[4] <a href="https://arxiv.org/abs/2009.04960">Prototype Completion with Primitive Knowledge for Few-Shot Learning.</a> (CVPR 2021)</div>

<div id="refer-5">[5] <a href="https://arxiv.org/abs/1911.10713">Prototype Rectification for Few-Shot Learning.</a> (ECCV 2020)</div>

<div id="refer-6">[6] <a href="https://link.springer.com/chapter/10.1007/978-3-319-46466-4_37">Learning to Learn: Model Regression Networks for Easy Small Sample Learning.</a> (ECCV 2016)</div>

<div id="refer-7">[7] <a href="https://arxiv.org/abs/2010.08670">CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding.</a> (ICLR 2021)</div>

<div id="refer-8">[8] <a href="https://arxiv.org/abs/1905.02249">MixMatch: A Holistic Approach to Semi-Supervised Learning.</a> (NeurIPS 2019)</div>

<div id="refer-9">[9] <a href="https://arxiv.org/abs/2001.07685">FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence.</a> (NeurIPS 2020)</div>

<div id="refer-10">[10] <a href="https://proceedings.mlr.press/v139/xu21e.html">Dash: Semi-Supervised Learning with Dynamic Thresholding.</a> (ICML 2021)</div>

<div id="refer-11">[11] <a href="https://arxiv.org/abs/1703.03400">Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.</a> (ICML 2017)</div>

<div id="refer-12">[12] <a href="https://arxiv.org/abs/1707.09835">Meta-SGD: Learning to Learn Quickly for Few-Shot Learning.</a> (Preprint)</div>

<div id="refer-13">[13] <a href="https://arxiv.org/abs/1911.00172">Generalization through Memorization: Nearest Neighbor Language Models.</a> (ICLR 2020)</div>

<div id="refer-14">[14] <a href="https://arxiv.org/abs/2111.03930">Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.</a> (Preprint)</div>

<div id="refer-15">[15] <a href="https://arxiv.org/abs/2110.02523">KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier.</a> (Preprint)</div>

<div id="refer-16">[16] <a href="https://aclanthology.org/2021.acl-long.220.pdf">OntoED: Low-resource Event Detection with Ontology Embedding.</a> (ACL 2021)</div>

<div id="refer-17">[17] <a href="https://arxiv.org/pdf/2010.16059.pdf">Bridging Text and Knowledge with Multi-Prototype Embedding for Few-Shot Relational Triple Extraction.</a> (COLING 2020)</div>


</div>

<div id="bibtex-container" class="related">
  For attribution in academic contexts, please cite this work as
  <pre id="bibtex-academic-attribution">

  </pre>

  BibTeX citation
  <pre id="bibtex-box">

  </pre>
</div>
<script>
  let authorsSpan = document.getElementById("iclr-post-authors");
  let authorsText = authorsSpan.textContent;
  let lnameFnameInstitution = authorsText.split(";");
  let lfiList = lnameFnameInstitution.map(lfi => lfi.split(",").map(item => item.trim()));
  let bibtexLFI = lfiList.map(lfi => lfi.slice(0, 2).join(", ")).join(" and ");
  let academicLFI = lfiList.map(lfi => lfi[0]);
  {
    if(academicLFI.length > 2) academicLFI = academicLFI[0] + ", et al.";
    else if(academicLFI.length == 2) academicLFI = academicLFI[0] + " & " + academicLFI[1];
    else academicLFI = academicLFI[0];
  }

  let titleSpan = document.getElementById("iclr-post-title");
  let titleText = titleSpan.textContent.trim();
  let bibtexTitleShorthand = ((lfiList[0][1] || lfiList[0][0])+
    "2022"+
    titleText.split(" ").slice(0, 3).join("")
  ).replace(" ", "").replace(/[\p{P}$+<=>^`|~]/gu, '').toLowerCase().trim();

  let bibtexTemplate = `
@inproceedings{${bibtexTitleShorthand},
  author = {${bibtexLFI}},
  title = {${titleText}},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {${window.location.href}},
  url  = {${window.location.href}}
}
  `.trim();
  document.getElementById("bibtex-box").innerText = bibtexTemplate;

  let academicTemplate = `
${academicLFI}, "${titleText}", ICLR Blog Track, 2022.
`.trim();
  document.getElementById("bibtex-academic-attribution").innerText = academicTemplate;

</script>




<script src="https://utteranc.es/client.js"
        repo="iclr-blog-track/iclr-blog-track.github.io"
        issue-term="pathname"
        label="utterance"
        theme="boxy-light"
        crossorigin="anonymous"
        >
</script>


      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
