<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      FrLove : Could a Frenchman rapidly identify Lovecraft? &middot; The ICLR Blog Track
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/2021/12/01/frlove/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- <input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox" > -->

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">

    

    
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/">ICLR 2022 Blog Track</a>
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/about/">About</a>
        
      
    
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/iclr-blog-track/iclr-blog-track.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="post">
  <h1 id="iclr-post-title" class="post-title">FrLove : Could a Frenchman rapidly identify Lovecraft?</h1>
  <span class="post-date">01 Dec 2021 | 
    <a class="content-tag" href="/tags/#few-shot-learning"> few shot learning </a>
  
    <a class="content-tag" href="/tags/#cross-domain"> cross-domain </a>
  
    <a class="content-tag" href="/tags/#self-training"> self training </a>
  
    <a class="content-tag" href="/tags/#multilingual"> multilingual </a>
  </span>

  <span id="iclr-post-authors" class="post-date">ANONYMOUS, ANON (DOUBLE BLIND)</span>
  <p>This post examines the work in <a href="https://openreview.net/forum?id=O3Y56aqpChA">Self-training For Few-shot Transfer Across Extreme Task Differences</a>, accepted as an oral presentation at ICLR 2021. In this post, we break down the task of <em>cross-domain few shot learning</em>, present a bird’s eye overview of the techniques involved, and describe the proposed method - <strong>STARTUP</strong>. In the latter section of the post, we also attempt to extend the experiments to language. To the best of our knowledge, cross-domain few shot learning in a multilingual setting has not been explored before and it raises interesting questions. Our code is available at this <a href="https://anonymous.4open.science/r/frlove-5918/">link</a>.</p>

<h2 id="what-is-startup">What is STARTUP?</h2>

<p>“Self Training to Adapt Representations To Unseen Problems”, or STARTUP, is a simple but highly effective method that tackles the problem of <em>cross-domain few shot learning</em>. Few-shot Learning is the task of learning on a very limited set of samples. For example, 5-shot learning implies that there are 5 examples (<em>shots</em>) available for each class that the model is supposed to learn. Typically, we have a large training set and much smaller support and query sets. The training set is composed of <em>base</em> classes, and the support set contains the <em>target</em> classes to be learnt by the few-shot learner.</p>

<p>In conventional few-shot learning, a key observation is that there is a relation between the base and the target domains. For example, the base domain may be ImageNet (containing many <code class="language-plaintext highlighter-rouge">bird</code> classes), and our target domain may be the various species in the hummingbird family.</p>

<p><img src="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/images/2021-12-01-frlove/imagenet-hummingbird.jpeg" alt="ImageNet base and hummingbird target" />
<em>Figure 1: ImageNet may not contain images of the Rufous Hummingbird or the Blue-throated mountaingem, but it does contain many bird classes, including ‘hummingbird’ itself.</em></p>

<p>Upping the difficulty, we have the task of cross-domain few shot learning, where in addition to learning from a very limited number of examples, we must also contend with the fact that our source and target domains are radically different. Such as going from distinguishing between objects to detecting possible pneumonia. Recent work has shown that the performance of few-shot learning methods degrades when confronted with this extreme task difference.</p>

<p><img src="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/images/2021-12-01-frlove/imagenet-xray.jpeg" alt="ImageNet base and Xray target" />
<em>Figure 2: As diverse as ImageNet is, it does not contain X-ray scans (yet). Images from <a href="https://www.kaggle.com/nih-chest-xrays/data">ChestX dataset</a></em></p>

<p>What is the motivation for this task? As the readers might be aware, collecting high-quality data can be expensive, time-consuming, and prone to dangerous biases. The promise of few-shot learning, i.e. learning with a few well-labeled examples, then becomes very attractive. Cross-Domain Few Shot Learning (CD-FSL) takes this promise further, by now removing even the barrier of having semantically related base and target domains.</p>

<p>The authors’ key observation comes from making use of <em>unlabeled data</em>. Undiagnosed X-ray scans and unlabeled satellite images are much more readily available than high-quality annotated data and offer an alternative solution in the form of <em>Self-supervised Learning</em> (SSL). SSL has been used to great effect in recent years, especially in the training of large language models. However, SSL requires an incredibly large amount of data to train effective representations. The largest language models today are trained on corpora containing a significant chunk of data on the World Wide Web, such as <a href="https://arxiv.org/pdf/2101.00027.pdf">The Pile</a>. In summary:</p>

<ol>
  <li>Collecting data for some domains (medical scans, satellite imagery etc) can be hard, time-consuming and difficult. Existing few shot learners are unable to generalize from other domains to these.</li>
  <li>While unlabeled data may be available for these domains, it is not available in the massive amount that is required to train an effective self-supervised representation.</li>
</ol>

<p>The authors propose <strong>STARTUP</strong> to leverage unlabeled data in training representations to adapt to the target domain. When a classifier trained on the base domain is applied to the target domain, the classes assigned may be irrelevant, but the <em>grouping</em> in these classes can be utilized to train better representations. Training in this manner would adapt feature representations to the target domain while maintaining the knowledge that caused this meaningful grouping. Put simply, classifying X-ray scans as various ImageNet classes may not be meaningful, but the grouping of <code class="language-plaintext highlighter-rouge">pneumonia</code> under the <code class="language-plaintext highlighter-rouge">golden retriever</code> class can provide useful information.</p>

<p><img src="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/images/2021-12-01-frlove/grouping-xrays.jpeg" alt="Grouping Xrays by ImageNet" />
<em>Figure 3: In this oversimplified example, we see a network classifying X-ray scans into ImageNet classes. This image depicts a scenario where X-ray scans of the ‘pneumonia’ class are tagged as ‘golden retriever’ by the network, and ‘no disease’ as ‘bald eagle’.</em></p>

<p>The model itself labels the data in the target domain, similar to a form of learning known as <em>self-training.</em> In self-training, the model labels its own training data based on its current state. The new labels, which are probability distributions over the classes rather than a single concrete class, are known as <em>pseudolabels</em>. These pseudolabels are then used to train the model and subsequently improve the pseudolabels, in an iterative process. In the case of STARTUP however, the goal is not to improve the model, but to use these pseudolabels to adapt its representations to the target domain.</p>

<p>To reiterate, data from the target domain is classified in the base domain. At the individual level, this classification may not make sense, but at the aggregate level, the <em>distribution</em> of the target data in the base domain can be used to improve the representations learnt. A way to <em>startup</em> better representations, if you will.</p>

<h2 id="how-well-does-startup-work">How (well) does STARTUP work?</h2>

<p>At its core, STARTUP is effective in its simplicity and comprises of three steps in adapting the feature backbone.</p>

<ol>
  <li>The first step is to learn a <em>teacher</em> model $\theta_T$ which can aid in self-training representations. This model is trained in a standard supervised manner on the <strong>base</strong> dataset $D_B$ using the Cross-Entropy Loss $\mathcal{L}_{CE}$.</li>
  <li>The teacher model $\theta_T$ is used to softly label the unlabeled data from the target domain $D_U$ to give us the pseudolabeled dataset $D_{U*}$. These pseudolabels are a probability distribution over the classes in the base domain.</li>
  <li>Finally, a student model $\theta_S$ is learnt using $D_B$, $D_U$ and $D_{U*}$ according to the following loss function:</li>
</ol>

\[\mathcal{L}_{STARTUP} = \mathcal{L}_{CE}(D_B) + \mathcal{L}_{KL}(D_{U*}) + \mathcal{L}_{SSL}(D_U)\]

<p>The loss function is made up of three components, each directing learning in a different way. \(\mathcal{L}_{CE}\), as mentioned above, is just Cross-Entropy loss on the base dataset. $\mathcal{L}_{KL}$ is the <strong>KL Divergence</strong> between the output of the student model and the pseudolabel generated by the teacher, capturing the distance between the two distributions. Together, these two components <em>guide</em> the student to generate groupings of the data similar to the teacher. In particular, the authors posit that minimizing the distance between the pseudolabels of the teacher and the student with KL Divergence encourages the model to emphasize the groupings of the pseudo labels.</p>

<p>Finally, the third component $\mathcal{L}_{SSL}$ is the self-supervised loss on the unlabeled data. In STARTUP, the SSL used is <a href="https://arxiv.org/abs/2002.05709">SimCLR</a>, a popular contrastive loss. Very briefly summarized, it minimizes the distance between two augmentations of the same image while maximizing its distance from the other images in the batch. As mentioned earlier, SSL alone requires large amounts of data to train effectively but it boosts STARTUP by extracting meaning from the unlabeled data available.</p>

<p>STARTUP fuses multiple paradigms (self-training, self-supervised learning and traditional supervised learning) in a simple fashion to train models that outperform the state-of-the-art on the <a href="https://arxiv.org/pdf/1912.07200.pdf">BSCD-FSL dataset</a>.</p>

<p><img src="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/images/2021-12-01-frlove/bscd-fsl.png" alt="BSCD-FSL" />
<em>Figure 4: The BSDCD-FSL benchmark consists of ImageNet as a base domain and 4 very different target domains. Note the factors which denote the task ‘distance’ (Perspective, content and colour of image). Image from <a href="https://arxiv.org/pdf/1912.07200.pdf">BSCD-FSL</a></em></p>

<p>Results of STARTUP have been omitted in the interest of brevity. In the <a href="https://openreview.net/forum?id=O3Y56aqpChA">paper</a>, the authors conduct numerous experiments and ablation studies verifying the effectiveness of STARTUP and achieve an absolute improvement of 2.9 points on average on the BSCD-FSL dataset.</p>

<h2 id="could-a-frenchman-rapidly-identify-lovecraft">Could a Frenchman rapidly identify Lovecraft?</h2>

<p>This section of the post takes a different approach to evaluating STARTUP, namely, evaluating few-shot cross-domain learning when the domains are languages. This raises some interesting questions on the nature of domains and how one might adapt cross-domain learning to a multilingual setting. These experiments are summarized as ‘FrLove: Could a Frenchman rapidly identify Lovecraft?’. Or in other words, could a model which has been trained on French texts (a “Frenchman”) perform few-shot classification (“rapidly identify”) on a text from a different domain (the works of H P Lovecraft, a prominent English author). <strong>FrLove</strong> is an attempt to answer two questions, one specific to STARTUP and one examining the broader field of cross-domain few shot learning.</p>

<ul>
  <li><strong>How does the <em>ratio</em> of base classes to target classes affect STARTUP?</strong> As described above, STARTUP works by exploiting the grouping of the target domain data among the base classes. Our hypothesis is that the more fine-grained this grouping is, the easier it is for STARTUP to train. Consider a dataset with $n_{base}/n_{target} = 20/10 = 2$. The 10 target classes will be distributed among the 20 base classes, providing more information to STARTUP as compared to $n_{base}/n_{target} = 20/40 = 0.5$. Will the performance of STARTUP degrade when there is ‘crowding’ of the target data in the base classes?</li>
  <li><strong>How can we quantifiably evaluate the distance between domains?</strong> This is, of course, a much larger question which cannot be adequately answered within the scope of a blogpost. Nevertheless, we attempt to do so for language by utilizing existing linguistic knowledge.</li>
</ul>

<h3 id="on-the-distance-between-languages">On the “distance” between languages</h3>

<p>The dataset used in the experiments in the paper, the BSCD-FSL benchmark contains 4 datasets, from vastly different domains (see figure above). These datasets are ordered by similarity to ImageNet by a number of heuristics. These heuristics however, are not applicable to any two image datasets of varying contents. We believe it is an important research question to quantify the “distance” between any two tasks in cross-domain learning. One such use case this enables is the efficient transfer of knowledge across domains, since we would be able to gather a sufficient amount of data based on the “distance” between the two domains. In this post, we use language family trees as a universal distance framework.</p>

<p><img src="https://iclr.iro.umontreal.ca/31b9168b-1745-49c8-a1ca-e10d6449c229_1642224896/public/images/2021-12-01-frlove/IndoEuropeanTree.png" alt="IndoEuropean Language Family Tree" />
<em>Figure 5: The Indo-European language family tree. Note the highlighted locations of German, English and French. <a href="https://upload.wikimedia.org/wikipedia/commons/4/4f/IndoEuropeanTree.svg">Source</a></em></p>

<p>The Indo-European language family comprises of a large number of languages stretching from Europe to Northern India. 3.2 billion people across the world speak one of these languages as their first language. All the languages in this tree are thought to be descended from a single language, called Proto-Indo-European. Through a study of how languages have evolved, their shared innovations and historical evidence, linguists and historians categorize languages into trees and subtrees. In the family tree above, note that German and English belong to the <em>Germanic</em> family tree, whereas French belongs to the <em>Romance</em> family of languages, a separate offshoot. As a rudimentary metric, the distance between the nodes of the tree can give us the distance between the two languages. English and German are closer to each other than English and French which in turn is closer than English and Mandarin Chinese (belonging to the Sino-Tibetan language family).</p>

<p>We hypothesize that performance will be better when transferring from German -&gt; English, as compared to French -&gt; English.</p>

<p><strong>Note: The BCSD-FSL benchmark targets cross-domain learning where the domain is a <em>task</em>, i.e. crop disease classification or satellite image classification. In this section, we focus on the <em>language</em> as a domain. We attempt to quantify the distance between French and English, and not the distance between - for example - French poetry classification and English spam detection, a much harder problem.</strong></p>

<h3 id="implementing-frlove">Implementing FrLove</h3>

<p>In image tasks, even if the domain of the tasks is vastly different (X-ray scans and crop diseases), the basic <em>building block</em> of the image is the same, that is, the pixel. This does not exactly hold true when dealing with languages. The basic element of a written language, the letter, is only common for a <em>script</em>, which vary across languages. For example, in the Indo-European family tree - German, French and English use the Roman script, based on the familiar 26 letters A-Z. In another branch of the tree, Hindustani, the Hindi language uses the Devanagari script (देवनागरी). Thus, we require two languages of the same script but distinct from each other.</p>

<p>Even if two languages have the same script, there is the additional complication that commonly used neural networks like BERT operate at the <em>word</em> or token level. While there may be words in common between languages, we would no doubt see sequences of meaningless [UNK] tokens.</p>

<p>Thus, we propose making use of <a href="https://fasttext.cc/">FastText</a> embeddings rather than word-level tokens. <a href="https://arxiv.org/pdf/1607.04606.pdf">FastText learns <em>subword</em> embeddings</a>, looking at character n-grams rather than the word as a whole. Thus, with FastText, we can provide embeddings even for words from different languages (albeit not likely to be meaningful embeddings).</p>

<p>Existing French / German datasets typically revolve around hate speech detection or sentiment analysis, binary or slightly larger classification tasks. The <a href="https://arxiv.org/abs/2109.14076">Real-world Annotated Few Shot Tasks</a> is a few-shot text classification benchmark on a large number of domains in English. To the best of our knowledge there is no multilingual few-shot learning dataset in the same domain. For a streamlined flow consistent across languages, we make use of <a href="https://gutenberg.org/">Project Gutenberg</a>, a free multilingual e-book repository, to create a dataset for author classification for multiple languages.</p>

<p>From Project Gutenberg, we download texts of various popular authors from English, French and German (to name a few - Charles Dickens, H P Lovecraft, Voltaire, Friedrich Nietzsche). The texts are segmented into chunks of ten sentences labeled with the author’s name. Then we follow the methodology proposed by STARTUP for our experiments on both French and German to English. In the case of Fr-&gt;En:</p>

<ol>
  <li>The teacher model is trained on processed French author data in a standard supervised learning task.</li>
  <li>The French teacher is used to pseudolabel the unlabeled data from the English corpus.</li>
  <li>A Fr-&gt;En student model is learnt using the aforementioned loss function.</li>
</ol>

<p>The evaluation process for STARTUP is straightforward. The student model is frozen and a linear classifier is trained on top of it using the <em>support set</em>, as is common in Few-shot learning. As a baseline method, we also consider <strong>Naive Transfer</strong>, where a linear classifier is learnt on a model trained on the base dataset itself. The simplest form of transfer learning, and yet it performs very well, as shown in the paper. In our case, Naive Transfer is equivalent to evaluating using the teacher model, rather than the student model.</p>

<h3 id="results">Results</h3>

<p>Below are the results for <em>5-way 15-shot</em> classification:</p>

<table>
  <thead>
    <tr>
      <th>Base Language</th>
      <th>n_base</th>
      <th>Naive Transfer (acc - %)</th>
      <th>STARTUP (acc - %)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Fr</td>
      <td>5</td>
      <td>20.4</td>
      <td>20.4</td>
    </tr>
    <tr>
      <td>Fr</td>
      <td>10</td>
      <td>20.0</td>
      <td>20.0</td>
    </tr>
  </tbody>
  <tbody>
    <tr>
      <td>De</td>
      <td>5</td>
      <td>25.2</td>
      <td>26.4</td>
    </tr>
    <tr>
      <td>De</td>
      <td>10</td>
      <td>22.8</td>
      <td>26.4</td>
    </tr>
  </tbody>
</table>

<p>As can be seen from the above table, results across the table are poor, being close to the expected accuracy for a random predictor (20% on a 5-way classification task).</p>

<p>Unfortunately, our experiments were limited by lack of computational resources. As such, both the training time and the amount of data collected were reduced in comparison to the paper. Finetuning FastText embeddings in particular was computationally expensive and reduced the amount of data that was feasible to use. It is our belief that stronger results could be achieved with larger amounts of data and more training time using STARTUP.</p>

<p>Experiments with changing the $n_{base}/n_{target}$ ratio were inconclusive. Changing $n_base$ from 5 to 10 while holding $n_target$ at 5 caused either minor or no changes in all cases except in Naive Transfer for De-&gt;En where accuracy drops from 25.2% to 22.8%. Interestingly, the accuracy for STARTUP remains constant at 26.4%.</p>

<p>We do see a marked increase in accuracy when comparing Fr-&gt;En and De-&gt;En however. In line with our hypothesis - because German and English are closer to each other, transfer should be more efficient - we see stronger performance of both Naive Transfer and STARTUP when doing a cross-domain transfer from German to English.</p>

<h2 id="conclusion">Conclusion</h2>

<p>This post examines <a href="https://openreview..net/forum?id=O3Y56aqpChA">Self-training For Few-shot Transfer Across Extreme Task Differences</a> or STARTUP, presented at ICLR 2021. STARTUP is a method proposed for the challenging task of cross-domain few shot learning. Briefly summarized, STARTUP extracts meaning from how the target domain data is distributed among the classes in the base domain, and utilizes this to transfer to the target domain more effectively.</p>

<p>This post also presents <strong>FrLove</strong>, an attempt to verify whether the distance between languages as measured in a family tree correlates to effectiveness of transfer. According to our preliminary experiments, a “Frenchman” (Fr-&gt;En student model) would <em>not</em> be able to identify Lovecraft among his fellow English authors, but a German has a better chance of doing so. However, these experiments are preliminary and our results are inconclusive, given the small amount of training data and training time.</p>

<p>Cross-Domain Few Shot Learning is an interesting and challenging problem, with important ramifications across domains. In the context of language, it opens up new avenues in working with low resource languages, where data is not present in sufficient quantities to enable traditional machine learning methods. Insights on how the model’s parameters change when learning might also prove useful in our own endeavours at learning new languages! It is an exciting problem to say the least, and one to keep an eye on.</p>

</div>

<div id="bibtex-container" class="related">
  For attribution in academic contexts, please cite this work as
  <pre id="bibtex-academic-attribution">

  </pre>

  BibTeX citation
  <pre id="bibtex-box">

  </pre>
</div>
<script>
  let authorsSpan = document.getElementById("iclr-post-authors");
  let authorsText = authorsSpan.textContent;
  let lnameFnameInstitution = authorsText.split(";");
  let lfiList = lnameFnameInstitution.map(lfi => lfi.split(",").map(item => item.trim()));
  let bibtexLFI = lfiList.map(lfi => lfi[0] + ", " + lfi[1]).join(" and ")
  let academicLFI = lfiList.map(lfi => lfi[0]);
  {
    if(academicLFI.length > 2) academicLFI = academicLFI[0] + ", et al.";
    else if(academicLFI.length == 2) academicLFI = academicLFI[0] + " & " + academicLFI[1];
    else academicLFI = academicLFI[0];
  }

  let titleSpan = document.getElementById("iclr-post-title");
  let titleText = titleSpan.textContent.trim();
  let bibtexTitleShorthand = (lfiList[0][1]+
    "2022"+
    titleText.split(" ").slice(0, 3).join("")
  ).replace(" ", "").replace(/[\p{P}$+<=>^`|~]/gu, '').toLowerCase().trim();

  let bibtexTemplate = `
@inproceedings{${bibtexTitleShorthand}},
  author = {${bibtexLFI}},
  title = {${titleText}},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {${window.location.href}},
  url  = {${window.location.href}}
}
  `.trim();
  document.getElementById("bibtex-box").innerText = bibtexTemplate;

  let academicTemplate = `
${academicLFI}, "${titleText}", ICLR Blog Track, 2022.
`.trim();
  document.getElementById("bibtex-academic-attribution").innerText = academicTemplate;

</script>


<div class="related">
  <h2>Related posts</h2>
  <ul class="related-posts">
    
      <li>
        <h3>
          <a href="/2021/09/01/sample-submission/">
            Sample Submission
            <small>01 Sep 2021 | 
    <a class="content-tag" href="/tags/#few-shot-learning"> few shot learning </a>
  
    <a class="content-tag" href="/tags/#cross-domain"> cross-domain </a>
  
    <a class="content-tag" href="/tags/#self-training"> self training </a>
  
    <a class="content-tag" href="/tags/#multilingual"> multilingual </a>
  </small>
          </a>
        </h3>
      </li>
    
      <li>
        <h3>
          <a href="/2020/04/02/example-content/">
            Example content (Basic Markdown)
            <small>02 Apr 2020 | 
    <a class="content-tag" href="/tags/#few-shot-learning"> few shot learning </a>
  
    <a class="content-tag" href="/tags/#cross-domain"> cross-domain </a>
  
    <a class="content-tag" href="/tags/#self-training"> self training </a>
  
    <a class="content-tag" href="/tags/#multilingual"> multilingual </a>
  </small>
          </a>
        </h3>
      </li>
    
  </ul>
</div>


<script src="https://utteranc.es/client.js"
        repo="iclr-blog-track/iclr-blog-track.github.io"
        issue-term="pathname"
        label="utterance"
        theme="boxy-light"
        crossorigin="anonymous"
        >
</script>


      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
