<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      The ICLR Blog Track &middot; 
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">
    <a class="sidebar-nav-item active" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/">Home</a>

    

    
    
      
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/about/">About</a>
        
      
    
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/KandiSanjana/KandiSanjana.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="posts">
  
  <div >
    <h1 class="post-title">
      <a href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/2021/12/01/albert/">
        BERT vs ALBERT explained
      </a>
    </h1>

    <span class="post-date">01 Dec 2021 | 
      <a class="content-tag" href="/tags/#2021-12-01-albert"> 2021-12-01-albert </a>
        
      <a class="content-tag" href="/tags/#nlp"> NLP </a>
        
      <a class="content-tag" href="/tags/#machine-learning"> Machine Learning </a>
        
      <a class="content-tag" href="/tags/#scale"> Scale </a>
        
      <a class="content-tag" href="/tags/#bert"> BERT </a>
        
      <a class="content-tag" href="/tags/#albert"> ALBERT </a>
        
    </span>
    <span class="post-date">Ramu, Sahana, Carnegie Mellon University; Kandi, Sanjana, Carnegie Mellon University</span>

    <!-- <h1 align="center">BERT vs ALBERT explained</h1>

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I1.png" alt="ALT" />
  
</div>

<h2 id="introduction">Introduction</h2>

<p>Training Machine Learning and Deep Learning models at scale requires an immense amount of training time and computational resources. In language representation learning in particular, studies have shown that pre-training a large, full network is crucial for achieving state-of-the-art performance. However, increasing the model size increases the number of parameters, which in turn significantly increases memory and computation requirements. This is a major challenge for large-scale computing. In this blog post, we provide a brief summary of the ICLR paper “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.” The paper presents two parameter-reduction techniques that lower memory consumption and increase the training speed of the BERT (Bidirectional Encoder Representations from Transformers) architecture. The proposed methods lead to models that scale much better than the original BERT.</p>

<h2 id="what-is-bert">What is BERT?</h2>
<p>We all know Google’s BERT has changed the NLP landscape, but what is it exactly?
BERT is one of the best-known natural language processing (NLP) models; it helps computers understand the meaning of text by using the surrounding text as context. BERT, which stands for ‘<strong>B</strong>idirectional <strong>E</strong>ncoder <strong>R</strong>epresentations from <strong>T</strong>ransformers’, is built upon the Transformer, in which every output element is connected to every input element and the weights between them are calculated dynamically. In NLP, this mechanism is commonly known as ‘attention’.</p>

<h2 id="now-what-is-albert">Now… what is ALBERT?</h2>
<p>BERT handles tasks ranging from simple text classification to complex tasks like question answering. While it seems like the perfect language model, this state-of-the-art architecture has millions, if not billions, of parameters. This can significantly hamper training speed as we scale these models, since communication overhead is directly proportional to the number of parameters. These issues are addressed by <strong>A</strong> <strong>L</strong>ite <strong>BERT</strong> (ALBERT), whose architecture is similar to BERT’s but uses far fewer parameters.
So, how exactly does ALBERT overcome this issue?
ALBERT incorporates two parameter-reduction techniques: factorized embedding parameterization and cross-layer parameter sharing. In addition, it introduces a self-supervised loss for sentence-order prediction.</p>

<p>Wondering what these mean? Let’s now dive into some details!</p>

<p>First, let’s look at the ALBERT model architecture. 
It is similar to that of BERT, that is, it uses a transformer encoder with GELU non-linearities.</p>

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I2.png" alt="ALT" />
  
</div>

<div align="center">
  <table>
    <thead>
      <tr>
        <th>Parameter</th>
        <th>Symbol</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Embedding Size</td>
        <td>E</td>
      </tr>
      <tr>
        <td>Number of Encoder Layers</td>
        <td>L</td>
      </tr>
      <tr>
        <td>Hidden size</td>
        <td>H</td>
      </tr>
      <tr>
        <td>Feed forward/filter size</td>
        <td>4H</td>
      </tr>
      <tr>
        <td>Number of attention heads</td>
        <td>H/64</td>
      </tr>
    </tbody>
  </table>
</div>

<p>Let us now look at how these parameter reduction techniques actually work.</p>

<h3 id="1-factorized-embedding-parameterization">1. Factorized embedding parameterization</h3>
<p>In BERT, the WordPiece embedding size E is the same as the hidden layer size H. This choice is suboptimal for the following reasons:</p>
<ul>
  <li>NLP tasks usually require a very large vocabulary size, denoted by V. If the embedding size E is tied to the hidden size H, then increasing H also increases the size of the embedding matrix, which is V × E. This can push the number of model parameters into the billions, circling back to our primary problem.</li>
  <li>WordPiece embeddings are meant to learn context-independent representations, whereas hidden-layer embeddings are meant to learn context-dependent representations.
The power of BERT-like models comes mostly from the context-dependent representations, so it is more efficient to make the hidden size H much greater than the embedding size E. If H and E are tied together, increasing H also increases E, thereby increasing the total number of model parameters.</li>
</ul>

<p>To combat this, ALBERT factorizes the embedding parameters into two smaller matrices. The one-hot encoded vectors are first projected into a lower-dimensional embedding space of size E, and then projected into the hidden space of size H. The embedding parameters therefore go from O(V × H) to O(V × E + E × H).
This is a significant reduction when H ≫ E.</p>
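<p>To make the saving concrete, here is a minimal sketch that counts embedding parameters with and without factorization. The function name <code>embedding_params</code> and the values V = 30000, H = 768 and E = 128 are illustrative assumptions, not numbers taken from the tables in this post.</p>

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters, ignoring biases.

    With embedding_size=None the embedding is tied to the hidden size
    (BERT-style, V x H). Otherwise it is factorized (ALBERT-style,
    V x E + E x H).
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


# Illustrative values (assumptions, not taken from the tables in this post).
V, H, E = 30000, 768, 128

tied = embedding_params(V, H)
factorized = embedding_params(V, H, E)

print(f"tied (V x H):               {tied:,}")
print(f"factorized (V x E + E x H): {factorized:,}")
print(f"reduction factor:           {tied / factorized:.1f}x")
```

<p>Under these assumed sizes the tied embedding needs about 23M parameters and the factorized one about 3.9M. The gap widens as H grows, because only the small E × H term depends on H.</p>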

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I4.png" alt="ALT" />
  
</div>

<p>The table above shows the performance of ALBERT-based models as the embedding size E varies. With non-shared parameters (BERT-style), larger embedding sizes perform better, but not by a significant margin. So at the expense of roughly a 1% reduction in accuracy, ALBERT reduces the number of parameters by around 70-80M, which is a significant improvement over BERT. Under the all-shared (ALBERT-style) setting, E = 128 appears to perform best.</p>

<h3 id="2-cross-layer-parameter-sharing">2. Cross-layer parameter sharing</h3>
<p>The main purpose of parameter sharing is a radical reduction in the number of parameters in the network. While accuracy drops slightly with this method, the main goal of parameter reduction is achieved, and the reduction also helps the model generalize. There are several ways to share parameters (for example, sharing only the feed-forward parameters or only the attention parameters across layers), but ALBERT’s default is to share all parameters across layers. The behavior of BERT and ALBERT can be compared by looking at the L2 distances and cosine similarities between the input and output embeddings of each layer, as shown below.</p>

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I3.png" alt="ALT" />
  
</div>

<p>As we can see in the figure above, the transitions from layer to layer are much smoother for ALBERT than BERT. Hence, apart from just parameter reduction, parameter sharing across layers also stabilizes the parameters.</p>
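<p>The two measures used in the figure can be sketched as follows. This is only an illustration of the metrics themselves: the three-dimensional vectors are toy stand-ins for a layer’s input and output embeddings, and the helper names are our own.</p>

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for the input and output embedding of one layer.
layer_input = [1.0, 0.0, 2.0]
layer_output = [1.0, 1.0, 2.0]

print("L2 distance:      ", l2_distance(layer_input, layer_output))
print("cosine similarity:", cosine_similarity(layer_input, layer_output))
```

<p>A small L2 distance and a cosine similarity close to 1 mean that a layer changes its input only slightly, which is what a smooth layer-to-layer transition looks like.</p>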

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I5.png" alt="ALT" />
  
</div>

<p>The table above compares ALBERT-based models under different parameter-sharing configurations, for embedding sizes E = 128 and E = 768. The not-shared (BERT-style) strategy performs best, at the cost of a large number of parameters. The all-shared (ALBERT-style) strategy hurts performance under both embedding sizes, but the drop relative to the not-shared strategy is not severe. Given the large savings in parameters, the all-shared strategy is used as the default choice.</p>
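<p>As a rough sketch of why sharing saves so much, the snippet below approximates the number of encoder weights with and without cross-layer sharing. It assumes the standard Transformer layout with a feed-forward size of 4H, ignores biases and layer normalization, and uses H = 768 and L = 12 purely as illustrative values.</p>

```python
def encoder_params(hidden_size, num_layers, share_across_layers):
    """Approximate encoder weight count, ignoring biases and layer norms.

    Per layer: 4 * H^2 for the attention projections (query, key, value,
    output) plus 2 * H * 4H for the feed-forward block, i.e. 12 * H^2.
    """
    per_layer = 12 * hidden_size * hidden_size
    if share_across_layers:
        return per_layer  # one set of weights reused by every layer
    return per_layer * num_layers


H, L = 768, 12  # illustrative base-sized configuration (assumption)

not_shared = encoder_params(H, L, share_across_layers=False)
all_shared = encoder_params(H, L, share_across_layers=True)

print(f"not-shared: {not_shared:,}")
print(f"all-shared: {all_shared:,}")
```

<p>With all-shared weights the encoder cost no longer grows with the number of layers L, so the saving is roughly a factor of L. Note that sharing reduces the number of stored parameters, not the amount of computation per forward pass.</p>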

<h3 id="3-inter-sentence-coherence-loss">3. Inter-sentence coherence loss</h3>
<p>BERT uses two losses: the Masked Language Modelling (MLM) loss and the Next Sentence Prediction (NSP) loss. NSP is a binary classification task that predicts whether two segments appear consecutively in the original text. NSP was found to be unreliable, which the ALBERT authors attribute to its lack of difficulty as a task. ALBERT therefore uses a new loss, sentence-order prediction (SOP), which focuses on inter-sentence coherence. Positive examples are two consecutive segments from the same document; negative examples are the same two segments with their order swapped. This forces the model to learn finer-grained distinctions about discourse-level coherence properties. As a result, ALBERT performs better on multi-sentence encoding tasks.</p>
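<p>The construction of SOP examples described above can be sketched as follows. This is a simplified illustration, not the authors’ data pipeline: the function name <code>make_sop_examples</code> is hypothetical, and it emits both the positive and the negative version of every pair, whereas a real pipeline would typically sample one of the two.</p>

```python
def make_sop_examples(segments):
    """Build sentence-order prediction pairs from consecutive segments.

    Each adjacent pair (a, b) yields a positive example in the original
    order (label 1) and a negative example with the order swapped (label 0).
    """
    examples = []
    for first, second in zip(segments, segments[1:]):
        examples.append((first, second, 1))  # correct order
        examples.append((second, first, 0))  # swapped order
    return examples


document = [
    "ALBERT shares parameters across layers.",
    "This reduces the model size.",
    "Training also becomes more stable.",
]

for example in make_sop_examples(document):
    print(example)
```

<p>Because both segments always come from the same document, the model cannot solve the task by detecting a topic change, which is the shortcut that makes NSP easy.</p>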

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I6.png" alt="ALT" />
  
</div>

<p>This table compares the effect of the additional inter-sentence loss. It considers three settings: no additional loss (XLNet- and RoBERTa-style), NSP (BERT-style) and SOP (ALBERT-style). The comparison covers both intrinsic and downstream tasks. A model trained with the SOP loss solves the NSP task well and performs much better on the SOP task. Downstream performance on multi-sentence encoding tasks is also better with the SOP loss, with an improvement of about 1% on average.</p>

<h2 id="how-do-these-two-compare">How do these two compare?</h2>

<h3 id="1-comparison-with-number-of-parameters">1. Comparison with number of parameters</h3>

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I7.png" alt="ALT" />
  
</div>

<p>Now that we have talked about the methods used for parameter reduction, let us compare BERT and ALBERT by looking at some numbers.
For example, ALBERT-large has about 18x fewer parameters than BERT-large: 18M parameters versus 334M.
We can also look at it from the perspective of the hidden layer size. An ALBERT-xlarge configuration with H = 2048 has only 60M parameters, and an ALBERT-xxlarge configuration with H = 4096 has 233M parameters, i.e., around 70% of BERT-large’s parameters.</p>
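<p>The ratios quoted above are easy to check. The short sketch below uses only the parameter counts (in millions) given in the text.</p>

```python
# Parameter counts in millions, as quoted in the text above.
params_m = {
    "BERT-large": 334,
    "ALBERT-large": 18,
    "ALBERT-xlarge": 60,
    "ALBERT-xxlarge": 233,
}

bert_large = params_m["BERT-large"]
for name, count in params_m.items():
    print(f"{name}: {count}M ({count / bert_large:.0%} of BERT-large)")

print(f"BERT-large / ALBERT-large: {bert_large / params_m['ALBERT-large']:.1f}x")
```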

<p>From the comparison above, it is clear that ALBERT is far more parameter-efficient than BERT. But parameter counts alone say nothing about accuracy, so let us compare the two on popular benchmarks: GLUE, SQuAD and RACE.</p>

<h3 id="2-comparison-with-benchmarks">2. Comparison with benchmarks</h3>

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I8.png" alt="ALT" />
  
</div>

<p>ALBERT-xxlarge achieves significant improvements over BERT-large with only around 70% of BERT-large’s parameters. The largest improvement is on RACE (+8.4%).</p>

<h3 id="3-comparison-with-training-time">3. Comparison with training time</h3>

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I9.png" alt="ALT" />
  
</div>

<p>The table compares the two models after roughly the same training time rather than the same number of steps. Generally, longer training leads to better performance, and ALBERT-xxlarge has a lower data throughput than BERT-large, so comparing at equal training time is the fairer test. After 125k steps (32 hours), ALBERT-xxlarge outperforms BERT-large trained for 400k steps (34 hours). Here again, the largest improvement is on RACE (+5.2%).</p>

<p>The authors then decided to get their hands dirty and try out a few add-ons to improve the model! Let’s see what these are.</p>

<h2 id="additional-training-data-and-dropout-effects">Additional training data and dropout effects</h2>

<div align="center">
  
  <img src="/public/images/2021-12-01-albert/I10.png" alt="ALT" />
  
</div>

<p>Up to this point, the experiments used only two datasets, Wikipedia and BOOKCORPUS. The figure above shows what happens when we add the additional data used by both XLNet and RoBERTa: it gives a significant boost to the dev set MLM accuracy.
What is surprising is that even after training for 1M steps, the largest models still do not overfit their training data. Removing dropout can therefore further increase model capacity, which results in higher MLM accuracy, as shown in the figure above. Combining batch normalization and dropout is often assumed to improve the accuracy of CNNs, but there is evidence that this combination may actually produce harmful results.</p>

<h2 id="conclusion">Conclusion</h2>

<p>ALBERT succeeds in drastically reducing the number of parameters while still producing powerful contextual representations and significantly better results. However, because of its larger structure, ALBERT-xxlarge is computationally more expensive than BERT-large. Techniques such as sparse attention and block attention are possible ways to tackle this issue.</p>

<p>That’s it folks! Hope this was a good and informative read.</p>

<h2 id="bibliography">Bibliography</h2>
<p>Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., &amp; Soricut, R. (2019). Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.</p>

<p>Devlin, J., Chang, M. W., Lee, K., &amp; Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.</p>

<p>Medium. (2019, September 27). <em>Google’s ALBERT Is a Leaner BERT; Achieves SOTA on 3 NLP Benchmarks</em> https://medium.com/syncedreview/googles-albert-is-a-leaner-bert-achieves-sota-on-3-nlp-benchmarks-f64466dd583</p>

<p>Machinecurve. (2021, January 6). <em>ALBERT explained: A Lite BERT</em> https://www.machinecurve.com/index.php/2021/01/06/albert-explained-a-lite-bert/</p>
 -->
  </div>
  
  <div >
    <h1 class="post-title">
      <a href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/2021/09/08/blog-posts-as-conference-contributions/">
        Blog Posts as Conference Contributions
      </a>
    </h1>

    <span class="post-date">08 Sep 2021 | 
      <a class="content-tag" href="/tags/#proposal"> proposal </a>
        
      <a class="content-tag" href="/tags/#call"> call </a>
        
    </span>
    <span class="post-date">Bubeck, Sebastien, Microsoft; Dobre, David, Mila; Gauthier, Charlie, Mila; Gidel, Gauthier, Mila; Vernade, Claire, DeepMind</span>

    <!-- <h1 id="motivations">Motivations</h1>

<p>The Machine Learning community is currently experiencing a
<a href="https://neuripsconf.medium.com/designing-the-reproducibility-program-for-neurips-2020-7fcccaa5c6ad">reproducibility
crisis</a>
and a reviewing crisis <a href="#Litt">[Littman, 2021]</a>. Because of the highly competitive and noisy
reviewing process of ML conferences <a href="#Tran">[Tran et al., 2020]</a>, researchers have an incentive to
oversell their results, slowing down the progress and diminishing the
integrity of the scientific community. Moreover, with the growing number
of papers published and submitted at the main ML conferences <a href="#Lin">[Lin et al., 2020]</a>, it has
become more challenging to keep track of the latest advances in the
field.</p>

<p>Blog posts are becoming an increasingly popular and useful way to talk
about science <a href="#Brow">[Brown and Woolston, 2018]</a>. They offer substantial value to the scientific community
by providing a flexible platform to foster open, human, and transparent
discussions about new insights or limitations of a scientific
publication. However, because they are not as recognized as standard
scientific publications, only a minority of researchers manage to
maintain an active blog and get visibility for their efforts. Many are
well-established researchers (<a href="https://francisbach.com/">Francis Bach</a>,
<a href="https://www.argmin.net/">Ben Recht</a>, <a href="https://www.inference.vc/">Ferenc
Huszár</a>, <a href="https://lilianweng.github.io/lil-log/">Lilian
Weng</a>) or big corporations that
leverage entire teams of graphic designers and writers to
polish their blogs (<a href="https://ai.facebook.com/blog/?page=1">Facebook AI</a>,
<a href="https://ai.googleblog.com/">Google AI</a>,
<a href="https://deepmind.com/blog">DeepMind</a>,
<a href="https://openai.com/blog/">OpenAI</a>). As a result, the incentives for
writing scientific blog posts are largely personal; it is unreasonable
to expect a significant portion of the machine learning community to
contribute to such an initiative when everyone is trying to establish
themselves through publications.</p>

<p>Our goal is to create a formal call for blog posts at ICLR to
incentivize and reward researchers to review past work and summarize the
outcomes, develop new intuitions, or highlight some shortcomings. A very
influential initiative of this kind happened after the Second World War
in France. Because of the lack of up-to-date textbooks, a collective of
mathematicians under the pseudonym Nicolas Bourbaki <a href="#Halm">[Halmos, 1957]</a> decided to start a
series of textbooks about the foundations of mathematics <a href="#Bour">[Bourbaki, 1939]</a>.
In the same vein, we aim at providing a new way to summarize scientific knowledge in the ML community.</p>

<h1 id="our-idea-blog-post-conference-track">Our Idea: Blog post Conference Track</h1>

<p>Due to the large diversity of topics that can be discussed in a blog
post, we decided to restrict the range of topics for this call for blog
posts. We identified that the blog posts that would bring the most value
to the community and the conference would be posts that distill and
discuss <em>previously published papers</em>.</p>

<h2 id="call-for-blog-posts-on-papers-previously-published-at-iclr">Call for blog posts on papers previously published at ICLR</h2>

<p>The call for blog posts would take the following form:</p>

<ul>
  <li>
    <p>Write a post about a paper previously published at ICLR, with the
constraint that one cannot write a blog post on work that they have
a conflict of interest with. This implies that one cannot review
their own work, or work originating from their institution or
company. We want to foster productive discussion about <em>ideas</em>, and
prevent posts that intentionally aim to help or hurt individuals or
institutions.</p>
  </li>
  <li>
    <p>Blogs will be peer-reviewed (double-blind, see
Section <a href="#sub:sub_process" data-reference-type="ref" data-reference="sub:sub_process">2.5</a>)
for quality and novelty of the content: clarity and pedagogy of the
exposition, new theoretical or practical insights,
reproduction/extension of experiments, etc.</p>
  </li>
  <li>
    <p>The posts will be published under a unified template (see
Section <a href="#sub:sub_format" data-reference-type="ref" data-reference="sub:sub_format">2.4</a>
and
Section <a href="#sub:sub_process" data-reference-type="ref" data-reference="sub:sub_process">2.5</a>)
and hosted on the conference website or our own GitHub page.</p>
  </li>
</ul>

<h2 id="positive-impact-for-the-community">Positive Impact for the Community</h2>

<p>We believe having this call for blog posts as a conference track would
increase the posts’ visibility, impact, and credibility, while
simultaneously providing benefits to the conference.</p>

<ul>
  <li>
    <p><em>Adoption</em>: we think that, with the conference’s stamp, such a
format will be more broadly recognized and adopted by the community.</p>
  </li>
  <li>
    <p><em>Accessibility</em>: maintaining a blog is time-consuming and requires
many blog posts to gain a stable following. By allowing researchers
to publish a single post, we will permit occasional blog writers to
publish their ideas, something that is practically impossible right
now. Moreover, it will make this format accessible to more
independent/junior blog writers that do not have a company or a
research lab to support them.</p>
  </li>
  <li>
    <p><em>Synchronization</em>: the fast-evolving field of ML advances at the
pace of its conferences. By following the same pace, the blog posts
will add value and momentum to the conference. The track will benefit from
the same advantages that conferences have over scientific
journals: a faster publication process and cross-fertilization of
ideas.</p>
  </li>
</ul>

<h2 id="positive-impact-for-the-conference">Positive Impact for the Conference</h2>

<p>We develop the potential positive impact of a blog post track for the
conference itself:</p>

<ul>
  <li>
    <p>Increases the value of the papers submitted to ICLR: blog posts will
discuss previously published papers, thus increasing their
visibility and quality.</p>
  </li>
  <li>
    <p>Incentivizes researchers to submit their best research to ICLR: high
quality work will likely get highlighted in future years in a blog
post.</p>
  </li>
  <li>
    <p>Improves reproducibility and transparency: the blog post track will
identify and publicly document pitfalls and “tricks” that were not
clearly communicated in the original publication.</p>
  </li>
  <li>
    <p>Provides a scientific value by itself: such blog posts will
reproduce and extend results of previously published papers. They
will distill important theoretical and practical ideas improving
their adoption and impact.</p>
  </li>
  <li>
    <p>Tests of time: this track will provide a sort of crowd-sourced test
of time at a shorter timescale than the current test-of-time
awards.</p>
  </li>
  <li>
    <p>Promotes accessibility: because many of this track’s blog posts will
popularize past content, this track will make the conference broadly
more accessible (to students, non-native speakers, and, more generally,
non-experts in the field).</p>
  </li>
</ul>

<h2 id="submission-format">Submission Format</h2>

<p>Our goal is to avoid heavily engineered, professionally made
blog posts (such as the “100+ hours” mentioned as a standard by the <a href="https://distill.pub/journal/">Distill
  guidelines</a>) and to entice ideas and clear writing rather than dynamic
visualizations or embedded JavaScript engines.</p>

<p>As a result, we restrict submissions to the Markdown format. We believe
this is a good trade-off between complexity and flexibility. Markdown
enables users to easily embed media such as images, gifs, audio, and
video as well as write mathematical equations using MathJax, without
requiring users to know how to create HTML web pages. This (mostly)
static format is also fairly portable; users can download the blog post
without much effort for offline reading or archival purposes. More
importantly, this format can be easily hosted and maintained through
GitHub.</p>

<h2 id="submission-process">Submission Process</h2>

<p>A full copy of the track’s blogs will always be publicly available as a
GitHub repository <a href="https://github.com/bourbaki-blogchain/bourbaki-blogchain.github.io">(mock-up
link)</a>.</p>

<p>The process for creating and submitting a blog post is as follows:</p>

<ol>
  <li>
    <p>Entrants will fork this repository and <strong>make their fork private</strong>.
 Failure to do so will result in the submission being rejected, as it
 breaches the double-blind review process.</p>
  </li>
  <li>
    <p>Users will modify their fork as they see fit; they will add their post
 along with any media files it might require. Since this is a full fork,
 they will be able to view their own copy of the blog. This means that
 they will be able to see exactly how their post will look and behave
 on the main website.</p>
  </li>
  <li>
    <p>Once completed, entrants will <strong>anonymize</strong> their blog post (i.e. strip their
 name, affiliation, etc).</p>
  </li>
  <li>
    <p>Entrants will download a ZIP of their <strong>anonymized</strong> fork (see figure
 below), and submit the ZIP to our OpenReview venue.</p>
  </li>
</ol>

<p><img src="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/images/download_zip.png" alt="Download instructions image" /></p>

<ol start="5">
  <li>Once accepted, entrants will de-anonymize their post, make their fork
public again, and make a <em>Pull Request</em> on GitHub from their fork to the
main blog, allowing us to pull in their new blog post in a transparent
way.</li>
</ol>

<p>Once the submission period has ended, the GitHub repository of our track will
be temporarily made private for the duration of the conference, allowing the
conference to host the website. After the conference, the GitHub repository will
be made public again to allow viewers to fork and download its contents.</p>

<h2 id="the-potential-pitfalls-of-our-blog-post-track">The Potential Pitfalls of Our Blog Post Track</h2>

<p>In this section we identify potential issues arising with such a track
and explain how to mitigate them:</p>

<ol>
  <li>
    <p><em>Adversarial Blog Posts</em>: Since the guidelines are to write a blog
post on a previously published paper, one may expect some researchers
to use bad-faith arguments to criticize a competing paper
through one of these blog posts. We do not think this will happen,
because these blog posts will be public and thus researchers would
discredit themselves by using bad-faith arguments.</p>
  </li>
  <li>
    <p><em>Too many/few submissions</em>: As this is a new track, it may be
difficult to predict the volume of submissions. The fact that there
are currently many independent blog posts on the web is a good
indicator that there will be positive interest. To get a better
estimate of the volume of potential submissions, we intend to
leverage social media to gauge the interest of the ML community in
such a track; this will allow us to gather a large enough reviewing
committee.</p>
  </li>
  <li>
    <p><em>Reviewing</em>: Once again as this is a new track, it may be unclear
how to judge blog posts during a review process. We will recruit a
large reviewing committee and define clear guidelines for the
reviewing process. Our primary focus will be on the originality of
the perspective and the novelty of the ideas, insights, and
experiments. For instance, posts that reuse less content from the
original paper (results, direct quotes) will be scored more
favourably than those that use more.</p>
  </li>
  <li>
    <p><em>Too many posts on the same paper</em>: We may mitigate this by only
selecting a small number of blog posts on the same paper. This
could actually be a strength since this can encourage discussion and
highlight different perspectives on the same work. Moreover, we
could explicitly state that we will have this hard limit (e.g.,
accepting a maximum of 3 blog posts on the same paper) to entice
researchers to submit blog posts on papers that have less
visibility.</p>
  </li>
</ol>

<h1 id="related-initiatives">Related Initiatives</h1>

<p>We mainly address our differences with respect to
<a href="https://distill.pub/">Distill</a>, the <a href="https://ml-retrospectives.github.io">ML Retrospectives
Workshop</a>, a Tutorial Track, and
other workshops discussing alternative formats for publications.</p>

<h4 id="distill">Distill.</h4>

<p>Created in 2016, <a href="https://distill.pub/">Distill</a> is an online scientific
journal based on blog post publications. We address our differences with
respect to Distill:</p>
<ul>
  <li>
    <p><em>Visualizations</em>: Blog posts should take advantage of the fact that
they’re not paperbound, and use innovative visualisations. But the
process of creating the intricate, dynamic visualisations associated
with Distill posts is daunting for most authors. Creating blog
posts should be more easily accessible to newer authors and
researchers. Sometimes, being able to embed videos and gifs is
enough.</p>
  </li>
  <li>
    <p><em>Content</em>: Distill does not target the same type of content as our
track. Distill aims at presenting new research, and at making this
research more accessible. We want our blog post track to incentivize
researchers to revisit and discuss other researchers’ work, in a
more natural way than scientific papers allow. Such a practice would
undoubtedly be useful for the community, both as a short-term “test
of time”, and also as a way to extract the key ideas from lengthy
articles.</p>
  </li>
  <li>
    <p><em>Limited adoption by the community</em>: We believe that since Distill
is not associated with a big conference track, its widespread
adoption is hindered. This lack of association confines it to a
small subset of the community that is already familiar with blog
posts.</p>
  </li>
  <li>
    <p><em>Leveraging the momentum of the conference</em>: Distill describes
itself as a scientific journal. A large share of the publications
in the ML community are conference papers. A blog post track that
follows conferences would be better suited to follow the pace of the
community.</p>
  </li>
</ul>

<h4 id="ml-retrospective-workshop">ML Retrospectives Workshop.</h4>

<p>A recurrent workshop in the ML community is the <a href="https://ml-retrospectives.github.io">ML Retrospectives
Workshop</a> (NeurIPS 2019, 2020 and
ICML 2020). This workshop is a venue for researchers to talk about their
previous work in a more open and transparent way. More precisely,
emphasis has recently been put on addressing:</p>

<ul>
  <li>
    <p>Flaws or mistakes in the paper’s methodology</p>
  </li>
  <li>
    <p>Limitations in the applicability of the work</p>
  </li>
  <li>
    <p>Changes in understanding or intuition</p>
  </li>
</ul>

<p>We share the ultimate goal of “making research more human”, but with a
completely different format. We believe that the constraint to write
about someone else’s work using natural language will channel fruitful
discussions and provide more visibility to previously published papers.</p>

<h4 id="tutorial-track">Tutorial Track.</h4>

<p>We believe that our proposed blog post track differentiates itself from
a tutorial track because the two operate at different scales. On the
one hand, a tutorial covers a whole topic (e.g., GANs, adversarial
examples, random matrix theory in ML) and contains a long talk, slides, and
potentially exercises to get familiar with the topic. It is usually
prepared by a team of expert researchers on the topic. On the other hand,
the call for blog posts we propose focuses on a single publication,
which can concern a more precise and recent topic
(e.g., a specific paper that addresses mode collapse in GANs, a novel
technique to perform adversarial training, etc.), and a post could be written by
a single researcher (once again making it more accessible to junior
researchers).</p>

<h4 id="previous-workshops-on-rethinking-publication-formats">Previous workshops on rethinking publication formats.</h4>

<p>Recently, the <a href="https://rethinkingmlpapers.github.io/">Rethinking ML Papers
Workshop</a> at ICLR 2021 fuelled
the discussion (see references therein for related past workshops). The
presenters discussed the importance of accessibility, web
demonstrations, visualization and blog posts (among others). One
particularly related discussion was the <a href="https://slideslive.com/38956531/beyond-static-papers-rethinking-how-we-share-scientific-understanding-in-ml">talk by Lilian Weng
(at 4h25min)</a>
on the usefulness of blog posts for keeping up to date with the field of ML.</p>

<p>In alignment with these initiatives, this new track is another step in
the direction of making research more human.</p>

<h3 id="bibliography">Bibliography</h3>
<p><a name="Litt">Michael L Littman. Collusion rings threaten the integrity of computer science research. Communications of the ACM, 2021.</a></p>

<p><a name="Tran">David Tran, Alex Valtchanov, Keshav Ganapathy, Raymond Feng, Eric Slud, Micah Goldblum, and Tom Goldstein. An open review of openreview: A critical analysis of the machine learning conference review process. arXiv, 2020. </a></p>

<p><a name="Lin">Hsuan-Tien Lin, Maria-Florina Balcan, Raia Hadsell, and Marc’Aurelio Ranzato. What we learned from NeurIPS 2020 reviewing process. Medium, https://medium.com/@NeurIPSConf/what-we-learned-from-neurips-2020-reviewing-process-e24549eea38f, 2020.</a></p>

<p><a name="Brow">Eryn Brown and Chris Woolston. Why science blogging still matters. Nature, 2018.</a></p>

<p><a name="Halm">Paul R Halmos. Nicolas Bourbaki. Scientific American, 1957.</a></p>

<p><a name="Bour">Nicolas Bourbaki. Elements of mathematics. Éditions Hermann, 1939.</a></p>

 -->
  </div>
  
  <div >
    <h1 class="post-title">
      <a href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/2020/04/03/even-has-latex/">
        Even has Latex!
      </a>
    </h1>

    <span class="post-date">03 Apr 2020 | 
      <a class="content-tag" href="/tags/#test"> test </a>
        
      <a class="content-tag" href="/tags/#tutorial"> tutorial </a>
        
      <a class="content-tag" href="/tags/#markdown"> markdown </a>
        
      <a class="content-tag" href="/tags/#latex"> latex </a>
        
    </span>
    <span class="post-date">Doe, John Sr., School of Hard Knocks</span>

    <!-- <h2 id="how-to-add-latex-commands-to-your-posts">How to add $\LaTeX$ commands to your posts:</h2>

<h3 id="inline">Inline</h3>

<p>To add inline math, you can use <code class="language-plaintext highlighter-rouge">$ &lt;math&gt; $</code>. Here is an example:</p>

<p><code class="language-plaintext highlighter-rouge">$ \sum_{i=0}^j \frac{1}{2^n} \times i $</code> becomes
$ \sum_{i=0}^j \frac{1}{2^n} \times i $</p>

<h3 id="block">Block</h3>

<p>To add block math, you <em>must</em> use <code class="language-plaintext highlighter-rouge">$$&lt;math&gt;$$</code>. Here are some examples:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$$\begin{equation}
a \times b \times c = 0 \\
j=1 \\
k=2 \\
\end{equation}$$
</code></pre></div></div>

<p>…becomes…</p>

\[\begin{equation}
a \times b \times c = 0 \\
j=1 \\
k=2 \\
\end{equation}\]

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$$\begin{align}
i2 \times b \times c =0 \\
j=1 \\
k=2 \\
\end{align}$$
</code></pre></div></div>

<p>…becomes…</p>

\[\begin{align}
i2 \times b \times c =0 \\
j=1 \\
k=2 \\
\end{align}\]

<p>Don’t forget the enclosing <code class="language-plaintext highlighter-rouge">$$</code>! Otherwise, your newlines won’t work:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>\begin{equation}
i2=0 \\
j=1 \\
k=2 \\
\end{equation}
</code></pre></div></div>

<p>…becomes…</p>

<p>\begin{equation}
i2=0 <br />
j=1 <br />
k=2 <br />
\end{equation}</p>
 -->
  </div>
  
  <div >
    <h1 class="post-title">
      <a href="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/2020/04/02/example-content/">
        Example content (Images and Assets)
      </a>
    </h1>

    <span class="post-date">02 Apr 2020 | 
      <a class="content-tag" href="/tags/#test"> test </a>
        
      <a class="content-tag" href="/tags/#tutorial"> tutorial </a>
        
      <a class="content-tag" href="/tags/#markdown"> markdown </a>
        
    </span>
    <span class="post-date">Doe, John, School of Life; Doe, Jane, A School</span>

    <!-- <div class="message">
  Howdy! This is an example blog post that shows several types of HTML content supported in this theme.
</div>

<p>Cum sociis natoque penatibus et magnis <a href="#">dis parturient montes</a>, nascetur ridiculus mus. <em>Aenean eu leo quam.</em> Pellentesque ornare sem lacinia quam venenatis vestibulum. Sed posuere consectetur est at lobortis. Cras mattis consectetur purus sit amet fermentum.</p>

<blockquote>
  <p>Curabitur blandit tempus porttitor. Nullam quis risus eget urna mollis ornare vel eu leo. Nullam id dolor id nibh ultricies vehicula ut id elit.</p>
</blockquote>

<p>Etiam porta <strong>sem malesuada magna</strong> mollis euismod. Cras mattis consectetur purus sit amet fermentum. Aenean lacinia bibendum nulla sed consectetur.</p>

<h2 id="inline-html-elements">Inline HTML elements</h2>

<p>HTML defines a long list of available inline tags, a complete list of which can be found on the <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element">Mozilla Developer Network</a>.</p>

<ul>
  <li><strong>To bold text</strong>, use <code class="language-plaintext highlighter-rouge">&lt;strong&gt;</code>.</li>
  <li><em>To italicize text</em>, use <code class="language-plaintext highlighter-rouge">&lt;em&gt;</code>.</li>
  <li>Abbreviations, like <abbr title="HyperText Markup Language">HTML</abbr>, should use <code class="language-plaintext highlighter-rouge">&lt;abbr&gt;</code>, with an optional <code class="language-plaintext highlighter-rouge">title</code> attribute for the full phrase.</li>
  <li>Citations, like <cite>— Mark Otto</cite>, should use <code class="language-plaintext highlighter-rouge">&lt;cite&gt;</code>.</li>
  <li><del>Deleted</del> text should use <code class="language-plaintext highlighter-rouge">&lt;del&gt;</code> and <ins>inserted</ins> text should use <code class="language-plaintext highlighter-rouge">&lt;ins&gt;</code>.</li>
  <li>Superscript <sup>text</sup> uses <code class="language-plaintext highlighter-rouge">&lt;sup&gt;</code> and subscript <sub>text</sub> uses <code class="language-plaintext highlighter-rouge">&lt;sub&gt;</code>.</li>
</ul>

<p>Most of these elements are styled by browsers with few modifications on our part.</p>

<h2 id="heading">Heading</h2>

<p>Vivamus sagittis lacus vel augue rutrum faucibus dolor auctor. Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Morbi leo risus, porta ac consectetur ac, vestibulum at eros.</p>

<h3 id="code">Code</h3>

<p>Cum sociis natoque penatibus et magnis dis <code class="language-plaintext highlighter-rouge">code element</code> montes, nascetur ridiculus mus.</p>

<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="c1">// Example can be run directly in your JavaScript console</span>

<span class="c1">// Create a function that takes two arguments and returns the sum of those arguments</span>
<span class="kd">var</span> <span class="nx">adder</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Function</span><span class="p">(</span><span class="dl">"</span><span class="s2">a</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">b</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">return a + b</span><span class="dl">"</span><span class="p">);</span>

<span class="c1">// Call the function</span>
<span class="nx">adder</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">);</span>
<span class="c1">// &gt; 8</span></code></pre></figure>

<p>Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa.</p>

<h3 id="lists">Lists</h3>

<p>Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.</p>

<ul>
  <li>Praesent commodo cursus magna, vel scelerisque nisl consectetur et.</li>
  <li>Donec id elit non mi porta gravida at eget metus.</li>
  <li>Nulla vitae elit libero, a pharetra augue.</li>
</ul>

<p>Donec ullamcorper nulla non metus auctor fringilla. Nulla vitae elit libero, a pharetra augue.</p>

<ol>
  <li>Vestibulum id ligula porta felis euismod semper.</li>
  <li>Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.</li>
  <li>Maecenas sed diam eget risus varius blandit sit amet non magna.</li>
</ol>

<p>Cras mattis consectetur purus sit amet fermentum. Sed posuere consectetur est at lobortis.</p>

<dl>
  <dt>HyperText Markup Language (HTML)</dt>
  <dd>The language used to describe and define the content of a Web page</dd>

  <dt>Cascading Style Sheets (CSS)</dt>
  <dd>Used to describe the appearance of Web content</dd>

  <dt>JavaScript (JS)</dt>
  <dd>The programming language used to build advanced Web sites and applications</dd>
</dl>

<p>Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Nullam quis risus eget urna mollis ornare vel eu leo.</p>

<h3 id="tables">Tables</h3>

<p>Aenean lacinia bibendum nulla sed consectetur. Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>

<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Upvotes</th>
      <th>Downvotes</th>
    </tr>
  </thead>
  <tfoot>
    <tr>
      <td>Totals</td>
      <td>21</td>
      <td>23</td>
    </tr>
  </tfoot>
  <tbody>
    <tr>
      <td>Alice</td>
      <td>10</td>
      <td>11</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>4</td>
      <td>3</td>
    </tr>
    <tr>
      <td>Charlie</td>
      <td>7</td>
      <td>9</td>
    </tr>
  </tbody>
</table>

<p>Nullam id dolor id nibh ultricies vehicula ut id elit. Sed posuere consectetur est at lobortis. Nullam quis risus eget urna mollis ornare vel eu leo.</p>

<hr />

<h1 id="images-gifs-and-assets">Images, gifs, and assets</h1>

<p>If you include an image hosted elsewhere on the web, the process is trivial. Simply use the standard GitHub-flavoured Markdown syntax.</p>

<p><code class="language-plaintext highlighter-rouge">![Example Image](https://iclr.cc/static/core/img/ICLR-logo.svg)</code> becomes:</p>

<p><img src="https://iclr.cc/static/core/img/ICLR-logo.svg" alt="Example Image" />
(be wary of copyrights).</p>

<p>However, if your image must be hosted locally, it’s a bit trickier. You must prefix the image path with the site’s URL (use the <code class="language-plaintext highlighter-rouge">https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368</code> syntax).</p>

<p><code class="language-plaintext highlighter-rouge">![Download instructions image](https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/images/download_zip.png)</code> becomes:
<img src="https://iclr.iro.umontreal.ca/fe213953-21c6-4bda-8eb6-9d0e3543ae2d_1641910368/public/images/download_zip.png" alt="Download instructions image" /></p>
 -->
  </div>
  
</div>

<div class="pagination">
  
    <span class="pagination-item older">Older</span>
  
  
    <span class="pagination-item newer">Newer</span>
  
</div>

      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
