<!DOCTYPE html>
<html lang="en-us">

  <head>
  <link href="http://gmpg.org/xfn/11" rel="profile">
  <meta http-equiv="content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">

  <title>
    
      MODALS: Data augmentation that works for everyone &middot; The ICLR Blog Track
    
  </title>

  
  <link rel="canonical" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/2021/09/01/BlogTitle/">
  

  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/css/poole.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/css/syntax.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/css/lanyon.css">
  <link rel="stylesheet" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/css/custom.css">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=PT+Serif:400,400italic,700%7CPT+Sans:400">

  <link rel="apple-touch-icon-precomposed" sizes="144x144" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/apple-touch-icon-precomposed.png">
  <link rel="shortcut icon" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/favicon.ico">

  <link rel="alternate" type="application/rss+xml" title="RSS" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/atom.xml">

  

  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript" ></script>
 <!-- <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { equationNumbers: { autoNumber: "AMS" } } }); </script> -->
  <script type="text/x-mathjax-config">
      MathJax.Hub.Config({
        tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
         processEscapes: false
        }
      });
</script>
</head>


  <body>

    <!-- Target for toggling the sidebar `.sidebar-checkbox` is for regular
     styles, `#sidebar-checkbox` for behavior. -->
<input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox">
<!-- <input type="checkbox" class="sidebar-checkbox" id="sidebar-checkbox" > -->

<!-- Toggleable sidebar -->
<div class="sidebar" id="sidebar">
  <div class="sidebar-item">
    <p>For short-term, peer-sourced tests of time, generalizations, specializations, reproductions, etc.!</p>
  </div>

  <nav class="sidebar-nav">

    

    
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/">ICLR 2022 Blog Track</a>
        
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/about/">About</a>
        
      
    
      
    
      
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/submitting/">Submitting</a>
        
      
    
      
        
          <a class="sidebar-nav-item" href="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/tags/">Tags</a>
        
      
    

    <a class="sidebar-nav-item" href="https://github.com/iclr-blog-track/iclr-blog-track.github.io">GitHub project</a>
    <span class="sidebar-nav-item">Currently vICLR Spring 2021</span>
  </nav>

  <div class="sidebar-item">
    <p>
      &copy; 2022. All rights reserved.
    </p>
  </div>
</div>


    <!-- Wrap is the content to shift when toggling the sidebar. We wrap the
         content to avoid any CSS collisions with our real content. -->
    <div class="wrap">
      <div class="masthead">
        <div class="container">
          <h3 class="masthead-title">
            <a href="/" title="Home">The ICLR Blog Track</a>
            <small></small>
          </h3>
        </div>
      </div>

      <div class="container content">
        <div class="post">
  <h1 id="iclr-post-title" class="post-title">MODALS: Data augmentation that works for everyone</h1>
  <span class="post-date">01 Sep 2021 | 
    <a class="content-tag" href="/tags/#deep-learning"> deep learning </a>
  
    <a class="content-tag" href="/tags/#data-augmentation"> data augmentation </a>
  
    <a class="content-tag" href="/tags/#latent-space"> latent space </a>
  
    <a class="content-tag" href="/tags/#data-modalities"> data modalities </a>
  
    <a class="content-tag" href="/tags/#automated-data-augmentation"> automated data augmentation </a>
  </span>

  <span id="iclr-post-authors" class="post-date">Anonymous</span>
  <p><br />
In this blog we discuss the paper “MODALS: Modality- agnostic Automated Data Augmentation in the Latent Space”[1]
<br />
<br />
Data augmentation is important for all types of deep learning tasks: unsupervised, supervised and self-supervised. Data augmentation has almost become a prerequisite for deep learning. In fact many self-supervised learning tasks [2] use data augmentation as a core component of the learning algorithm. It is also heavily used in image models and has consistently produced good results. By perturbing the data during training we are able to train more robust models, especially when there is limited data. Data augmentation is also very useful in incorporating known priors or domain knowledge into the model. For example the random flip transform applied to image data implicitly encodes the prior knowledge we have that the data is invariant to changes in orientation.
<br />
<br />
The usefulness of data augmentation has led to the development of specific techniques of augmentation unique to each modality of data. The techniques developed for one modality usually suit the type of data in that particular modality. For image data some commonly used augmentation techniques are rotation, cropping, applying affine transforms, random flips, contrast and color augmentations and adding Gaussian blur. More recent techniques like CutMix[3] and MixUp[4] apply data augmentation on both the image and label space. Many robust data augmentation techniques for image data already exist and are used widely. However modalities like tables and graph data don’t have as many robust augmentation techniques. Since the techniques developed for images were made taking into consideration the nature of image data, they usually cannot be applied to other modalities. If a generalized, modality agnostic framework for augmentation could be developed then standard, robust augmentation techniques can be applied across many modalities. This is exactly what the authors of the paper propose.
<br />
<br />
In order to be independent of modality, such a framework would have to convert data from different modalities to a common representation on which label preserving transforms can be applied. The authors point out that there exist techniques of data augmentation that use auto encoders to learn a latent space representation [5]. This is then used to generate augmentation data for the downstream task. In the framework proposed by the paper the learning of the latent space is integrated with the downstream task. A model is trained for the downstream task and an intermediate layer encodes the representations of the input data in the latent space. The framework uses interpolation and extrapolation of existing data points to produce augmented data. This requires that the latent space learned must be smooth (the representations cannot vary abruptly for a minute change in input features) for transformations to be label preserving and produce valid data. The smoothness ensures that the augmented data is inside the data manifold (ensuring it is valid data) and has the same label as the points used to generate it. To ensure this property the authors add two additional losses to the encoder used to learn the latent space. These are the adversarial loss and the triplet loss.
<br />
<br /></p>
<h3>Adversarial Loss</h3>
<p>To achieve a smooth latent space data manifold we use a discriminator network and add an adversarial loss to the model. The discriminator, given a data point, tries to predict if it is from the latent space or is a randomly sampled point from a Gaussian distribution. To keep the adversarial loss low (to “fool” the discriminator network) the model must also learn a smooth data representation so as to be indistinguishable from a smooth Gaussian distribution.
<br />
<br /></p>
<h3>Triplet Loss</h3>
<p>The aim is to have the representations of all the data points of a class to be close to each other and far away from data points from other classes. To learn the latent space in this way we add the triplet loss to the model. The Triplet loss pulls together samples from the same class and pushes away samples from other classes.
<br />
<br />
<br />
<br /></p>
<p align="center">
<br />
<img src="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/images/BlogTitle/PaperFig2.png" alt="Figure 2 from the paper" width="800" />
</p>
<h3 align="center">
Figure: The MODALS framework (figure from paper)
</h3>
<p><br /></p>
<h2>Transformations in the latent space</h2>
<p>Once the representations of points have been found, the authors propose the following transforms in the latent space for data augmentation:
<br /></p>
<h3>Hard example interpolation</h3>
<p>For each sample in the training set, the $q$ nearest hard examples of that class are found. Then the data point is interpolated towards the nearest hard example to get a new augmented data point.
<br />
In the figure below, the data point being considered is represented by $\vec{\,a}$. $q1$ through $q5$ are 5 hard examples of the same class as $\vec{\,a}$ and $q3$ (represented by $\vec{\,b}$) is selected as it is the nearest to $\vec{\,a}$ among the five. $\vec{\,a}$ is interpolated towards $\vec{\,b}$ with a scaling factor $\lambda$ applied.
<br /></p>
<p align="center">
<img src="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/images/BlogTitle/HardInterFig.png" alt="Illustration of hard example interpolation" width="800" />
</p>
<h3 align="center">
Figure: Illustration of hard example interpolation
</h3>
<p><br /></p>
<h3>Hard example extrapolation</h3>
<p>Similar to interpolation, each sample in the training set is considered and then shifted away from the class center to get the augmented data point.
<br />
In the figure, the data point being considered is represented by $\vec{\,a}$. The class center of $\vec{\,a}$ is represented by $\vec{\,b}$. $\vec{\,a}$ is extrapolated away from $\vec{\,b}$ by a factor $\lambda$.
<br /></p>
<p align="center">
<img src="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/images/BlogTitle/HardExtraFig.png" alt="Illustration of hard example extrapolation" width="800" />
</p>
<h3 align="center">
Figure: Illustration of hard example extrapolation
</h3>
<p><br /></p>
<h3>Gaussian Noise</h3>
<p>Gaussian noise is added to data points to get augmented data. The mean of the Gaussian noise is zero and the standard deviation is calculated on a per class basis.
<br />
In the figure below, $\vec{\,a}$ is the data point being considered and $\vec{\,b}$, $\vec{\,c}$ and $\vec{\,d}$ are three points to which a may be transformed after adding Gaussian noise. The Gaussian noise which is scaled by a factor $\lambda$ and added is shown as a circle.
<br /></p>
<p align="center">
<img src="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/images/BlogTitle/GaussianFig.png" alt="Illustration of addition of Gaussian noise" width="800" />
</p>
<h3 align="center">
Figure: Illustration of addition of Gaussian noise
</h3>
<p><br /></p>
<h3>Difference Transform</h3>
<p>The rationale behind the difference transform is that labels should be independent of differences in features within a class. The difference between two samples from the same class is added to a data point to get an augmented instance of data. Adding the difference in features to the point should not change its label.
<br />
The figure shows $\vec{\,a}$ as the data point being considered and $\vec{\,b}$ and $\vec{\,c}$ are two randomly sampled points from the same class as $\vec{\,a}$. The difference of $\vec{\,b}$ and $\vec{\,c}$ is added to $\vec{\,a}$ to get the augmented point.
<br /></p>
<p align="center">
<img src="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/images/BlogTitle/DiffTransform.png" alt="Illustration of the difference transform" width="800" />
</p>
<h3 align="center">
Figure: Illustration of the difference transform
</h3>
<p><br />
<br /></p>
<h2>Population Based Augmentation (PBA)</h2>
<p>Instead of fixing the set of augmentations to be applied to latent representations at all stages of training, the authors use a technique called PBA[6] to find optimized sets of augmentations to be applied at different stages. A few terms that PBA uses are defined below.
<br />
<br />
An augmentation function is defined as a 3-tuple consisting of a transformation, the probability of its application and its magnitude. An example given in the paper is that (op: rotation, p = 0.4, lambda = 0.5) is an augmentation function that applies the rotation transformation with a probability of 0.4 and a magnitude of 0.5.
<br />
A policy is defined as the set of all 3-tuples which are to be applied to the dataset. 0, 1 or 2 tuples are randomly sampled from the current policy and applied to each mini-batch.
<br />
A policy schedule is the order of policies which will be applied at different stages of training.
<br />
<br />
PBA searches for a good policy schedule which will give us optimized augmentation hyperparameters to be applied at different stages of training. PBA exploits population based training which is a method in which multiple child models are trained simultaneously to learn an optimal policy schedule. At the beginning each child is assigned a randomly sampled policy from the set of all possible policies (corresponding to all possible values of probability and magnitude).The child models are initialized with random weights and are trained for a fixed number of steps following which they are evaluated using the validation set. The evaluation reveals the worst and best of the models being trained. The worst models copy the weights and policies learned by the best models. The parameters (probability and magnitude) of the copied policies are then perturbed slightly or resampled from the set of all possible values. This is done to avoid creating duplicate models. This sequence of steps are repeated until the child models are done training. At the end the best child model is selected and its policy schedule is used.
<br />
There is reason to believe that this approach for finding a good policy schedule will work. The authors of [6] state that “… though the space of schedules is large, most good schedules are necessarily smooth and hence easily discoverable through evolutionary search algorithms such as PBT.”
<br /></p>
<p align="center">
<img src="https://iclr.iro.umontreal.ca/4d4c96ed-5fc6-4d4b-a563-9c1d5d768eb3_1642246748/public/images/BlogTitle/PBAFig.png" alt="The sequence of steps of PBA" width="800" />
</p>
<h3 align="center">
Figure: The sequence of steps of PBA
</h3>
<p><br />
<br /></p>
<h2>Closing Thoughts</h2>
<p>In their experiments the authors applied the MODALS framework on text, tabular, time-series and image data. In general the framework works well across modalities except in the imaging modality where they found that using PBA with image specific data augmentation techniques performed better than MODALS. While MODALS is a powerful technique for data augmentation, domain specific augmentation techniques could still be used in certain contexts like image data. Domain knowledge and insights about the data can be incorporated through hand crafted augmentations but not in MODALS. As the authors themselves note, one possible approach is to use a combination of both, using hand crafted augmentations to incorporate domain knowledge and insights into the augmentation.
<br />
<br />
The framework proposed in the paper builds around classification tasks where each data point has a single label. There is potential for a technique like MODALS to be extended to other tasks like regression and multi label classification.
<br />
<br /></p>
<h2>References</h2>
<p>[1] Cheung, Tsz-Him, and Dit-Yan Yeung. “MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space.” International Conference on Learning Representations. 2020.
<br />
[2] He, Kaiming, et al. “Momentum contrast for unsupervised visual representation learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
<br />
[3] Yun, Sangdoo, et al. “Cutmix: Regularization strategy to train strong classifiers with localizable features.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
<br />
[4] Liu, Zicheng, et al. “AutoMix: Unveiling the Power of Mixup.” arXiv preprint arXiv:2103.13027 (2021).
<br />
[5] DeVries, Terrance, and Graham W. Taylor. “Dataset augmentation in feature space.” arXiv preprint arXiv:1702.05538 (2017).
<br />
[6] Ho, Daniel, et al. “Population based augmentation: Efficient learning of augmentation policy schedules.” International Conference on Machine Learning. PMLR, 2019.
<br />
<br />
Images created using: The Manim Community Developers. (2022). Manim – Mathematical Animation Framework (Version v0.14.0) [Computer software]. https://www.manim.community/</p>

</div>

<div id="bibtex-container" class="related">
  For attribution in academic contexts, please cite this work as
  <pre id="bibtex-academic-attribution">

  </pre>

  BibTeX citation
  <pre id="bibtex-box">

  </pre>
</div>
<script>
  let authorsSpan = document.getElementById("iclr-post-authors");
  let authorsText = authorsSpan.textContent;
  let lnameFnameInstitution = authorsText.split(";");
  let lfiList = lnameFnameInstitution.map(lfi => lfi.split(",").map(item => item.trim()));
  let bibtexLFI = lfiList.map(lfi => lfi[0] + ", " + lfi[1]).join(" and ")
  let academicLFI = lfiList.map(lfi => lfi[0]);
  {
    if(academicLFI.length > 2) academicLFI = academicLFI[0] + ", et al.";
    else if(academicLFI.length == 2) academicLFI = academicLFI[0] + " & " + academicLFI[1];
    else academicLFI = academicLFI[0];
  }

  let titleSpan = document.getElementById("iclr-post-title");
  let titleText = titleSpan.textContent.trim();
  let bibtexTitleShorthand = (lfiList[0][1]+
    "2022"+
    titleText.split(" ").slice(0, 3).join("")
  ).replace(" ", "").replace(/[\p{P}$+<=>^`|~]/gu, '').toLowerCase().trim();

  let bibtexTemplate = `
@inproceedings{${bibtexTitleShorthand}},
  author = {${bibtexLFI}},
  title = {${titleText}},
  booktitle = {ICLR Blog Track},
  year = {2022},
  note = {${window.location.href}},
  url  = {${window.location.href}}
}
  `.trim();
  document.getElementById("bibtex-box").innerText = bibtexTemplate;

  let academicTemplate = `
${academicLFI}, "${titleText}", ICLR Blog Track, 2022.
`.trim();
  document.getElementById("bibtex-academic-attribution").innerText = academicTemplate;

</script>


<div class="related">
  <h2>Related posts</h2>
  <ul class="related-posts">
    
      <li>
        <h3>
          <a href="/2021/09/01/sample-submission/">
            Sample Submission
            <small>01 Sep 2021 | 
    <a class="content-tag" href="/tags/#deep-learning"> deep learning </a>
  
    <a class="content-tag" href="/tags/#data-augmentation"> data augmentation </a>
  
    <a class="content-tag" href="/tags/#latent-space"> latent space </a>
  
    <a class="content-tag" href="/tags/#data-modalities"> data modalities </a>
  
    <a class="content-tag" href="/tags/#automated-data-augmentation"> automated data augmentation </a>
  </small>
          </a>
        </h3>
      </li>
    
      <li>
        <h3>
          <a href="/2020/04/02/example-content/">
            Example content (Basic Markdown)
            <small>02 Apr 2020 | 
    <a class="content-tag" href="/tags/#deep-learning"> deep learning </a>
  
    <a class="content-tag" href="/tags/#data-augmentation"> data augmentation </a>
  
    <a class="content-tag" href="/tags/#latent-space"> latent space </a>
  
    <a class="content-tag" href="/tags/#data-modalities"> data modalities </a>
  
    <a class="content-tag" href="/tags/#automated-data-augmentation"> automated data augmentation </a>
  </small>
          </a>
        </h3>
      </li>
    
  </ul>
</div>


<script src="https://utteranc.es/client.js"
        repo="iclr-blog-track/iclr-blog-track.github.io"
        issue-term="pathname"
        label="utterance"
        theme="boxy-light"
        crossorigin="anonymous"
        >
</script>


      </div>
    </div>

    <label for="sidebar-checkbox" class="sidebar-toggle"></label>

    <script src='/public/js/script.js'></script>
  </body>
</html>
