
<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

    <title>1. Introduction &#8212; BiBench  documentation</title>
    <link rel="stylesheet" type="text/css" href="_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="_static/alabaster.css" />
    <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
    <script src="_static/jquery.js"></script>
    <script src="_static/underscore.js"></script>
    <script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
    <script src="_static/doctools.js"></script>
    <script src="_static/sphinx_highlight.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="prev" title="&lt;no title&gt;" href="index.html" />
   
  <link rel="stylesheet" href="_static/custom.css" type="text/css" />
  
  
  <meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" />

  </head><body>
  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          

          <div class="body" role="main">
            
  <section id="introduction">
<h1><span class="section-number">1. </span>Introduction<a class="headerlink" href="#introduction" title="Permalink to this heading">¶</a></h1>
<p><strong>BiBench: Benchmarking and Analyzing Network Binarization</strong></p>
<p><strong>Abstract.</strong> Neural network binarization is one of the most promising
compression approaches with extraordinary computation and memory savings
by minimizing the bit-width of weight and activation. However, despite
being a general technique, recent works reveal that applying
binarization in various practical scenarios, including multiple tasks,
architectures, and hardware, is not trivial. Moreover, common
challenges, such as severe degradation in accuracy and limited
efficiency gains, suggest that specific attributes of binarization are
not thoroughly studied and adequately understood. To comprehensively
understand binarization methods, we present <strong>BiBench</strong>, a carefully
engineered benchmark with in-depth analysis for network binarization. We
first inspect the requirements of binarization in the actual production
setting. Then, to ensure fairness and systematic comparison, we define the
evaluation tracks and metrics. We also perform a comprehensive
evaluation with a rich collection of milestone binarization algorithms.
Our benchmark results show that binarization still faces severe accuracy
challenges, and newer state-of-the-art binarization algorithms bring
diminishing improvements, even at the expense of efficiency. Moreover,
the actual deployment of certain binarization operations reveals a
surprisingly large deviation from their theoretical consumption.
Finally, based on our benchmark results and analysis, we suggest
establishing a paradigm for accurate and efficient binarization among
existing techniques. We hope BiBench paves the way toward more extensive
adoption of network binarization and serves as a fundamental work for
future research.</p>
<p><em>Note: we are continuously integrating and polishing this repository and
will publish a stable version upon acceptance.</em></p>
</section>
<section id="installation">
<h1><span class="section-number">2. </span>Installation<a class="headerlink" href="#installation" title="Permalink to this heading">¶</a></h1>
<section id="environment-preparation">
<h2><span class="section-number">2.1. </span>Environment Preparation<a class="headerlink" href="#environment-preparation" title="Permalink to this heading">¶</a></h2>
<ol class="loweralpha simple">
<li><p>Create a conda virtual environment and activate it.</p></li>
</ol>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>conda create -n bibench <span class="nv">python</span><span class="o">=</span><span class="m">3</span>.8 -y
conda activate bibench
</pre></div>
</div>
<ol class="loweralpha simple" start="2">
<li><p>Install PyTorch and torchvision following the <a class="reference external" href="https://pytorch.org/">official
instructions</a>.</p></li>
</ol>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>conda install <span class="nv">pytorch</span><span class="o">={</span>torch_version<span class="o">}</span> torchvision <span class="nv">cudatoolkit</span><span class="o">={</span>cu_version<span class="o">}</span> -c pytorch
</pre></div>
</div>
<p>For example, to install PyTorch 1.8.0 with CUDA 10.2:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>conda install <span class="nv">pytorch</span><span class="o">=</span><span class="m">1</span>.8.0 torchvision <span class="nv">cudatoolkit</span><span class="o">=</span><span class="m">10</span>.2 -c pytorch
</pre></div>
</div>
<p><strong>Important:</strong> Make sure that the CUDA version used for compilation
matches the runtime CUDA version. In addition, RTX 30-series GPUs require
cudatoolkit&gt;=11.0.</p>
<ol class="loweralpha simple" start="3">
<li><p>Install mmcv and the other repositories required by different tasks.</p></li>
</ol>
<ul class="simple">
<li><p>mmcv-full</p></li>
</ul>
<p>We recommend installing the pre-built package as follows.</p>
<p>For CPU:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cpu/<span class="o">{</span>torch_version<span class="o">}</span>/index.html
</pre></div>
</div>
<p>Please replace <code class="docutils literal notranslate"><span class="pre">{torch_version}</span></code> in the URL with your desired version.</p>
<p>For GPU:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>pip install <span class="s2">&quot;mmcv-full&gt;=1.3.17,&lt;=1.5.3&quot;</span> -f https://download.openmmlab.com/mmcv/dist/<span class="o">{</span>cu_version<span class="o">}</span>/<span class="o">{</span>torch_version<span class="o">}</span>/index.html
</pre></div>
</div>
<p>Please replace <code class="docutils literal notranslate"><span class="pre">{cu_version}</span></code> and <code class="docutils literal notranslate"><span class="pre">{torch_version}</span></code> in the URL with
your desired versions.</p>
<p>For example, to install mmcv-full with CUDA 10.2 and PyTorch 1.8.0, use
the following command:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>pip install <span class="s2">&quot;mmcv-full&gt;=1.3.17,&lt;=1.5.3&quot;</span> -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.8.0/index.html
</pre></div>
</div>
<p>See
<a class="reference external" href="https://mmcv.readthedocs.io/en/latest/get_started/installation.html">here</a>
for the MMCV versions compatible with different PyTorch and CUDA
versions. For more download links, refer to
<a class="reference external" href="https://download.openmmlab.com/mmcv/dist/index.html">openmmlab-download</a>.</p>
<p>Optionally, you can compile mmcv from source with the following
commands:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>git clone https://github.com/open-mmlab/mmcv.git -b v1.5.3
<span class="nb">cd</span> mmcv
<span class="nv">MMCV_WITH_OPS</span><span class="o">=</span><span class="m">1</span> pip install -e .  <span class="c1"># package mmcv-full, which contains cuda ops, will be installed after this step</span>
<span class="c1"># OR pip install -e .  # package mmcv, which contains no cuda ops, will be installed after this step</span>
<span class="nb">cd</span> ..
</pre></div>
</div>
<p><strong>Important:</strong> If mmcv is already installed, run <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">uninstall</span> <span class="pre">mmcv</span></code> first.
If mmcv and mmcv-full are both installed, a
<code class="docutils literal notranslate"><span class="pre">ModuleNotFoundError</span></code> will be raised.</p>
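<p>The mmcv / mmcv-full conflict can be detected before training starts. The following is a minimal sketch that is not part of BiBench: it uses only the Python standard library (<code>importlib.metadata</code>) to flag a co-installed mmcv and mmcv-full and to check mmcv-full against the 1.3.17 to 1.5.3 range used in the pip commands in this guide. The function names are hypothetical.</p>

```python
import re
from importlib import metadata

# Version range taken from the mmcv-full pip commands in this guide.
MIN_MMCV_FULL = (1, 3, 17)
MAX_MMCV_FULL = (1, 5, 3)


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def parse_version(text):
    """Turn a version such as '1.5.3' or '1.4.0rc1' into a tuple like (1, 5, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_mmcv(get_version=installed_version):
    """Return a list of problems found in the mmcv / mmcv-full installation."""
    problems = []
    lite = get_version("mmcv")
    full = get_version("mmcv-full")
    if lite and full:
        problems.append("both mmcv and mmcv-full are installed; uninstall mmcv")
    if full is None:
        problems.append("mmcv-full is not installed")
    elif not MIN_MMCV_FULL <= parse_version(full) <= MAX_MMCV_FULL:
        problems.append(f"mmcv-full {full} is outside the range 1.3.17 to 1.5.3")
    return problems


if __name__ == "__main__":
    for message in check_mmcv() or ["mmcv setup looks fine"]:
        print(message)
```

<p>The <code>get_version</code> parameter is injectable so the check can be exercised without touching the real environment, e.g. <code>check_mmcv({"mmcv-full": "1.5.3"}.get)</code> returns an empty list.</p>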
<ul class="simple">
<li><p>mmcls</p></li>
</ul>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>pip install mmcls
</pre></div>
</div>
<ul class="simple">
<li><p>bipc, bispeech, binlp</p></li>
</ul>
<p>These repositories are now included in the source code. To install them,
change into each directory and run <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">-v</span> <span class="pre">-e</span> <span class="pre">.</span></code>.</p>
<ul class="simple">
<li><p>mmdet (Optional)</p></li>
</ul>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>pip install mmdet
</pre></div>
</div>
</section>
<section id="data-preparation">
<h2><span class="section-number">2.2. </span>Data Preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading">¶</a></h2>
<p><strong>CIFAR-10 &amp; ImageNet</strong>. We follow the dataset usage of
<a class="reference external" href="https://mmclassification.readthedocs.io/en/latest/api/datasets.html">MMClassification</a>
in this part. The CIFAR-10 implementation is modified from this
<a class="reference external" href="https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py">link</a>.
Because the ImageNet-21k dataset is extremely large (21k+ classes and
roughly 14 million images), its dataset class extends the class
<code class="docutils literal notranslate"><span class="pre">ImageNet</span></code> and enables the
<code class="docutils literal notranslate"><span class="pre">serialize_data</span></code> option by default to save memory.</p>
<p><strong>Pascal VOC &amp; COCO</strong>. We follow the dataset usage of
<a class="reference external" href="https://github.com/open-mmlab/mmdetection/blob/master/docs/en/1_exist_data_model.md">MMDetection</a>
in this part. Public datasets such as <a class="reference external" href="http://host.robots.ox.ac.uk/pascal/VOC/index.html">Pascal
VOC</a> and
<a class="reference external" href="https://cocodataset.org/#download">COCO</a> are available from their official
websites or mirrors. Note: for the detection task, Pascal VOC 2012 extends
Pascal VOC 2007 without overlap, and the two are usually used together.</p>
<p><strong>ModelNet40 and ShapeNet</strong>. The aligned ModelNet40 and ShapeNet datasets can be
downloaded from
<a class="reference external" href="https://shapenet.cs.stanford.edu/media/modelnet40_normal_resampled.zip">link1</a>
and
<a class="reference external" href="https://shapenet.cs.stanford.edu/media/shapenetcore_partanno_segmentation_benchmark_v0_normal.zip">link2</a>,
respectively, and should then be saved in the corresponding folders.</p>
<p><strong>GLUE</strong>. The original GLUE data can be accessed from this
<a class="reference external" href="https://gluebenchmark.com/tasks">link</a>. Put the original data
(<code class="docutils literal notranslate"><span class="pre">train.csv</span></code>, <code class="docutils literal notranslate"><span class="pre">dev.csv</span></code>) and the augmented data (named
<code class="docutils literal notranslate"><span class="pre">train_${TASK_NAME}_aug_with_logits.csv</span></code>) in
<code class="docutils literal notranslate"><span class="pre">${GLUE_DIR}/${TASK_NAME}</span></code>.</p>
<p><strong>Speech Commands</strong>. The Google Speech Commands V1 dataset can be
downloaded by following the torchaudio reference documentation
(<a class="reference external" href="https://pytorch.org/audio/stable/_modules/torchaudio/datasets/speechcommands.html#SPEECHCOMMANDS">link</a>).</p>
<p>The dataset directory should be organized as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>BiBench
├── data
│   ├── datasets
│   │   ├── cifar10
│   │   ├── imagenet
│   │   ├── VOCdevkit
│   │   ├── coco
│   │   ├── ModelNet40
│   │   ├── ShapeNet
│   │   ├── GLUE
│   │   ├── SpeechCommands
</pre></div>
</div>
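<p>Before training, it can help to confirm that the layout matches the tree above. A minimal sketch using only <code>pathlib</code>, assuming the script is run from the BiBench root and that <code>data/datasets</code> is the data root shown above; <code>missing_datasets</code> is a hypothetical helper, not a BiBench tool.</p>

```python
from pathlib import Path

# Sub-directory names copied from the directory tree above.
EXPECTED_DATASETS = [
    "cifar10", "imagenet", "VOCdevkit", "coco",
    "ModelNet40", "ShapeNet", "GLUE", "SpeechCommands",
]


def missing_datasets(root="data/datasets", expected=EXPECTED_DATASETS):
    """Return the expected dataset folders that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```
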
</section>
<section id="training">
<h2><span class="section-number">2.3. </span>Training<a class="headerlink" href="#training" title="Permalink to this heading">¶</a></h2>
<p><strong>Training with a single / multiple GPUs</strong></p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>python tools/train.py ${CONFIG_FILE} ${WORK_DIR}
</pre></div>
</div>
<p>Example: using 1 GPU to train BiBench.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>python tools/train.py ${CONFIG_FILE} ${WORK_DIR} --gpus 1
</pre></div>
</div>
<p><strong>Training with Slurm</strong></p>
<p>If you run BiBench on a cluster managed with
<a class="reference external" href="https://slurm.schedmd.com/">slurm</a>, you can use the script
<code class="docutils literal notranslate"><span class="pre">slurm_train.sh</span></code>.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM}
</pre></div>
</div>
<p>Common optional arguments include:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--resume-from</span> <span class="pre">${CHECKPOINT_FILE}</span></code>: Resume from a previous
checkpoint file.</p></li>
</ul>
<p>Example: using 8 GPUs to train BiBench on a slurm cluster.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">./</span><span class="n">tools</span><span class="o">/</span><span class="n">slurm_train</span><span class="o">.</span><span class="n">sh</span> <span class="n">my_partition</span> <span class="n">my_job</span> <span class="n">configs</span><span class="o">/</span><span class="n">acc_cifar10</span><span class="o">/</span><span class="n">resnet18_bnn_adam_1e</span><span class="o">-</span><span class="mi">3</span><span class="n">_cosinelr</span><span class="o">.</span><span class="n">py</span> <span class="n">work_dirs</span><span class="o">/</span><span class="n">acc_cifar10</span> <span class="mi">8</span>
</pre></div>
</div>
<p>See <code class="docutils literal notranslate"><span class="pre">slurm_train.sh</span></code> for the full list of arguments and environment
variables.</p>
</section>
<section id="add-custom-binarization-algorithms">
<h2><span class="section-number">2.4. </span>Add Custom Binarization Algorithms<a class="headerlink" href="#add-custom-binarization-algorithms" title="Permalink to this heading">¶</a></h2>
<p>With <strong>just 3 steps</strong>, researchers can define and evaluate custom
binarization algorithms easily in BiBench:</p>
<p><em>Step 1</em>. <strong>Operator definition</strong>: create a file for the custom
binarization algorithm under <code class="docutils literal notranslate"><span class="pre">bibench/models/layers</span></code>, and complete the
definition of binarized <code class="docutils literal notranslate"><span class="pre">Conv1d</span></code>, <code class="docutils literal notranslate"><span class="pre">Conv2d</span></code>, and <code class="docutils literal notranslate"><span class="pre">Linear</span></code> operators
in it.</p>
<p><em>Step 2</em>. <strong>Operator registration</strong>: register the binarized operators
defined in <em>Step 1</em> to <code class="docutils literal notranslate"><span class="pre">CONV_LAYERS</span></code> in
<code class="docutils literal notranslate"><span class="pre">bibench/models/layers/builder.py</span></code>.</p>
<p><em>Step 3</em>. <strong>Configuration definition</strong>: define the configuration for the
learning task, neural architecture, or any other track you would like to
evaluate (existing configurations can serve as references).</p>
<p>Then you can get started with BiBench and evaluate your binarization
algorithm!</p>
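<p>The three steps can be illustrated with a framework-free toy. In BiBench the operators are binarized <code>Conv1d</code>, <code>Conv2d</code>, and <code>Linear</code> layers registered to <code>CONV_LAYERS</code> in <code>bibench/models/layers/builder.py</code>; in the sketch below, <code>CONV_LAYERS</code> is only a plain dict, and <code>register</code>, <code>MyBinaryLinear</code>, and the config dict are hypothetical stand-ins, not BiBench's actual API.</p>

```python
# Hypothetical stand-in for the CONV_LAYERS registry in
# bibench/models/layers/builder.py; here it is only a plain dict.
CONV_LAYERS = {}


def register(name):
    """Step 2 analogue: record a layer class under the given name."""
    def wrap(cls):
        CONV_LAYERS[name] = cls
        return cls
    return wrap


def sign(x):
    """Binarize a scalar to +1 / -1."""
    return 1.0 if x >= 0 else -1.0


@register("MyBinaryLinear")  # hypothetical operator name
class MyBinaryLinear:
    """Step 1 analogue: a toy binarized Linear layer (forward pass only)."""

    def __init__(self, weight):
        # weight: one list of floats per output unit
        self.weight = weight

    def __call__(self, x):
        xb = [sign(v) for v in x]
        return [sum(sign(w) * a for w, a in zip(row, xb)) for row in self.weight]


# Step 3 analogue: a config refers to the operator by its registered name.
config = {"type": "MyBinaryLinear"}
layer = CONV_LAYERS[config["type"]](weight=[[0.3, -0.2, 0.5], [-0.1, -0.4, 0.2]])
print(layer([0.7, -1.2, 0.0]))  # [3.0, 1.0]
```

<p>The point of the registry is that a configuration only needs the operator's name; the builder resolves the class at construction time.</p>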
</section>
</section>
<section id="binarization-algorithms">
<h1><span class="section-number">3. </span>Binarization Algorithms<a class="headerlink" href="#binarization-algorithms" title="Permalink to this heading">¶</a></h1>
<p><strong>BNN</strong>. During training, BNN uses the straight-through
estimator (STE) to compute the gradient <span class="math notranslate nohighlight">\(\boldsymbol{g_{x}}\)</span>, which
takes the saturation effect into account:</p>
<div class="math notranslate nohighlight">
\[\begin{split}\mathtt{sign}(\boldsymbol{x})=
\begin{cases}
+1,&amp; \mathrm{if} \ \boldsymbol x \ge 0\\
-1,&amp; \mathrm{otherwise}
\end{cases}\qquad
\boldsymbol{g_{x}}=
\begin{cases}
\boldsymbol{g_b},&amp; \mathrm{if} \ \boldsymbol x \in \left(-1, 1\right)\\
0,&amp; \mathrm{otherwise}.
\end{cases}\end{split}\]</div>
<p>During inference, the computation is expressed as</p>
<div class="math notranslate nohighlight">
\[\boldsymbol o = \operatorname{sign}(\boldsymbol{a}) \circledast \operatorname{sign}(\boldsymbol{w}),\]</div>
<p>where <span class="math notranslate nohighlight">\(\circledast\)</span> indicates a convolutional operation using XNOR
and bitcount operations.</p>
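<p>The sign function, the clipped STE, and the XNOR/bitcount identity can be sketched in plain Python. For two ±1 vectors of length n packed into bits, the dot product equals 2·popcount(XNOR(a, w)) − n, because every matching position contributes +1 and every mismatch contributes −1. This is an illustrative sketch under the bit-packing convention chosen here (a set bit means +1), not the kernel used in BiBench or BinaryNet.pytorch.</p>

```python
def sign(x):
    """BNN forward binarization: +1 if x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def ste_grad(x, grad_b):
    """Clipped STE: pass the incoming gradient g_b only where -1 < x < 1."""
    return grad_b if -1 < x < 1 else 0.0


def pack(signs):
    """Pack a list of +1/-1 values into an int (bit i set means element i is +1)."""
    return sum(1 << i for i, s in enumerate(signs) if s == 1)


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two packed {-1,+1}^n vectors via XNOR and bitcount."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n  # matches give +1, mismatches give -1


a = [0.3, -0.8, 1.5, -0.1]
w = [-0.2, -0.5, 0.9, 0.4]
sa, sw = [sign(v) for v in a], [sign(v) for v in w]
print(xnor_popcount_dot(pack(sa), pack(sw), len(a)))  # 0, the plain dot product of sa and sw
print([ste_grad(v, 1.0) for v in a])  # [1.0, 1.0, 0.0, 1.0]
```

<p>Note how the gradient for 1.5 is zeroed: this is the saturation effect that the clipped STE encodes.</p>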
<p>The related code in our codebase refers to
<a class="reference external" href="https://github.com/itayhubara/BinaryNet.pytorch">BinaryNet</a> and the
original paper.</p>
<p><strong>XNOR-Net</strong>. XNOR-Net obtains the channel-wise scaling factors
<span class="math notranslate nohighlight">\(\boldsymbol \alpha=\frac{\left\|\boldsymbol{w}\right\|_{1}}{n}\)</span>
for the weight, where <span class="math notranslate nohighlight">\(n\)</span> is the number of weights in each output channel, and <span class="math notranslate nohighlight">\(\boldsymbol{K}\)</span> contains the scaling factors
<span class="math notranslate nohighlight">\(\beta\)</span> for all sub-tensors in activation <span class="math notranslate nohighlight">\(\boldsymbol{a}\)</span>.
The convolution between activation
<span class="math notranslate nohighlight">\(\boldsymbol{a}\)</span> and weight <span class="math notranslate nohighlight">\(\boldsymbol{w}\)</span> can then be approximated mainly with
binary operations:</p>
<div class="math notranslate nohighlight">
\[\boldsymbol o = (\operatorname{sign}(\boldsymbol{a}) \circledast \operatorname{sign}(\boldsymbol{w})) \odot \boldsymbol{K} \boldsymbol \alpha,\]</div>
<p>where <span class="math notranslate nohighlight">\(\boldsymbol{w} \in \mathbb{R}^{c \times w \times h}\)</span> and
<span class="math notranslate nohighlight">\(\boldsymbol{a} \in \mathbb{R}^{c \times w_{\text {in }} \times h_{\text {in }}}\)</span>
denote the weight and input tensors, respectively. The STE is also
applied in backward propagation during training.</p>
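<p>To see what the scaling factors contribute, consider a single output channel and a single receptive field, so the convolution reduces to one dot product. The sketch below assumes, as in the XNOR-Net paper, that α is the mean absolute weight (‖w‖₁ / n) and, for this one position, takes β as the mean absolute activation; in a full convolution β varies across positions and is collected in K. The function name is hypothetical.</p>

```python
def sign(x):
    """Binarize a scalar to +1 / -1."""
    return 1.0 if x >= 0 else -1.0


def xnor_scaled_dot(a, w):
    """Approximate dot(a, w) by (sign(a) . sign(w)) * beta * alpha."""
    n = len(w)
    alpha = sum(abs(v) for v in w) / n  # weight scale for this output channel
    beta = sum(abs(v) for v in a) / n   # activation scale for this receptive field
    binary = sum(sign(x) * sign(y) for x, y in zip(a, w))  # XNOR + bitcount in practice
    return binary * beta * alpha


a = [1.0, -2.0, 1.5, -0.5]
w = [0.5, -0.5, 0.4, -0.6]
exact = sum(x * y for x, y in zip(a, w))
print(exact, xnor_scaled_dot(a, w))  # about 2.4 (exact) vs. about 2.5 (scaled binary)
```

<p>Without α and β, the binary dot product alone would be 4.0 here; the two scalars restore the magnitude lost by binarization.</p>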
<p>The related code in our codebase refers to <a class="reference external" href="https://github.com/allenai/XNOR-Net">XNOR-Net
(1)</a>, <a class="reference external" href="https://github.com/ziplab/QTool/">XNOR-Net
(2)</a>, and the original paper.</p>
<p><strong>DoReFa-Net</strong>. DoReFa-Net applies the following function for
<span class="math notranslate nohighlight">\(1\)</span>-bit weight and activation:</p>
<div class="math notranslate nohighlight">
\[\boldsymbol o = (\operatorname{sign}(\boldsymbol{a}) \circledast \operatorname{sign}(\boldsymbol{w})) \odot \boldsymbol \alpha.\]</div>
<p>The STE is also applied in backward propagation with the
full-precision gradient.</p>
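<p>The sketch below contrasts DoReFa-Net with the clipped BNN STE. It assumes, following the DoReFa-Net paper, that α is a single layer-wise factor equal to the mean absolute weight, and reads "STE with the full-precision gradient" as passing the gradient through unchanged (no clipping to (−1, 1)). Both are our reading rather than details stated in this document, and the function names are hypothetical.</p>

```python
def dorefa_binarize_weights(w):
    """1-bit weights: sign(w) scaled by one layer-wise alpha = mean(|w|)."""
    alpha = sum(abs(v) for v in w) / len(w)
    return [alpha if v >= 0 else -alpha for v in w]


def identity_ste(x, grad_b):
    """Unclipped STE: pass the gradient through regardless of the input x."""
    return grad_b


weights = [0.5, -1.5, 1.0, -1.0]
print(dorefa_binarize_weights(weights))  # [1.0, -1.0, 1.0, -1.0]
print(identity_ste(1.5, 1.0))  # 1.0; a clipped STE would return 0.0 here
```
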
<p>The related code in our codebase refers to <a class="reference external" href="https://github.com/zzzxxxttt/pytorch_DoReFaNet">DoReFa-Net
(1)</a>, <a class="reference external" href="https://github.com/tensorpack/tensorpack/tree/master/examples/DoReFa-Net">DoReFa-Net
(2)</a>,
and the original paper.</p>
<p><strong>Bi-Real Net</strong>. Bi-Real Net proposes a piece-wise polynomial function
as the gradient approximation function:</p>
<div class="math notranslate nohighlight">
\[\begin{split}\operatorname{bireal}\left(\boldsymbol{a}\right)=
\begin{cases}
-1 &amp; \text{if } \boldsymbol{a}&lt;-1\\
2 \boldsymbol{a}+\boldsymbol{a}^2 &amp; \text{if } -1 \leqslant \boldsymbol{a}&lt;0\\
2 \boldsymbol{a}-\boldsymbol{a}^2 &amp; \text{if } 0 \leqslant \boldsymbol{a}&lt;1\\
1 &amp; \text{otherwise}
\end{cases}\qquad
\frac{\partial \operatorname{bireal}\left(\boldsymbol{a}\right)}{\partial \boldsymbol{a}}=
\begin{cases}
2+2 \boldsymbol{a} &amp; \text{if } -1 \leqslant \boldsymbol{a}&lt;0\\
2-2 \boldsymbol{a} &amp; \text{if } 0 \leqslant \boldsymbol{a}&lt;1\\
0 &amp; \text{otherwise}.
\end{cases}\end{split}\]</div>
<p>The forward propagation of Bi-Real Net is the same as that of DoReFa-Net.</p>
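<p>The piece-wise polynomial and its derivative translate directly into code. In training, <code>bireal_grad</code> would replace the box-shaped STE gradient in the backward pass while the forward pass still uses sign; the function names are illustrative.</p>

```python
def bireal(a):
    """Piece-wise polynomial approximation of sign(a) from Bi-Real Net."""
    if a < -1:
        return -1.0
    if a < 0:
        return 2 * a + a * a
    if a < 1:
        return 2 * a - a * a
    return 1.0


def bireal_grad(a):
    """Derivative of bireal(a), used as the backward gradient of sign(a)."""
    if -1 <= a < 0:
        return 2 + 2 * a
    if 0 <= a < 1:
        return 2 - 2 * a
    return 0.0


for a in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(a, bireal(a), bireal_grad(a))
```

<p>The gradient peaks at 2 when a = 0 and falls linearly to 0 at ±1, a triangular shape that tracks sign more closely than the constant STE gradient.</p>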
<p>The related code in our codebase refers to <a class="reference external" href="https://github.com/liuzechun/Bi-Real-net">Bi-Real
Net</a> and the original
paper.</p>
<p><strong>XNOR-Net++</strong>. XNOR-Net++ proposes to re-formulate XNOR-Net as:</p>
<div class="math notranslate nohighlight">
\[\boldsymbol{o} = (\operatorname{sign}(\boldsymbol{a}) \circledast \operatorname{sign}(\boldsymbol{w})) \odot \boldsymbol \Gamma,\]</div>
<p>and we adopt the following form of <span class="math notranslate nohighlight">\(\boldsymbol \Gamma\)</span> in our
experiments (the form that achieves the best performance in the original paper):</p>
<div class="math notranslate nohighlight">
\[\boldsymbol \Gamma=\boldsymbol \alpha \otimes \boldsymbol \beta \otimes \boldsymbol \gamma, \quad \boldsymbol \alpha \in \mathbb{R}^{c_{\text {out }}}, \boldsymbol \beta \in \mathbb{R}^{h_{\text {out }}}, \boldsymbol \gamma \in \mathbb{R}^{w_{\text {out }}},\]</div>
<p>where <span class="math notranslate nohighlight">\(c_{\text {out }}\)</span> is the number of output channels, and <span class="math notranslate nohighlight">\(\boldsymbol \alpha\)</span>, <span class="math notranslate nohighlight">\(\boldsymbol \beta\)</span>, and
<span class="math notranslate nohighlight">\(\boldsymbol \gamma\)</span> are learned during training.</p>
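<p>Because Γ is an outer product, it needs only c_out + h_out + w_out learnable numbers (one factor per output channel, row, and column) instead of c_out · h_out · w_out. The sketch below builds Γ with nested Python lists and applies it element-wise to a binary output; in practice this would be a broadcast product of three tensors, and the function names are hypothetical.</p>

```python
def gamma_tensor(alpha, beta, gamma):
    """Build Gamma[c][i][j] = alpha[c] * beta[i] * gamma[j] (c_out x h_out x w_out)."""
    return [[[a * b * g for g in gamma] for b in beta] for a in alpha]


def apply_scale(binary_out, scale):
    """Element-wise product of a (c_out, h_out, w_out) binary output with Gamma."""
    return [
        [[o * s for o, s in zip(o_row, s_row)] for o_row, s_row in zip(o_ch, s_ch)]
        for o_ch, s_ch in zip(binary_out, scale)
    ]


alpha = [1.0, 2.0]      # one factor per output channel
beta = [0.5, 1.0, 1.5]  # one factor per output row
gamma = [1.0, 3.0]      # one factor per output column
G = gamma_tensor(alpha, beta, gamma)
print(G[1][2][1])  # 2.0 * 1.5 * 3.0 = 9.0
```
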
<p>The related code in our codebase refers to
<a class="reference external" href="https://github.com/larq/zoo/blob/main/larq_zoo/literature/real_to_bin_nets.py">XNOR-Net++</a>
and the original paper.</p>
<p><strong>ReActNet</strong>. ReActNet defines an RSign as a binarization function with
channel-wise learnable thresholds:</p>
<div class="math notranslate nohighlight">
\[\begin{split}\boldsymbol{x}_b=\operatorname{rsign}\left(\boldsymbol{x}\right)=\left\{\begin{array}{ll}
+1, &amp; \text { if } \boldsymbol{x}&gt;\boldsymbol \alpha \\
-1, &amp; \text { if } \boldsymbol{x} \leq \boldsymbol \alpha
\end{array} .\right.\end{split}\]</div>
<p>where <span class="math notranslate nohighlight">\(\boldsymbol \alpha\)</span> is a learnable coefficient controlling
the threshold. The forward propagation is then</p>
<div class="math notranslate nohighlight">
\[\boldsymbol o = (\operatorname{rsign}(\boldsymbol{a}) \circledast \operatorname{sign}(\boldsymbol{w})) \odot \boldsymbol \alpha.\]</div>
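<p>A minimal sketch of the channel-wise RSign forward pass, assuming activations are laid out as a list of channels with one learnable threshold per channel; the function name and layout are illustrative.</p>

```python
def rsign(x, alpha):
    """Channel-wise RSign: +1 where x > alpha[c], otherwise -1.

    x is a list of channels (each a list of activations); alpha holds one
    learnable threshold per channel.
    """
    return [[1 if v > t else -1 for v in channel] for channel, t in zip(x, alpha)]


x = [[0.2, -0.1, 0.0], [0.5, 0.7, 0.6]]
alpha = [0.0, 0.6]
print(rsign(x, alpha))  # [[1, -1, -1], [-1, 1, -1]]
```

<p>With all thresholds at zero, RSign differs from the BNN sign only at exactly 0, which sign maps to +1 but RSign maps to −1.</p>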
<p>The related code in our codebase refers to
<a class="reference external" href="https://github.com/liuzechun/ReActNet">ReActNet</a> and the original
paper.</p>
<p><strong>ReCU</strong>. As described in their paper, ReCU is formulated as</p>
<div class="math notranslate nohighlight">
\[\operatorname{recu}(\boldsymbol{w})=\max \left(\min \left(\boldsymbol{w}, Q_{(\tau)}\right), Q_{(1-\tau)}\right),\]</div>
<p>where <span class="math notranslate nohighlight">\(Q_{(\tau)}\)</span> and <span class="math notranslate nohighlight">\(Q_{(1-\tau)}\)</span> denote the
<span class="math notranslate nohighlight">\(\tau\)</span> quantile and <span class="math notranslate nohighlight">\(1-\tau\)</span> quantile of
<span class="math notranslate nohighlight">\(\boldsymbol{w}\)</span>, respectively. All other implementation details
strictly follow the original paper and official code.</p>
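<p>The ReCU clamp can be sketched as follows. The <code>quantile</code> helper interpolates linearly between sorted values (the same rule as NumPy's default); whether the official ReCU code uses this exact interpolation is an assumption, and τ is taken to be above 0.5 so that Q<sub>(τ)</sub> is the upper clipping point.</p>

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of a list, for 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def recu(w, tau):
    """recu(w) = max(min(w, Q_tau), Q_(1-tau)), applied element-wise."""
    upper = quantile(w, tau)      # Q_(tau): upper clipping point when tau > 0.5
    lower = quantile(w, 1 - tau)  # Q_(1-tau): lower clipping point
    return [max(min(v, upper), lower) for v in w]


weights = [float(i) for i in range(11)]  # 0.0 .. 10.0
print(recu(weights, 0.9))  # values below about 1.0 or above 9.0 are clipped
```
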
<p>The related code in our codebase refers to
<a class="reference external" href="https://github.com/z-hXu/ReCU">ReCU</a> and the original paper.</p>
<p><strong>FDA</strong>. In backward propagation, FDA computes the gradient with respect to the input
<span class="math notranslate nohighlight">\(\mathbf{t}\)</span> of the binarized layer as:</p>
<div class="math notranslate nohighlight">
\[\frac{\partial \ell}{\partial \mathbf{t}}=\frac{\partial \ell}{\partial \boldsymbol{o}} \boldsymbol{w}_2^{\top} \odot\left(\left(\mathbf{t} \boldsymbol{w}_1\right) \geq 0\right) \boldsymbol{w}_1^{\top}
+\frac{\partial \ell}{\partial \boldsymbol{o}} \eta^{\prime}(\mathbf{t})
+\frac{\partial \ell}{\partial \boldsymbol{o}} \odot \frac{4 \omega}{\pi} \sum_{i=0}^n \cos ((2 i+1) \omega \mathbf{t}),\]</div>
<p>where <span class="math notranslate nohighlight">\(\frac{\partial \ell}{\partial \boldsymbol{o}}\)</span> is the
gradient from the upper layers, <span class="math notranslate nohighlight">\(\odot\)</span> represents element-wise
multiplication, and <span class="math notranslate nohighlight">\(\frac{\partial \ell}{\partial \mathbf{t}}\)</span> is
the partial gradient on <span class="math notranslate nohighlight">\(\mathbf{t}\)</span> that is propagated back to
the previous layer. <span class="math notranslate nohighlight">\(\boldsymbol{w}_1\)</span> and
<span class="math notranslate nohighlight">\(\boldsymbol{w}_2\)</span> are the weights of the original model and of the
noise adaptation module, respectively, and FDA computes their gradients as</p>
<div class="math notranslate nohighlight">
\[\frac{\partial \ell}{\partial \boldsymbol{w}_1}=\mathbf{t}^{\top} \frac{\partial \ell}{\partial \boldsymbol{o}} \boldsymbol{w}_2^{\top} \odot\left(\left(\mathbf{t} \boldsymbol{w}_1\right) \geq 0\right),\qquad
\frac{\partial \ell}{\partial \boldsymbol{w}_2}=\sigma\left(\mathbf{t} \boldsymbol{w}_1\right)^{\top} \frac{\partial \ell}{\partial \boldsymbol{o}}.\]</div>
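<p>The last term of the input gradient is the derivative of a truncated Fourier series of a square wave, (4/π) Σ<sub>i=0..n</sub> sin((2i+1)ωt)/(2i+1), which approximates sign(t) for |t| &lt; π/ω. The sketch below evaluates that series and its derivative in plain Python. It covers only this Fourier term, not the noise adaptation module (w<sub>1</sub>, w<sub>2</sub>, σ, η), and the default values of ω and n are arbitrary choices for illustration.</p>

```python
import math


def fourier_sign(t, omega=1.0, n=10):
    """Truncated Fourier series (4/pi) * sum_i sin((2i+1) w t) / (2i+1), i = 0..n."""
    return 4 / math.pi * sum(
        math.sin((2 * i + 1) * omega * t) / (2 * i + 1) for i in range(n + 1)
    )


def fourier_sign_grad(t, omega=1.0, n=10):
    """Its derivative (4 w / pi) * sum_i cos((2i+1) w t): the last gradient term above."""
    return 4 * omega / math.pi * sum(
        math.cos((2 * i + 1) * omega * t) for i in range(n + 1)
    )


for t in (-1.0, -0.1, 0.0, 0.1, 1.0):
    print(t, round(fourier_sign(t), 3), round(fourier_sign_grad(t), 3))
```

<p>Increasing n sharpens the approximation of sign but also raises the peak gradient at t = 0, which equals 4ω(n+1)/π.</p>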
<p>The related code in our codebase refers to
<a class="reference external" href="https://gitee.com/mindspore/models/tree/master/research/cv/FDA-BNN">FDA</a>
and the original paper.</p>
</section>
<section id="learning-tasks">
<h1><span class="section-number">4. </span>Learning Tasks<a class="headerlink" href="#learning-tasks" title="Permalink to this heading">¶</a></h1>
<section id="d-visual-tasks">
<h2><span class="section-number">4.1. </span>2D Visual Tasks<a class="headerlink" href="#d-visual-tasks" title="Permalink to this heading">¶</a></h2>
<p>The <strong>classification tasks’ implementations</strong> of our codebase borrow
from related tasks in
<a class="reference external" href="https://github.com/open-mmlab/mmclassification">MMClassification</a>,
including CIFAR-10 and ImageNet classification tasks and models.</p>
<p><strong>CIFAR-10</strong>. The CIFAR-10 dataset (Canadian Institute For Advanced
Research) is a collection of images commonly used to train machine
learning and computer vision algorithms. This dataset is widely used for
image classification tasks. There are 60,000 color images, each of which
measures 32x32 pixels. All images are categorized into 10 different
classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships,
and trucks. Each class has 6000 images, where 5000 are for training and
1000 are for testing.</p>
<p><strong>ImageNet</strong>. ImageNet is a dataset of over 15 million labeled
high-resolution images belonging to roughly 22,000 categories.</p>
<p>The images are collected from the web and labeled by human labelers
using a crowd-sourced image labeling service called Amazon Mechanical
Turk. In 2010, the ImageNet Large-Scale Visual Recognition Challenge
(ILSVRC) was established as part of the Pascal Visual Object Challenge.
ILSVRC uses a subset of ImageNet with roughly 1000 images in each of
1000 categories, giving approximately 1.2 million training images, 50,000
validation images, and 150,000 testing images in total. Predictions on
ImageNet are evaluated by classification accuracy.</p>
<p>The <strong>object detection tasks’ implementations</strong> of our codebase borrow
from related tasks in
<a class="reference external" href="https://github.com/open-mmlab/mmdetection">MMDetection</a>, including
Pascal VOC07 and COCO17 detection tasks and models.</p>
<p><strong>Pascal VOC07</strong>. The PASCAL Visual Object Classes 2007 (Pascal VOC07)
dataset contains 20 object categories spanning vehicles, household objects,
animals, and others: airplane, bicycle, boat, bus, car, motorbike, train,
bottle, chair, dining table, potted plant, sofa, TV/monitor, bird, cat,
cow, dog, horse, sheep, and person. As a benchmark for object detection,
semantic segmentation, and object classification, this dataset contains
pixel-level segmentation annotations, bounding box annotations, and
object class annotations.</p>
<p><strong>COCO17</strong>. The MS COCO (Microsoft Common Objects in Context) dataset is
a large-scale object detection, segmentation, key-point detection, and
captioning dataset. The dataset consists of 328K images. Following
community feedback, the 2017 release changed the training/validation split
from 83K/41K to 118K/5K while keeping the same images and annotations.
The 2017 test set is a subset of 41K images from the 2015 test set. In
addition, an unlabeled set of 123K images is provided.</p>
</section>
<section id="d-visual-tasks-1">
<span id="id1"></span><h2><span class="section-number">4.2. </span>3D Visual Tasks<a class="headerlink" href="#d-visual-tasks-1" title="Permalink to this heading">¶</a></h2>
<p>The <strong>3D point cloud tasks’ implementations</strong> of our codebase borrow
from related tasks in
<a class="reference external" href="https://github.com/charlesq34/pointnet">PointNet</a> and
<a class="reference external" href="https://github.com/htqin/BiPointNet">BiPointNet</a>, including
ModelNet40 classification and ShapeNet segmentation tasks and models.</p>
<p><strong>ModelNet40</strong>. The ModelNet40 dataset contains point clouds of
synthetic objects. As the most widely used benchmark for point cloud
analysis, ModelNet40 is popular due to its diverse categories, clean
shapes, and well-constructed data. The original ModelNet40 contains
12,311 CAD-generated meshes in 40 categories, of which 9,843 are for
training and 2,468 are for testing. The points are uniformly sampled
from the mesh surfaces, moved to the origin, and then scaled into a
unit sphere.</p>
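<p>The centering and scaling step can be sketched as follows (sampling points from the mesh surfaces is omitted). This mirrors the common practice of subtracting the centroid and dividing by the largest point norm; whether the released ModelNet40 files were produced with exactly this code is an assumption, and the function name is hypothetical.</p>

```python
import math


def normalize_point_cloud(points):
    """Center (x, y, z) points at their centroid, then scale into the unit sphere."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - centroid[k] for k in range(3)] for p in points]
    radius = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    if radius == 0:  # all points coincide; nothing to scale
        return centered
    return [[c / radius for c in p] for p in centered]


cloud = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [2.0, 2.0, 0.0]]
print(normalize_point_cloud(cloud))
```
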
<p><strong>ShapeNet</strong>. ShapeNet is a large-scale repository for 3D CAD models
developed by researchers from Stanford University, Princeton University,
and the Toyota Technological Institute at Chicago, USA. The repository
indexes over 3M models, of which 220,000 are classified into 3,135
categories organized by WordNet hypernym-hyponym relationships. There are
31,693 meshes in the ShapeNet Parts subset, divided into 16 categories of
objects (<em>e.g.</em>, tables, chairs, planes). Each shape contains
2-5 parts (with 50 part classes in total).</p>
</section>
<section id="natural-language-understanding-tasks">
<h2><span class="section-number">4.3. </span>Natural Language Understanding Tasks<a class="headerlink" href="#natural-language-understanding-tasks" title="Permalink to this heading">¶</a></h2>
<p>The <strong>natural language understanding tasks’ implementations</strong> of our
codebase borrow from related tasks in
<a class="reference external" href="https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/DynaBERT">DynaBERT</a>
and <a class="reference external" href="https://github.com/htqin/BiBERT">BiBERT</a>, including the GLUE
benchmark tasks and models.</p>
<p><strong>GLUE</strong>. The General Language Understanding Evaluation (GLUE) benchmark is
a collection of nine natural language understanding tasks, including the
single-sentence tasks CoLA and SST-2, the similarity and paraphrasing tasks
MRPC, STS-B, and QQP, and the natural language inference tasks MNLI, QNLI,
RTE, and WNLI.</p>
</section>
<section id="speech-tasks">
<h2><span class="section-number">4.4. </span>Speech Tasks<a class="headerlink" href="#speech-tasks" title="Permalink to this heading">¶</a></h2>
<p>The <strong>speech task implementations</strong> in our codebase borrow from
the corresponding tasks in
<a class="reference external" href="https://github.com/katsugeneration/tensor-fsmn">FSMN</a> and
<a class="reference external" href="https://github.com/htqin/BiFSMN">BiFSMN</a>, including the Google
Speech Commands classification tasks and models.</p>
<p><strong>Google Speech Commands</strong>. Google Speech Commands Classification
(SpeechCom) provides a collection of audio recordings of spoken words for
training and evaluating keyword-spotting models. Its primary goal is to
provide a way to build and test small models that detect when a single
word from a set of ten target words is spoken. Models should produce as
few false positives as possible from background noise or unrelated
speech while producing as many true positives as possible.</p>
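<p>A minimal sketch of how clips could be mapped to classes for this task. The ten target words are the core command words of the dataset; the extra "silence" and "unknown" classes follow a common 12-class setup and, like the names <code>LABELS</code> and <code>label_index</code>, are assumptions rather than a description of BiBench's data pipeline.</p>

```python
from typing import List, Optional

# The ten target words of the Speech Commands keyword-spotting task.
TARGET_WORDS: List[str] = ["yes", "no", "up", "down", "left", "right",
                           "on", "off", "stop", "go"]
# Assumed extra classes (a common 12-class setup): clips with background
# noise only, and clips with a spoken word outside the target set.
SILENCE, UNKNOWN = "_silence_", "_unknown_"
LABELS: List[str] = TARGET_WORDS + [SILENCE, UNKNOWN]


def label_index(word: Optional[str]) -> int:
    """Map a clip's transcript to a class index.

    None means the clip contains only background noise; any non-target
    word is folded into the "unknown" class so the model learns to reject it.
    """
    if word is None:
        return LABELS.index(SILENCE)
    word = word.lower()
    if word in TARGET_WORDS:
        return TARGET_WORDS.index(word)
    return LABELS.index(UNKNOWN)
```

<p>Folding every non-target word into one class lets the model reject unrelated speech instead of being forced to pick one of the ten keywords, which is what keeps false positives low.</p>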
</section>
</section>
<section id="neural-architectures">
<h1><span class="section-number">5. </span>Neural Architectures<a class="headerlink" href="#neural-architectures" title="Permalink to this heading">¶</a></h1>
<section id="cnns">
<h2><span class="section-number">5.1. </span>CNNs<a class="headerlink" href="#cnns" title="Permalink to this heading">¶</a></h2>
<p>The <strong>CNN implementations</strong> in our codebase borrow from
<a class="reference external" href="https://github.com/open-mmlab/mmclassification">MMClassification</a>
and <a class="reference external" href="https://github.com/open-mmlab/mmdetection">MMDetection</a>.</p>
<p><strong>ResNet</strong>. Residual Networks, or ResNets, learn residual functions
with reference to the layer inputs instead of learning unreferenced
functions. Instead of making stacked layers directly fit a desired
underlying mapping, residual nets let these layers fit a residual mapping.
There is empirical evidence that these networks are easier to optimize
and can gain accuracy from considerably increased depth.</p>
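<p>The residual formulation can be illustrated with a toy fully connected block in plain Python. This is a sketch under simplifying assumptions (dense layers instead of convolutions, no batch normalization); the helper names are hypothetical.</p>

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def relu(v: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Matrix-vector product; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Basic residual block: y = relu(F(x) + x), with F(x) = W2 . relu(W1 . x)."""
    fx = linear(w2, relu(linear(w1, x)))           # residual branch F(x)
    return relu([f + xi for f, xi in zip(fx, x)])  # identity shortcut, then ReLU
```

<p>When <code>w2</code> is all zeros, the block passes non-negative inputs through unchanged, so approximating an identity mapping only requires driving the residual branch toward zero rather than fitting the identity with stacked nonlinear layers.</p>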
<p><strong>VGG</strong>. VGG is a classical convolutional neural network architecture.
It grew out of a study of how increasing the depth of such networks
affects their accuracy. It is characterized by its simplicity: the network
uses only small 3×3 convolution filters, and the only other components are
pooling layers and fully connected layers.</p>
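<p>The motivation for small filters can be checked with a little arithmetic, following the argument in the original VGG paper: a stack of stride-1 3×3 convolutions covers the same receptive field as one larger filter while using fewer weights and inserting more nonlinearities. The helper functions below are hypothetical and ignore biases.</p>

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field (in pixels) of `num_layers` stacked stride-1 convolutions."""
    # Each stride-1 k x k layer widens the receptive field by (k - 1) pixels.
    return 1 + num_layers * (kernel - 1)


def conv_weight_count(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weights in `num_layers` k x k conv layers mapping C channels to C channels."""
    return num_layers * kernel * kernel * channels * channels


if __name__ == "__main__":
    c = 64
    print(stacked_receptive_field(3), conv_weight_count(3, c, 3))  # three 3x3 layers
    print(stacked_receptive_field(1, 7), conv_weight_count(7, c))  # one 7x7 layer
```

<p>For <em>C</em> input and output channels, three stacked 3×3 layers see a 7×7 region with 27<em>C</em>² weights, whereas a single 7×7 layer needs 49<em>C</em>².</p>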
<p><strong>MobileNetV2</strong>. MobileNetV2 is a convolutional neural network
architecture designed to perform well on mobile devices. It is based on an
inverted residual structure in which the residual connections are between
the bottleneck layers. The intermediate expansion layer uses lightweight
depthwise convolutions to filter features as a source of nonlinearity.
The architecture begins with an initial convolution layer of 32 filters,
followed by 19 residual bottleneck layers.</p>
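<p>The channel bookkeeping of one inverted residual block can be sketched as below. This is a shape and weight-count illustration only, assuming an expansion factor of 6 (the value used for most blocks in the MobileNetV2 paper) and ignoring biases and batch normalization; the class name <code>InvertedResidual</code> is hypothetical here and unrelated to any library class of the same name.</p>

```python
from dataclasses import dataclass


@dataclass
class InvertedResidual:
    """Shape bookkeeping for one MobileNetV2-style bottleneck block."""
    in_ch: int
    out_ch: int
    stride: int = 1
    expansion: int = 6  # assumed expansion factor t

    @property
    def hidden_ch(self) -> int:
        # "Inverted": the block widens in the middle instead of narrowing.
        return self.in_ch * self.expansion

    @property
    def uses_residual(self) -> bool:
        # The shortcut joins the thin bottlenecks only when shapes match.
        return self.stride == 1 and self.in_ch == self.out_ch

    def weight_count(self, kernel: int = 3) -> int:
        expand = self.in_ch * self.hidden_ch          # 1x1 expansion conv (+ReLU6)
        depthwise = kernel * kernel * self.hidden_ch  # one k x k filter per channel (+ReLU6)
        project = self.hidden_ch * self.out_ch        # 1x1 linear projection, no ReLU
        return expand + depthwise + project
```

<p>For a 24-channel block, the 1×1 expansion and projection convolutions dominate the count (3,456 weights each), while the depthwise 3×3 layer adds only 1,296.</p>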
<p><strong>Faster-RCNN</strong>. Faster R-CNN is an object detection model that improves
on Fast R-CNN by adding a region proposal network (RPN). The RPN shares
full-image convolutional features with the detection network, which makes
region proposals nearly cost-free. It is a fully convolutional network
that simultaneously predicts object bounds and objectness scores at each
position. The RPN is trained end to end to generate high-quality region
proposals, which tell the unified network where to look. Because they
share convolutional features, the RPN and Fast R-CNN can be merged into a
single network. Faster R-CNN thus consists of two modules: the first is a
deep, fully convolutional network that proposes regions, and the second is
the Fast R-CNN detector, which uses the proposed regions to produce the
final bounding boxes.</p>
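<p>In the Faster R-CNN paper, the bounds predicted at each position are offsets relative to reference boxes called anchors, using the box parameterization inherited from R-CNN. The sketch below decodes such offsets into a box; the function names are hypothetical, and boxes are assumed to be in (center x, center y, width, height) form.</p>

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def decode_box(anchor: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Apply predicted offsets (tx, ty, tw, th) to an anchor box.

    Center shifts are proportional to the anchor size; width and height are
    scaled in log space, so they always stay positive.
    """
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


def to_corners(box: Box) -> Box:
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

<p>Zero offsets return the anchor itself, so the network only has to learn corrections relative to a set of fixed reference shapes.</p>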
<p><strong>SSD</strong>. SSD is a single-stage object detection method that discretizes
the output space of bounding boxes into a set of default boxes over
different aspect ratios and scales per feature map location. At
prediction time, the network produces scores for the presence of each
object category in each default box and adjusts the box to better match
the object's shape. In addition, the network naturally handles objects of
different sizes by combining predictions from multiple feature maps with
different resolutions.</p>
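<p>The scales and aspect ratios of the default boxes can be computed as below, following the rule given in the original SSD paper: scales spaced linearly between <em>s</em><sub>min</sub> = 0.2 and <em>s</em><sub>max</sub> = 0.9 across the feature maps, and box shapes <em>w</em> = <em>s</em>√<em>a</em>, <em>h</em> = <em>s</em>/√<em>a</em> for each aspect ratio <em>a</em>. This is a sketch with hypothetical function names; the paper's extra box for aspect ratio 1 is omitted, and BiBench's detection configs may use different values.</p>

```python
import math
from typing import List, Sequence, Tuple


def default_box_scales(num_maps: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced default-box scales, one per feature map (relative to image size)."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_box_shapes(scale: float,
                       ratios: Sequence[float] = (1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)
                       ) -> List[Tuple[float, float]]:
    """(width, height) of the default boxes at one location; each has area scale**2."""
    return [(scale * math.sqrt(r), scale / math.sqrt(r)) for r in ratios]
```

<p>Coarse feature maps receive large scales and fine maps small ones, which is how predictions from several resolutions cover objects of different sizes.</p>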
</section>
<section id="transformers">
<h2><span class="section-number">5.2. </span>Transformers<a class="headerlink" href="#transformers" title="Permalink to this heading">¶</a></h2>
<p>The <strong>Transformer implementations</strong> in our codebase borrow from
<a class="reference external" href="https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/DynaBERT">DynaBERT</a>
and <a class="reference external" href="https://github.com/htqin/BiBERT">BiBERT</a>.</p>
<p><strong>BERT</strong>. BERT, or Bidirectional Encoder Representations from
Transformers, improves upon standard Transformers by removing the
unidirectionality constraint through a masked language model (MLM)
pre-training objective. The MLM randomly masks some of the input tokens
and tries to predict the original vocabulary id of each masked token from
its context alone. Unlike a left-to-right language model, an MLM
objective lets the representation fuse the left and right contexts,
which makes it possible to pre-train a deep bidirectional Transformer.
BERT additionally uses a next-sentence prediction task that jointly
pre-trains text-pair representations with the masked language model.
Note that we replace the directly binarized attention with a bi-attention
mechanism to keep the binarized model from collapsing entirely.</p>
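<p>The masked language model objective relies on a corruption step applied to the input. The sketch below follows the recipe from the original BERT paper (select 15% of positions; replace 80% of them with the mask token, 10% with a random token, and leave 10% unchanged). The function name is hypothetical, and the value -100 for ignored labels is an assumed convention (it is PyTorch's default <code>ignore_index</code> for cross-entropy).</p>

```python
import random
from typing import List, Optional, Sequence, Tuple

IGNORE_INDEX = -100  # assumed label for positions the loss should skip


def mask_tokens(token_ids: Sequence[int], vocab_size: int, mask_id: int,
                mask_prob: float = 0.15,
                rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """BERT-style masking; returns (corrupted inputs, prediction targets)."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected: no prediction here
        labels[i] = tok  # the model must recover the original id
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

<p>Keeping some selected tokens unchanged or randomly replaced means the model cannot rely on the mask token always marking the positions it must predict.</p>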
</section>
<section id="mlps">
<h2><span class="section-number">5.3. </span>MLPs<a class="headerlink" href="#mlps" title="Permalink to this heading">¶</a></h2>
<p>The <strong>MLP implementations</strong> in our codebase borrow from
<a class="reference external" href="https://github.com/charlesq34/pointnet">PointNet</a>,
<a class="reference external" href="https://github.com/htqin/BiPointNet">BiPointNet</a>,
<a class="reference external" href="https://github.com/katsugeneration/tensor-fsmn">FSMN</a> and
<a class="reference external" href="https://github.com/htqin/BiFSMN">BiFSMN</a>.</p>
<p><strong>PointNet</strong>. PointNet is a unified architecture for applications
ranging from object classification and part segmentation to scene
semantic parsing. It takes point clouds directly as input and outputs
either class labels for the entire input or per-point segment/part
labels. PointNet-Vanilla is a variant of PointNet that drops the T-Net
module. For all PointNet models, we use EMA-Max as the aggregator,
because binarized PointNets that use the plain max pooling aggregator
fail to converge.</p>
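<p>The max pooling aggregator mentioned above is what makes PointNet insensitive to the order of its input points. The sketch below uses a single shared layer for brevity and shows only plain max pooling, not the EMA-Max aggregator used in BiBench; the function names are hypothetical.</p>

```python
from typing import List, Sequence

Vector = List[float]


def shared_mlp(point: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """One shared fully connected layer + ReLU, applied identically to every point."""
    return [max(0.0, sum(w * x for w, x in zip(row, point))) for row in weights]


def global_feature(points: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]]) -> Vector:
    """PointNet-style global descriptor: per-point features, then channel-wise max."""
    features = [shared_mlp(p, weights) for p in points]
    # Max over the point dimension is symmetric, so point order does not matter.
    return [max(channel) for channel in zip(*features)]
```

<p>Shuffling the input points leaves the global feature unchanged, which is the property a point-cloud classifier needs because a point set has no canonical ordering.</p>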
<p><strong>FSMN</strong>. Feedforward sequential memory networks (FSMNs) are a
neural network structure that models long-term dependency in time series
without using recurrent feedback. An FSMN is a standard fully connected
feedforward neural network equipped with learnable memory blocks. Acting
as a short-term memory mechanism, the memory blocks encode long context
information using a tapped-delay line structure.</p>
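<p>The tapped-delay line can be sketched as a weighted sum over the current and past hidden states. This illustrates a scalar, look-back-only memory block, in which one coefficient is shared by all dimensions of each delayed frame; the original FSMN formulation also supports vector coefficients and look-ahead taps. The function name and coefficient values are hypothetical.</p>

```python
from typing import List, Sequence

Vector = List[float]


def fsmn_memory(hidden: Sequence[Sequence[float]], coeffs: Sequence[float]) -> List[Vector]:
    """Scalar FSMN memory block (look-back only).

    memory[t] = sum_{i=0}^{N} coeffs[i] * hidden[t - i], with N = len(coeffs) - 1.
    Frames before the start of the sequence are treated as zeros.
    """
    dim = len(hidden[0]) if hidden else 0
    memory: List[Vector] = []
    for t in range(len(hidden)):
        acc = [0.0] * dim
        for i, a in enumerate(coeffs):
            if t - i < 0:
                break  # the delay line reaches past the first frame
            acc = [m + a * h for m, h in zip(acc, hidden[t - i])]
        memory.append(acc)
    return memory
```

<p>In an FSMN, the next hidden layer receives both the current hidden state and this memory output, so context from the last <em>N</em> frames reaches it without any recurrent connection.</p>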
<p><strong>Deep-FSMN</strong>. The Deep-FSMN architecture is an improved feedforward
sequential memory network (FSMN) with skip connections between the memory
blocks of adjacent layers. These skip connections let information flow
across layers, which alleviates the vanishing gradient problem when
building very deep structures.</p>
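<p>The effect of the skip connections can be sketched by extending a look-back memory block with the previous layer's memory output. This is a simplified illustration assuming an identity skip transform, stride 1, and no look-ahead taps or projection layers; the function name is hypothetical.</p>

```python
from typing import List, Optional, Sequence

Vector = List[float]


def dfsmn_memory(p: Sequence[Sequence[float]], coeffs: Sequence[float],
                 prev_memory: Optional[Sequence[Sequence[float]]] = None) -> List[Vector]:
    """Memory block of one Deep-FSMN-style layer with an identity skip connection.

    out[t] = prev_memory[t] + p[t] + sum_i coeffs[i] * p[t - i]
    The prev_memory term carries the previous layer's memory output forward
    unchanged, giving gradients a direct additive path through deep stacks.
    """
    out: List[Vector] = []
    for t, frame in enumerate(p):
        acc = list(frame)  # the p[t] term
        for i, a in enumerate(coeffs):
            if t - i < 0:
                break  # no frames before the start of the sequence
            acc = [m + a * h for m, h in zip(acc, p[t - i])]
        if prev_memory is not None:
            acc = [m + s for m, s in zip(acc, prev_memory[t])]  # identity skip
        out.append(acc)
    return out
```

<p>In a stack, each layer's output is passed as <code>prev_memory</code> to the next layer, so earlier memories reach deep layers through an additive path instead of passing through every nonlinearity.</p>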
</section>
</section>


          </div>
          
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
            <p class="logo"><a href="index.html">
              <img class="logo" src="_static/logo.png" alt="Logo"/>
            </a></p>
<h1 class="logo"><a href="index.html">BiBench</a></h1>








<h3>Navigation</h3>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">1. Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="#installation">2. Installation</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#environment-preparation">2.1. Environment Preparation</a></li>
<li class="toctree-l2"><a class="reference internal" href="#data-preparation">2.2. Data Preparation</a></li>
<li class="toctree-l2"><a class="reference internal" href="#training">2.3. Training</a></li>
<li class="toctree-l2"><a class="reference internal" href="#add-custom-binarization-algorithms">2.4. Add Custom Binarization Algorithms</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="#binarization-algorithms">3. Binarization Algorithms</a></li>
<li class="toctree-l1"><a class="reference internal" href="#learning-tasks">4. Learning Tasks</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#d-visual-tasks">4.1. 2D Visual Tasks</a></li>
<li class="toctree-l2"><a class="reference internal" href="#d-visual-tasks-1">4.2. 3D Visual Tasks</a></li>
<li class="toctree-l2"><a class="reference internal" href="#natural-language-understanding-tasks">4.3. Natural Language Understanding Tasks</a></li>
<li class="toctree-l2"><a class="reference internal" href="#speech-tasks">4.4. Speech Tasks</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="#neural-architectures">5. Neural Architectures</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#cnns">5.1. CNNs</a></li>
<li class="toctree-l2"><a class="reference internal" href="#transformers">5.2. Transformers</a></li>
<li class="toctree-l2"><a class="reference internal" href="#mlps">5.3. MLPs</a></li>
</ul>
</li>
</ul>

<div class="relations">
<h3>Related Topics</h3>
<ul>
  <li><a href="index.html">Documentation overview</a><ul>
      <li>Previous: <a href="index.html" title="previous chapter">&lt;no title&gt;</a></li>
  </ul></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
  <h3 id="searchlabel">Quick search</h3>
    <div class="searchformwrapper">
    <form class="search" action="search.html" method="get">
      <input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
      <input type="submit" value="Go" />
    </form>
    </div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>








        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
      &copy;2022, BiBench.
      
      |
      Powered by <a href="http://sphinx-doc.org/">Sphinx 5.3.0</a>
      &amp; <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.12</a>
      
      |
      <a href="_sources/context.rst.txt"
          rel="nofollow">Page source</a>
    </div>

    

    
  </body>
</html>