{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ♻️ Auto Microbatching\n",
    "\n",
    "Have you ever wanted to choose your batch size **without** having to stress about CUDA Out-of-Memory (OOM) errors? We sure have. That's why we built Composer's automatic microbatching feature.\n",
    "\n",
    "This tutorial will demonstrate how to use automatic microbatching to avoid CUDA OOMs, regardless of your batch size choice, GPU type, and number of devices. \n",
    "\n",
    "**Note that this demo requires a GPU to demonstrate automatic microbatching.**\n",
    "\n",
    "### Recommended Background\n",
    "\n",
    "To follow this tutorial, you should be familiar with the basics of using the Composer trainer. Otherwise, it's pretty straightforward.\n",
    "\n",
    "### Tutorial Goals and Concepts Covered\n",
    "\n",
    "The goal of this tutorial is to show you how to turn on automatic gradient accumulation and to provide a sandbox to play around with it a bit. Please feel free to experiment with different batch sizes and other configuration choices to see how it works!\n",
    "\n",
    "For details of the implementation, see our [Automatic Microbatching][docs] documentation.\n",
    "\n",
    "Let's get started!\n",
    "\n",
    "[docs]: https://docs.mosaicml.com/projects/composer/en/stable/notes/auto_microbatching.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set Up Our Workspace"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll start by installing Composer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install mosaicml\n",
    "# To install from source instead of the last release, comment the command above and uncomment the following one.\n",
    "# %pip install git+https://github.com/mosaicml/composer.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are going to use the CIFAR-10 dataset with a ResNet-56 model and some standard optimization settings. For the purposes of this tutorial, we'll choose a very large batch size and increase the image size to 96x96. These settings will cause CUDA Out-of-Memory errors on most GPUs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "import composer\n",
    "from torchvision import datasets, transforms\n",
    "\n",
    "\n",
    "torch.manual_seed(42) # For replicability\n",
    "\n",
    "data_directory = \"./data\"\n",
    "\n",
    "# Normalization constants\n",
    "mean = (0.507, 0.487, 0.441)\n",
    "std = (0.267, 0.256, 0.276)\n",
    "\n",
    "# choose a very large batch size\n",
    "batch_size = 2048\n",
    "\n",
    "cifar10_transforms = transforms.Compose([\n",
    "  transforms.ToTensor(), \n",
    "  transforms.Normalize(mean, std),\n",
    "  transforms.Resize(size=[96, 96])  # choose a large image size\n",
    "])\n",
    "\n",
    "train_dataset = datasets.CIFAR10(data_directory, train=True, download=True, transform=cifar10_transforms)\n",
    "test_dataset = datasets.CIFAR10(data_directory, train=False, download=True, transform=cifar10_transforms)\n",
    "\n",
    "train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)\n",
    "test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "from composer.models import ComposerClassifier\n",
    "\n",
    "class Block(nn.Module):\n",
    "    \"\"\"A ResNet block.\"\"\"\n",
    "\n",
    "    def __init__(self, f_in: int, f_out: int, downsample: bool = False):\n",
    "        super(Block, self).__init__()\n",
    "\n",
    "        stride = 2 if downsample else 1\n",
    "        self.conv1 = nn.Conv2d(f_in, f_out, kernel_size=3, stride=stride, padding=1, bias=False)\n",
    "        self.bn1 = nn.BatchNorm2d(f_out)\n",
    "        self.conv2 = nn.Conv2d(f_out, f_out, kernel_size=3, stride=1, padding=1, bias=False)\n",
    "        self.bn2 = nn.BatchNorm2d(f_out)\n",
    "        self.relu = nn.ReLU(inplace=True)\n",
    "\n",
    "        # No parameters for shortcut connections.\n",
    "        if downsample or f_in != f_out:\n",
    "            self.shortcut = nn.Sequential(\n",
    "                nn.Conv2d(f_in, f_out, kernel_size=1, stride=2, bias=False),\n",
    "                nn.BatchNorm2d(f_out),\n",
    "            )\n",
    "        else:\n",
    "            self.shortcut = nn.Sequential()\n",
    "\n",
    "    def forward(self, x: torch.Tensor):\n",
    "        out = self.relu(self.bn1(self.conv1(x)))\n",
    "        out = self.bn2(self.conv2(out))\n",
    "        out += self.shortcut(x)\n",
    "        return self.relu(out)\n",
    "\n",
    "class ResNetCIFAR(nn.Module):\n",
    "    \"\"\"A residual neural network as originally designed for CIFAR-10.\"\"\"\n",
    "\n",
    "    def __init__(self, outputs: int = 10):\n",
    "        super(ResNetCIFAR, self).__init__()\n",
    "\n",
    "        depth = 56\n",
    "        width = 16\n",
    "        num_blocks = (depth - 2) // 6\n",
    "\n",
    "        plan = [(width, num_blocks), (2 * width, num_blocks), (4 * width, num_blocks)]\n",
    "\n",
    "        self.num_classes = outputs\n",
    "\n",
    "        # Initial convolution.\n",
    "        current_filters = plan[0][0]\n",
    "        self.conv = nn.Conv2d(3, current_filters, kernel_size=3, stride=1, padding=1, bias=False)\n",
    "        self.bn = nn.BatchNorm2d(current_filters)\n",
    "        self.relu = nn.ReLU(inplace=True)\n",
    "\n",
    "        # The subsequent blocks of the ResNet.\n",
    "        blocks = []\n",
    "        for segment_index, (filters, num_blocks) in enumerate(plan):\n",
    "            for block_index in range(num_blocks):\n",
    "                downsample = segment_index > 0 and block_index == 0\n",
    "                blocks.append(Block(current_filters, filters, downsample))\n",
    "                current_filters = filters\n",
    "\n",
    "        self.blocks = nn.Sequential(*blocks)\n",
    "\n",
    "        # Final fc layer. Size = number of filters in last segment.\n",
    "        self.fc = nn.Linear(plan[-1][0], outputs)\n",
    "        self.criterion = nn.CrossEntropyLoss()\n",
    "\n",
    "    def forward(self, x: torch.Tensor):\n",
    "        out = self.relu(self.bn(self.conv(x)))\n",
    "        out = self.blocks(out)\n",
    "        out = F.avg_pool2d(out, out.size()[3])\n",
    "        out = out.view(out.size(0), -1)\n",
    "        out = self.fc(out)\n",
    "        return out\n",
    "\n",
    "model = ComposerClassifier(module=ResNetCIFAR(), num_classes=10)\n",
    "\n",
    "optimizer = composer.optim.DecoupledSGDW(\n",
    "    model.parameters(), # Model parameters to update\n",
    "    lr=0.05,\n",
    "    momentum=0.9,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train a Baseline Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we run our trainer code with the `device_train_microbatch_size='auto'` setting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert torch.cuda.is_available(), \"Demonstrating automatic gradient accumulation requires a GPU.\"\n",
    "\n",
    "trainer = composer.trainer.Trainer(\n",
    "    model=model,\n",
    "    train_dataloader=train_dataloader,\n",
    "    eval_dataloader=test_dataloader,\n",
    "    optimizers=optimizer,\n",
    "    max_duration=\"1ep\",\n",
    "    device_train_microbatch_size='auto',  # <--- Activate Composer magic!\n",
    "    device='gpu'\n",
    ")\n",
    "\n",
    "# Train\n",
    "trainer.fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Depending on your GPU type, you should see some logs that increase the gradient accumulation dynamically until the model fits into memory, prior to the start of training—e.g., something like:\n",
    "\n",
    "```\n",
    "INFO:composer.trainer.trainer:CUDA out of memory detected.\n",
    "Train microbatch size decreased from 2048 -> 1024, and the batch\n",
    "will be retrained.\n",
    "```\n",
    "\n",
    "Worry not! This just means everything is working as expected. With automatic microbatching enabled, Composer responds to OOM errors during training by halving the microbatch size. Under the hood, each minibatch is split into `n` \"microbatches\", where `n` is the accumulation rate, and gradients are accumulated across microbatches before stepping the optimizer. So, you should expect to see the microbatch size decrease and accumulation rate increase until the resulting microbatch size fits on the device. This lets you focus on getting the best *minibatch* size without having to stress about what your hardware can handle."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## What next?\n",
    "\n",
    "You've now seen how to turn on automatic gradient accumulation using the Composer trainer.\n",
    "\n",
    "To dig deeper, see our [Automatic Microbatching][docs] documentation.\n",
    "\n",
    "In addition, please continue to explore our tutorials! Here's a couple suggestions:\n",
    "\n",
    "* Continue learning about other Composer features like [automatic restarting from checkpoints][autoresume]\n",
    "\n",
    "* Give your model life after training with Composer's [export for inference tools][exporting]\n",
    "\n",
    "* Explore more advanced applications of Composer like [applying image segmentation to medical images][image_segmentation_tutorial] or [fine-tuning a transformer for sentiment classification][huggingface_tutorial].\n",
    "\n",
    "\n",
    "\n",
    "[docs]: https://docs.mosaicml.com/projects/composer/en/stable/notes/auto_microbatching.html\n",
    "[autoresume]: https://docs.mosaicml.com/projects/composer/en/stable/examples/checkpoint_autoresume.html\n",
    "[exporting]: https://docs.mosaicml.com/projects/composer/en/stable/examples/exporting_for_inference.html\n",
    "[image_segmentation_tutorial]: https://docs.mosaicml.com/projects/composer/en/stable/examples/medical_image_segmentation.html\n",
    "[huggingface_tutorial]: https://docs.mosaicml.com/projects/composer/en/stable/examples/huggingface_models.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Come get involved with MosaicML!\n",
    "\n",
    "We'd love for you to get involved with the MosaicML community in any of these ways:\n",
    "\n",
    "### [Star Composer on GitHub](https://github.com/mosaicml/composer)\n",
    "\n",
    "Help make others aware of our work by [starring Composer on GitHub](https://github.com/mosaicml/composer).\n",
    "\n",
    "### [Join the MosaicML Slack](https://join.slack.com/t/mosaicml-community/shared_invite/zt-w0tiddn9-WGTlRpfjcO9J5jyrMub1dg)\n",
    "\n",
    "Head on over to the [MosaicML slack](https://join.slack.com/t/mosaicml-community/shared_invite/zt-w0tiddn9-WGTlRpfjcO9J5jyrMub1dg) to join other ML efficiency enthusiasts. Come for the paper discussions, stay for the memes!\n",
    "\n",
    "### Contribute to Composer\n",
    "\n",
    "Is there a bug you noticed or a feature you'd like? File an [issue](https://github.com/mosaicml/composer/issues) or make a [pull request](https://github.com/mosaicml/composer/pulls)!"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
