Petridish - Code Walkthrough

Background

Petridish is a NAS algorithm that grows networks starting from any seed network. Usually the starting network is small and hand-specified, although in practice any set of networks can be thrown in as seeds. At each search iteration, Petridish evaluates a number of candidates, picks the most promising ones, and adds them to the parent network. It then trains this modified network for a few more epochs before adding it back to the parent pool for further consideration for growth. Parent architectures are selected for further growth only if they lie close to the convex hull of the Pareto frontier, which serves as an estimate of the best achievable error-vs-multiply-adds (or error-vs-flops, or error-vs-memory) trade-off curve. The intuition is that only models currently near the estimated Pareto frontier have a realistic chance of lowering the curve by producing child models. Before we move ahead, it will serve the reader well to get familiar with the details via the NeurIPS 2019 paper, blog post or online lecture.
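To make the parent-selection rule concrete, here is a minimal sketch of computing the lower convex hull of a population of models plotted as (cost, error) points, e.g. (multiply-adds, top-1 error). This is illustrative only and not Archai's actual implementation; the function name and data layout are assumptions.

```python
def lower_convex_hull(points):
    """Return indices of the points on the lower convex hull.

    `points` is a list of (cost, error) pairs, e.g. (multiply-adds,
    top-1 error). Models on or near this hull are the ones Petridish
    would consider promising parents for further growth.
    """
    # Visit points in order of increasing cost (monotone-chain style).
    order = sorted(range(len(points)), key=lambda i: points[i])
    hull = []
    for i in order:
        # Pop previous points that make a clockwise-or-collinear turn,
        # so only the lower envelope of the point cloud survives.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = points[hull[-2]], points[hull[-1]]
            x3, y3 = points[i]
            if (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull
```

For example, for models at costs 1..5 with errors 10, 8, 4, 3.5, 3.4, the model with error 8 sits above the line joining its neighbors, so it is dropped from the hull and would not be prioritized for growth.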

We will also assume that the reader is familiar with the core of Archai and has followed the getting started tutorial, which will come in very handy!

Evaluation

The gallery of models found by Petridish is then trained for longer (usually 600 or 1500 epochs, with or without other enhancements such as AutoAugment preprocessing or Cutout).

The code for model evaluation follows the usual pattern: it overrides the relevant parts of the Evaluater class and uses ray for distributed parallel training of models across the available GPUs on the same machine.
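Archai dispatches each model to a GPU via ray; as a dependency-free illustration of the same fan-out/gather pattern, here is a sketch using the standard library's concurrent.futures instead. All names here (train_model, evaluate_gallery, model_descs) are hypothetical, not Archai's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def train_model(desc, gpu_id):
    """Stand-in for training one candidate model on one GPU."""
    # Real code would build the model from `desc`, move it to
    # f'cuda:{gpu_id}', and run the full evaluation training loop;
    # here we just echo a placeholder accuracy.
    return {'desc': desc, 'gpu': gpu_id, 'top1': 0.97}

def evaluate_gallery(model_descs, num_gpus=2):
    # One worker per GPU; models are pinned round-robin to devices,
    # mirroring ray's per-task GPU reservation (num_gpus=1 per task).
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        futures = [pool.submit(train_model, d, i % num_gpus)
                   for i, d in enumerate(model_descs)]
        return [f.result() for f in futures]
```

With ray, `train_model` would instead be a `@ray.remote(num_gpus=1)` task and the gather step would be `ray.get(...)`, which also lets the same pattern scale beyond one machine.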

Figure: Accuracy vs. multiply-additions after evaluation

Above we see the accuracy vs. multiply-additions gallery. For example, the model at 328M multiply-additions achieves 97.23% top-1 accuracy on CIFAR10 with 3M parameters when trained for 600 epochs.

Putting It All Together

Just as detailed in the blitz tutorial, we end up with our own PetridishModelBuilder and EvaluaterPetridish, which we communicate to Archai via the PetridishExperimentRunner class, and we run the algorithm via main.py.

Note that Petridish is not constrained to searching Pareto frontiers of error vs. multiply-additions only. One can easily change the x-axis to other quantities such as flops, memory, number of parameters, or intensity. By changing the search termination criteria and the models used to seed the search process, one can control the part of the x-axis on which to focus compute.
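Since the x-axis is just a per-model metric, swapping it amounts to changing a single key. The following hedged sketch (field names are illustrative, not Archai's schema) extracts the Pareto-optimal models for whichever cost metric is chosen:

```python
def pareto_front(models, cost_key='madds'):
    """Return the models not dominated on (cost_key, error).

    A model is dominated if some other model is at least as cheap on
    `cost_key` AND at least as accurate, and strictly better on one
    of the two. Switching from multiply-adds to parameters, flops or
    memory is just a different `cost_key`.
    """
    front = []
    for m in models:
        dominated = any(
            o[cost_key] <= m[cost_key] and o['error'] <= m['error']
            and (o[cost_key] < m[cost_key] or o['error'] < m['error'])
            for o in models)
        if not dominated:
            front.append(m)
    return front
```

Note that the same population can yield different frontiers under different cost metrics: a model that is Pareto-optimal by multiply-adds may be dominated once parameter count is used instead.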

We look forward to feedback, user stories and real-world scenarios where Petridish can help.