# Running on Cloud TPUs

Tensor2Tensor supports running on Google Cloud Platform's TPUs, chips
specialized for ML training. See the official tutorials for [running the
T2T Transformer for text on Cloud TPUs](https://cloud.google.com/tpu/docs/tutorials/transformer) and
[Transformer for Speech Recognition](https://cloud.google.com/tpu/docs/tutorials/automated-speech-recognition).

## Other models on TPU

Many of Tensor2Tensor's models work on TPU.

You can provision a VM and TPU with `ctpu up`. Use the `t2t-trainer` command
on the VM as usual with the additional flags `--use_tpu` and
`--cloud_tpu_name=$TPU_NAME`.

Because `TPUEstimator` does not catch the `OutOfRangeError` raised when the
evaluation data runs out, set `--eval_steps` small enough that evaluation never
exhausts the evaluation data.
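As a rough sketch of how to pick a safe value: if each evaluation step consumes one global batch, the largest safe `--eval_steps` is the evaluation set size divided by that batch size, rounded down. The helper name `max_eval_steps` and the figure of 1,024 examples per step are assumptions for illustration only; check the effective batch size for your hparams set and TPU configuration.

```shell
# Hypothetical helper: largest number of eval steps that cannot run past
# the end of the evaluation data.
#   $1 = number of examples in the evaluation set
#   $2 = examples consumed per eval step (global batch size; an assumption)
max_eval_steps() {
  local num_examples=$1
  local batch_size=$2
  # Shell integer division rounds down, which is the conservative choice.
  echo $(( num_examples / batch_size ))
}

# Illustrative numbers: a 10,000-example eval set and 1,024 examples per step.
EVAL_STEPS=$(max_eval_steps 10000 1024)
echo "--eval_steps=${EVAL_STEPS}"
```

Rounding down leaves a few examples unevaluated, which is the trade-off for never reading past the end of the data.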

A non-exhaustive list of T2T models that work on TPU:

* Image generation: `imagetransformer` with `imagetransformer_base_tpu` (or
  `imagetransformer_tiny_tpu`)
* Super-resolution: `img2img_transformer` with `img2img_transformer_base_tpu`
  (or `img2img_transformer_tiny_tpu`)
* Image classification: `resnet` with `resnet_50` (or `resnet_18` or
  `resnet_34`)
* Image classification: `revnet` with `revnet_104` (or `revnet_38_cifar`)
* Image classification: `shake_shake` with `shakeshake_tpu` (or
  `shakeshake_small`)
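
The pairings above can be captured in a small lookup so that a launch script derives `--hparams_set` from `--model`. This is only a sketch: the function name `tpu_hparams_set` is hypothetical, and it returns just the first hparams set listed for each model.

```shell
# Hypothetical helper: map a model name from the list above to the first
# hparams set listed alongside it.
tpu_hparams_set() {
  case "$1" in
    imagetransformer)    echo imagetransformer_base_tpu ;;
    img2img_transformer) echo img2img_transformer_base_tpu ;;
    resnet)              echo resnet_50 ;;
    revnet)              echo revnet_104 ;;
    shake_shake)         echo shakeshake_tpu ;;
    *)
      # Unknown model: report on stderr and signal failure to the caller.
      echo "no TPU hparams set listed for model: $1" >&2
      return 1
      ;;
  esac
}

MODEL=resnet
HPARAMS_SET=$(tpu_hparams_set "$MODEL")
echo "--model=${MODEL} --hparams_set=${HPARAMS_SET}"
```

Failing on an unlisted model keeps a typo from silently launching a run with an empty `--hparams_set`.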

## Example invocation

Use `ctpu up` to bring up the VM and TPU. Once both are ready, it opens an SSH
session on the VM, where you can run the following:

```
# DATA_DIR and OUT_DIR should point to GCS locations (gs:// paths)
# TPU_NAME should have been set automatically by the ctpu tool

t2t-trainer \
  --model=shake_shake \
  --hparams_set=shakeshake_tpu \
  --problem=image_cifar10 \
  --train_steps=180000 \
  --eval_steps=9 \
  --local_eval_frequency=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu \
  --cloud_tpu_name=$TPU_NAME
```
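
A small preflight check can catch unset or non-GCS paths before a long run starts. The sketch below is one way to validate the variables and is not part of T2T; the helper name `is_gcs_path` and the bucket `gs://my-bucket` are hypothetical.

```shell
# Hypothetical helper: succeed only if the argument looks like a GCS path,
# i.e. starts with gs:// and has at least one character after it.
is_gcs_path() {
  case "$1" in
    gs://?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Hypothetical values; substitute your own bucket and prefixes.
DATA_DIR=gs://my-bucket/data
OUT_DIR=gs://my-bucket/training/shake_shake

# Warn about any variable that is empty or not a gs:// path.
for dir in "$DATA_DIR" "$OUT_DIR"; do
  if ! is_gcs_path "$dir"; then
    echo "expected a gs:// path, got: '$dir'" >&2
  fi
done
```

The check only inspects the string prefix; it does not confirm that the bucket exists or that the VM has permission to read and write it.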
