The supplementary materials contain two parts:

1. Supplementary document -- We append the appendix to the end of our main submission for completeness.

2. Code snippet -- We provide the core script for our Focal Transformer. The code is written for image classification and can be adapted to pipelines for object detection and semantic segmentation.