The supplementary materials contain the evaluation data described in Section 4.2.

* portraits.txt: 1000 sentences, each describing a subject looking at the camera.
* scenes.txt: 1000 sentences, each describing a subject performing an action in diverse scenes.
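
For readers who want to load these files, the following is a minimal sketch rather than the authors' tooling. It assumes each file is UTF-8 text with one sentence per line, which this description does not state; the helper name `load_sentences` is hypothetical.

```python
from pathlib import Path


def load_sentences(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace.

    Assumption: one sentence per line, UTF-8 encoded. Adjust if the files
    use a different layout.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Under that assumption, `load_sentences("portraits.txt")` and `load_sentences("scenes.txt")` should each return a list of 1000 strings; a different length would suggest the files are laid out differently.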
