# README

We provide the Utopia benchmarking data in the `utopia` folder and the generation results from language models (LMs) in the `lm_output` folder. Because of size limits, `lm_output` only includes the results of GPT-3 Ada, Babbage, and Davinci on the Motion scene.

The `utopia` folder contains the benchmarking version of the Utopia dataset (100 samples for each of the 39 tasks). Besides the vanilla and Mind's Eye evaluation data in zero- and few-shot settings, we also provide ready-to-use evaluation data for competing methods (e.g., Chain-of-Thought, Zero-shot Reasoners), as well as the data used in our ablation study (missing the final shot; see Section 4.3, "Can few-shot replace simulation?").

The files are in TSV format, and each file has exactly two columns: the question and the ground-truth answer. Note that we deliberately replace every newline character (`'\n'`) in the data with the two-character sequence `'\\n'` (a backslash followed by `n`, in Python string notation) to avoid unexpected encoding errors. You therefore need to reverse this step after reading the TSV files, i.e., convert each `'\\n'` back to `'\n'`.
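A minimal Python sketch of reading one of these files and undoing the newline escaping. It assumes one record per line, a single tab between the two fields, UTF-8 encoding, and no header row (pass `has_header=True` if a file turns out to have one). `unescape`, `parse_utopia_lines`, and `load_utopia_tsv` are hypothetical helpers written for illustration, not part of the released code.

```python
from typing import Iterable, List, Tuple


def unescape(field: str) -> str:
    """Turn the stored two-character sequence backslash + 'n' back into a real newline."""
    return field.replace("\\n", "\n")


def parse_utopia_lines(lines: Iterable[str], has_header: bool = False) -> List[Tuple[str, str]]:
    """Parse (question, ground-truth answer) pairs from the lines of a Utopia TSV file.

    Assumes one record per line and a single tab separating the two fields.
    """
    pairs = []
    for i, line in enumerate(lines):
        if i == 0 and has_header:
            continue  # skip the column-name row, if the file has one
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines, e.g. a trailing empty line
        question, answer = line.split("\t", 1)
        pairs.append((unescape(question), unescape(answer)))
    return pairs


def load_utopia_tsv(path: str, has_header: bool = False) -> List[Tuple[str, str]]:
    """Read a Utopia TSV file from disk (UTF-8 encoding assumed)."""
    with open(path, encoding="utf-8") as f:
        return parse_utopia_lines(f, has_header=has_header)
```

For example, `load_utopia_tsv("path/to/file.tsv")[0]` would return the first (question, answer) pair with its line breaks restored. Splitting on tabs line by line relies on the escaping guaranteeing one record per line; if the files turn out to use CSV-style quoting, Python's `csv` module with `delimiter="\t"` is the safer choice.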

The `lm_output` folder contains three sub-folders that store the results from GPT-3 Ada, Babbage, and Davinci, respectively. Each TSV file records the question, the ground-truth answer, the model's raw answer, the answer parsed by our programs, and other useful information (see the column names in each file for details).
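A rough Python sketch for inspecting an `lm_output` file and scoring it. It assumes the first line holds the column names, each record sits on one line, and fields contain no literal tabs; raw model answers may also need the same `'\\n'` to `'\n'` conversion described above. The column names passed to `exact_match_accuracy` are placeholders to be replaced with the real ones from the header, and plain exact match is only an illustration, not necessarily the metric used in the paper.

```python
from typing import Dict, Iterable, List, Tuple


def parse_output_lines(lines: Iterable[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Split tab-separated lines into (column names, rows as dicts).

    Assumes the first line is a header row (not confirmed by the README).
    """
    it = iter(lines)
    header = next(it).rstrip("\r\n").split("\t")
    rows = []
    for line in it:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        rows.append(dict(zip(header, line.split("\t"))))
    return header, rows


def load_output_tsv(path: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read an lm_output TSV file from disk (UTF-8 encoding assumed)."""
    with open(path, encoding="utf-8") as f:
        return parse_output_lines(f)


def exact_match_accuracy(rows: List[Dict[str, str]], gold_col: str, pred_col: str) -> float:
    """Fraction of rows whose prediction equals the gold answer after trimming whitespace."""
    if not rows:
        return 0.0
    hits = sum(r[gold_col].strip() == r[pred_col].strip() for r in rows)
    return hits / len(rows)
```

Printing `load_output_tsv("path/to/file.tsv")[0]` shows the actual column names, which can then be passed as `gold_col` and `pred_col`.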

Enjoy your new journey at Utopia, with the help of Mind's Eye!
