\section{Conclusion}
In this paper, we introduce \dataset, a benchmark designed to evaluate two underexplored capabilities of Multimodal Large Language Models (MLLMs): multi-step spatial reasoning and constraint adherence. Leveraging the inherent complexity of origami, \dataset provides 350 curated data instances and an enhanced compilation program to support in-depth evaluation. The benchmark comprises four challenging tasks (pattern prediction, spatial relationship prediction, multi-step spatial reasoning, and end-to-end code generation), making it the first to assess MLLMs' multi-step spatial reasoning under rigorous mathematical constraints. Our comprehensive evaluation of existing MLLMs and our exploration of reinforcement learning methods for code generation highlight the utility of \dataset both for assessing current capabilities and for opening new ways to enhance the spatial intelligence of MLLMs.