\section{\dataset Dataset}

\subsection{Data Collection}
\label{data_collection}
We collect 350 origami data entries from various online resources, including origami tutorial websites\footnote{\url{https://origami-database.com/}}\textsuperscript{,}\footnote{\url{https://github.com/origamimagiro/flat-folder}}, forums\footnote{\url{https://mitani.cs.tsukuba.ac.jp/oripa/}}, and origami books\footnote{\url{https://www.giladorigami.com/origami-database.php}}\textsuperscript{,}\footnote{\url{https://oriwiki.com/}}. As depicted in Figure~\ref{fig:case}, each complete data entry comprises the following four parts:

\textbf{CP Diagram} The CP diagram is a standardized, code-representable format that displays all the creases of an origami model. It is typically a two-dimensional planar drawing in which different line styles indicate different fold types (e.g., mountain fold, valley fold). Subject to constraints, a CP diagram uniquely determines a folded shape. All CP diagrams in our dataset adhere to a strict format so that our compiler can parse them correctly.
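
To make the idea of a code-representable CP diagram concrete, the following Python sketch encodes a hypothetical minimal pattern: a unit square with a single valley crease along its diagonal. The field names \texttt{vertices\_coords}, \texttt{edges\_vertices}, and \texttt{faces\_vertices} are those checked by our compiler; the name \texttt{edges\_assignment} and the meanings attached to the crease characters are assumptions borrowed from the FOLD format, and the helper \texttt{creases\_by\_type} is purely illustrative.

```python
from collections import Counter
from typing import Dict

# Hypothetical minimal crease pattern: a unit square with one valley crease
# along the diagonal from vertex 0 to vertex 2.
cp = {
    "vertices_coords": [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0], [0, 2]],
    # Assumed field name (FOLD-style): one crease type per edge.
    "edges_assignment": ["B", "B", "B", "B", "V"],
    "faces_vertices": [[0, 1, 2], [0, 2, 3]],
}

# Assumed meaning of each crease character.
CREASE_MEANING = {
    "B": "boundary",
    "M": "mountain",
    "V": "valley",
    "F": "flat",
    "U": "unassigned",
}


def creases_by_type(pattern: Dict) -> Dict[str, int]:
    """Count how many edges carry each crease type."""
    return dict(Counter(pattern["edges_assignment"]))


counts = creases_by_type(cp)
for symbol, n in counts.items():
    print(f"{CREASE_MEANING[symbol]}: {n}")
```

Each edge refers to its endpoints by index into \texttt{vertices\_coords}, so the drawing and its line styles can be recovered from the code alone.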

\textbf{Compiled Flat Pattern} Given a CP diagram, the compiler computes the final folded state that satisfies all constraints and outputs the compiled flat pattern, which represents the two-dimensional state of the origami model after complete folding.
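
As a rough intuition for what computing a folded state involves, the Python sketch below folds a square once along its diagonal by reflecting the moving vertex across the crease line. This is a deliberately simplified, single-crease illustration under our own assumptions (the function names \texttt{reflect} and \texttt{fold\_once} are hypothetical); it does not handle multiple creases or layer ordering and is not the compiler's actual algorithm.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]


def reflect(p: Point, a: Point, b: Point) -> Point:
    """Mirror point p across the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    # Parameter of the orthogonal projection of p onto the line a-b.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    foot_x, foot_y = a[0] + t * dx, a[1] + t * dy
    return (2 * foot_x - p[0], 2 * foot_y - p[1])


def fold_once(vertices: List[Point], crease: Tuple[int, int],
              moving: Set[int]) -> List[Point]:
    """Reflect the vertices listed in `moving` across the given crease."""
    a, b = vertices[crease[0]], vertices[crease[1]]
    return [reflect(v, a, b) if i in moving else v
            for i, v in enumerate(vertices)]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Fold the upper-left corner (vertex 3) over the diagonal from vertex 0 to 2.
folded = fold_once(square, crease=(0, 2), moving={3})
print(folded)
```

After the fold, vertex 3 lands on top of vertex 1, so the flat result is a triangle with two stacked layers.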

\textbf{Folded Shape Image} Unlike the compiled flat pattern, which is computed strictly from the CP diagram, the folded shape image provides a direct, intuitive visualization of the final origami shape. It is typically a photograph or 3D rendering.

\textbf{Folding Process} The folding process is the multi-step sequence that transforms the original sheet of paper into the final shape. We gather these processes from various origami tutorials; they cannot be represented in a standardized format and exist only as natural images.

We manually check and verify all data to ensure that 1) every CP diagram can be compiled into a compiled flat pattern that corresponds to the folded shape image; 2) the name of each origami entry corresponds to its folded shape image, with no potential for confusion (such as indistinguishable birds); and 3) all folding processes are feasible. Beyond these 350 entries, we collect 471 additional entries without intermediate folding processes, which are used for subsequent model training.

\subsection{Compiler}
\label{cp}
The current origami compiler computes the final state that a CP diagram can reach under all mathematical constraints, thereby producing the compiled flat pattern. We extend this process in four ways:
1) During compilation, we mark each crease, which allows us to locate every crease in the compiled image.
2) We additionally compute the paper stacking order, which clarifies the top-bottom relationship between different paper regions in the compiled flat pattern.
3) We build an interface for direct interaction between MLLMs and the compiler, so that a model can call the system conveniently to run origami simulations.
4) We improve the compiler's error feedback system. Specifically, it returns four types of errors:

\textbf{CP Code Syntax Error (CSE)} 
This check validates the existence, format, and mutual references of the core data structures in the CP code (such as vertex coordinates \texttt{vertices\_coords}, edge-vertex relationships \texttt{edges\_vertices}, and face-vertex relationships \texttt{faces\_vertices}). It also checks that every crease type is one of the predefined characters (e.g., \texttt{B}, \texttt{M}, \texttt{V}, \texttt{F}, \texttt{U}) and verifies that \textit{Euler's formula} for planar graphs is satisfied: $V - E + F = 2$, where $V$, $E$, and $F$ denote the number of vertices, edges, and faces, respectively.
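
The checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the compiler's actual code: the function name \texttt{check\_cp\_syntax}, the field name \texttt{edges\_assignment}, and the error messages are hypothetical, and we assume that $F$ in Euler's formula counts the unbounded outer face, which \texttt{faces\_vertices} does not list.

```python
from typing import Dict, List

ALLOWED_TYPES = {"B", "M", "V", "F", "U"}
REQUIRED_FIELDS = ("vertices_coords", "edges_vertices", "faces_vertices")


def check_cp_syntax(cp: Dict) -> List[str]:
    """Return a list of syntax problems; an empty list means the CP passed."""
    missing = [k for k in REQUIRED_FIELDS if k not in cp]
    if missing:
        return [f"missing field: {k}" for k in missing]

    errors: List[str] = []
    n_v = len(cp["vertices_coords"])

    for i, coord in enumerate(cp["vertices_coords"]):
        if len(coord) != 2 or not all(isinstance(x, (int, float)) for x in coord):
            errors.append(f"vertex {i} is not a 2D numeric coordinate")
    for i, edge in enumerate(cp["edges_vertices"]):
        if len(edge) != 2 or any(not (0 <= v < n_v) for v in edge):
            errors.append(f"edge {i} references an invalid vertex")
    for i, face in enumerate(cp["faces_vertices"]):
        if len(face) < 3 or any(not (0 <= v < n_v) for v in face):
            errors.append(f"face {i} references an invalid vertex")
    # Assumed field name for crease types (one character per edge).
    for i, kind in enumerate(cp.get("edges_assignment", [])):
        if kind not in ALLOWED_TYPES:
            errors.append(f"edge {i} has unknown crease type {kind!r}")

    # Euler's formula V - E + F = 2; the +1 adds the outer face (assumption).
    n_e = len(cp["edges_vertices"])
    n_f = len(cp["faces_vertices"]) + 1
    if n_v - n_e + n_f != 2:
        errors.append("Euler's formula V - E + F = 2 is violated")
    return errors


# Hypothetical example: unit square with one diagonal valley crease.
square = {
    "vertices_coords": [[0, 0], [1, 0], [1, 1], [0, 1]],
    "edges_vertices": [[0, 1], [1, 2], [2, 3], [3, 0], [0, 2]],
    "edges_assignment": ["B", "B", "B", "B", "V"],
    "faces_vertices": [[0, 1, 2], [0, 2, 3]],
}
print(check_cp_syntax(square))
```

Returning a list of messages instead of raising on the first problem lets a caller report all syntax issues in a single round of feedback.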

\textbf{Geometrically Impossible Fold (GIF)}
This error refers to cases where the CP code violates fundamental geometric principles of origami, making the fold physically unrealizable. Examples include violating local flat-foldability conditions at a vertex (such as Maekawa's theorem $|M-V|=2$, or Kawasaki's theorem, which requires the alternating sum of the consecutive sector angles around the vertex to vanish, $\sum_{i=1}^{2n} (-1)^{i}\alpha_i = 0$), or specifying crease angle combinations that would require the paper to be stretched or torn.
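
As a hedged illustration of the local flat-foldability conditions, the following Python sketch checks Maekawa's theorem ($|M-V|=2$) and Kawasaki's theorem in its alternating-angle form at a single interior vertex. The function names, the input layout (sector angles listed in cyclic order, in radians), and the numerical tolerance are assumptions made for this example, not the compiler's actual implementation.

```python
import math
from typing import List


def maekawa_holds(assignments: List[str]) -> bool:
    """Mountain and valley creases at the vertex must differ in count by two."""
    mountains = assignments.count("M")
    valleys = assignments.count("V")
    return abs(mountains - valleys) == 2


def kawasaki_holds(sector_angles: List[float], tol: float = 1e-9) -> bool:
    """Alternating sum of consecutive sector angles must be zero.

    Also requires an even number of sectors that together cover the
    full 2*pi around an interior vertex.
    """
    if len(sector_angles) % 2 != 0:
        return False
    if abs(sum(sector_angles) - 2 * math.pi) > tol:
        return False
    alternating = sum(a if i % 2 == 0 else -a
                      for i, a in enumerate(sector_angles))
    return abs(alternating) <= tol


# Four creases meeting at right angles, three mountains and one valley.
ok_angles = [math.pi / 2] * 4
ok_types = ["M", "M", "M", "V"]
print(maekawa_holds(ok_types), kawasaki_holds(ok_angles))

# Angles still sum to 2*pi, but the alternating sum is -pi/2.
bad_angles = [math.pi / 4, math.pi / 2, math.pi / 2, 3 * math.pi / 4]
print(kawasaki_holds(bad_angles))
```

Both conditions are necessary for flat-foldability at a vertex; a pattern that passes them locally can still fail globally, for instance through self-intersection.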

\textbf{Paper Self-Intersection/Penetration (PSI)}
This error occurs when the compiler encounters logical contradictions while deducing the relative positions and layering order of different paper regions after folding. It may manifest as a cycle in the computed layering relations (e.g., layer A is above layer B, layer B is above layer C, and layer C is, in turn, above layer A), or as different paper regions in a 2D unfolded representation being assigned to overlapping positions that would cause physical penetration.
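
The cyclic-layering case can be illustrated with a standard topological sort. The Python sketch below uses Kahn's algorithm on a set of hypothetical ``is above'' relations between region names; the function name \texttt{layer\_order} and the input format are assumptions for illustration and do not describe how the compiler represents layers internally.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def layer_order(above: Iterable[Tuple[str, str]]) -> Optional[List[str]]:
    """Order regions from top to bottom given (upper, lower) pairs.

    Returns None when the relations contain a cycle, i.e. when no
    consistent stacking order exists.
    """
    below: Dict[str, Set[str]] = defaultdict(set)
    indegree: Dict[str, int] = defaultdict(int)
    nodes: Set[str] = set()

    for upper, lower in above:
        nodes.update((upper, lower))
        if lower not in below[upper]:
            below[upper].add(lower)
            indegree[lower] += 1

    # Start from regions that have nothing above them.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while queue:
        region = queue.popleft()
        order.append(region)
        for lower in sorted(below[region]):
            indegree[lower] -= 1
            if indegree[lower] == 0:
                queue.append(lower)

    # Regions left unprocessed are part of a cycle.
    return order if len(order) == len(nodes) else None


print(layer_order([("A", "B"), ("B", "C")]))
print(layer_order([("A", "B"), ("B", "C"), ("C", "A")]))
```

The first call yields a valid top-to-bottom order, while the second returns \texttt{None} because A, B, and C form exactly the kind of cycle described above.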

\textbf{Ambiguous Folding State (AFS)} 
This error occurs when a given CP code is under-constrained (e.g., it allows multiple valid mountain-valley assignments for its creases, or lacks critical information such as crease types or angles). As a result, the CP can be validly folded into multiple different stable geometric structures, or the compiler cannot uniquely determine the layering order of complex overlapping paper regions.
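
One simple way to see how under-constrained crease types lead to ambiguity is to enumerate the possible assignments at a single vertex. The Python sketch below fills every unassigned crease (\texttt{U}) with \texttt{M} or \texttt{V} and keeps the completions that satisfy Maekawa's theorem. This is only a necessary local condition, so the sketch is a simplified illustration under our own assumptions (including the hypothetical function names), not the compiler's full ambiguity analysis.

```python
from itertools import product
from typing import List


def maekawa_completions(assignments: List[str]) -> List[List[str]]:
    """Enumerate M/V fillings of 'U' creases with |M - V| = 2 at one vertex."""
    unknown = [i for i, a in enumerate(assignments) if a == "U"]
    results: List[List[str]] = []
    for choice in product("MV", repeat=len(unknown)):
        filled = list(assignments)
        for index, crease_type in zip(unknown, choice):
            filled[index] = crease_type
        if abs(filled.count("M") - filled.count("V")) == 2:
            results.append(filled)
    return results


def is_ambiguous(assignments: List[str]) -> bool:
    """More than one admissible completion means the vertex is ambiguous."""
    return len(maekawa_completions(assignments)) > 1


# Two unassigned creases: either one may become the single valley fold.
print(maekawa_completions(["M", "M", "U", "U"]))
print(is_ambiguous(["M", "M", "U", "U"]))
# One unassigned crease: only a valley fold satisfies the condition.
print(is_ambiguous(["M", "M", "M", "U"]))
```

The enumeration grows exponentially with the number of unassigned creases, so it is only practical for small vertices like the ones shown here.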




% \textit{CP Code Syntax Error}, \textit{Geometrically Impossible Fold}, \textit{Paper Self-Intersection/Penetration}, and \textit{Ambiguous Folding State}, including detailed error parameters (see Appendix A).

\subsection{Dataset Statistics}
In \dataset, the distribution of different types of origami is relatively even. To ensure data diversity, we choose origami models that cover different levels of complexity, fold types, and themes, such as animals, plants, and geometric shapes. The average number of folding steps per model is 8.2, but the number varies greatly across models, ranging from a minimum of 3 steps to a maximum of 25 steps. Appendix~\ref{app:data} presents a more detailed data analysis, including the themes and names of all origami data and the distribution of the number of folding steps.