\section{Manual annotation}
\label{app:human}

\subsection{Annotation Rules for Pattern Prediction Task}
\label{app:human_1}

The primary goal of this annotation task is to create challenging yet fair incorrect options for multiple-choice questions (MCQs). For each Crease Pattern (CP) diagram and its known correct folded 3D shape, annotators design three distinct incorrect shape options. Together with the correct shape, these options form an MCQ that evaluates a model's ability to predict the 3D shape from the CP. Annotators must strictly follow the rules below when designing the incorrect options:

\subsubsection{Rule 1: Ensure Visual Distinguishability}
Each incorrect option must be clearly visually distinguishable from the correct folded shape. This prevents ambiguity in which an incorrect option could be confused with the correct one because the two differ only subtly.

\textbf{Guideline:}
\begin{itemize}
    \item The overall silhouette, major components, and general form of the incorrect option should be significantly different from those of the correct option.
    \item Avoid creating incorrect options that are merely slight modifications, re-orientations, or minor proportional changes of the correct shape.
\end{itemize}

\textbf{Example:}
\begin{itemize}
    \item If the correct shape is an \textit{origami crane}:
    \begin{itemize}
        \item An incorrect option showing another bird in a very similar pose (e.g., a crane whose wings are slightly raised instead of fully spread, with the overall form otherwise unchanged) is \textbf{unsuitable} if it is not clearly visually distinct at a glance.
        \item A \textbf{suitable} incorrect option would be an \textit{origami box}, an \textit{origami boat}, or an \textit{origami star}, as these are visually very different from a crane.
    \end{itemize}
\end{itemize}

\subsubsection{Rule 2: Maintain Conceptual Distinctness}
Incorrect options should not be variations of the same concept or fall within the same narrow semantic category as the correct option. They should represent fundamentally different objects or ideas. This rule ensures the task tests the prediction of the specific shape, not fine-grained classification within a single conceptual group.

\textbf{Guideline:}
\begin{itemize}
    \item If the correct option is a specific type of animal, incorrect options should not be other animals that are closely related (e.g., from the same family) or share very similar overarching characteristics.
    \item Strive for incorrect options that belong to different conceptual categories than the correct option (e.g., animal vs. inanimate object vs. geometric form).
\end{itemize}

\textbf{Example:}
\begin{itemize}
    \item If the correct shape is an \textit{origami cat}:
    \begin{itemize}
        \item Incorrect options such as \textit{lion}, \textit{tiger}, or \textit{leopard} are \textbf{unsuitable} because they are all felines and thus variations of the same core concept (``large cat'' or ``wild cat'' as opposed to ``domestic cat'').
        \item \textbf{Suitable} incorrect options could be an \textit{origami airplane}, an \textit{origami hat}, or an \textit{origami fish} (the fish is acceptable only if, among common origami figures, it is clearly a different concept from a cat).
    \end{itemize}
\end{itemize}

\subsubsection{Rule 3: Ensure Crease Pattern Plausibility}
Although incorrect, the alternative shapes should be plausible outcomes of folding a Crease Pattern related to the given CP diagram. For example, an incorrect option might be a shape that results from misinterpreting some creases, omitting a few key folds, or simplifying the original pattern. The objective is to create distractors that are not arbitrary but instead reflect plausible, albeit erroneous, folding paths from a CP similar to the one provided.

\textbf{Guideline:}
\begin{itemize}
    \item Consider what alternative, simpler, or related shapes might emerge if certain folds in the CP are ignored, if mountain and valley folds are confused, or if a common base derived from the CP is completed into a different known figure.
    \item The CP implied by an incorrect option should be neither drastically more complex than the given CP nor unrelated to its structural elements. Ideally, it should represent a shape that an intermediate folder might mistakenly produce when attempting the correct model or a related one.
\end{itemize}
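The two perturbations named in the first guideline, omitting folds and confusing mountain with valley folds, can be enumerated systematically to seed candidate distractors. The Python sketch below assumes a hypothetical CP representation (a list of creases, each a pair of endpoints plus an assignment \texttt{M} or \texttt{V}); both the representation and the function \texttt{single\_edit\_variants} are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical CP representation: each crease is (start, end, assignment),
# with assignment "M" (mountain) or "V" (valley).
Point = Tuple[float, float]
Crease = Tuple[Point, Point, str]


def flip(assignment: str) -> str:
    """Swap a mountain assignment for a valley one and vice versa."""
    return "V" if assignment == "M" else "M"


def single_edit_variants(cp: List[Crease]) -> List[List[Crease]]:
    """Every CP differing from `cp` by exactly one edit: one crease omitted
    (a forgotten fold) or one crease's M/V assignment flipped (a confusion)."""
    variants = []
    for i, (start, end, assignment) in enumerate(cp):
        # Omit crease i.
        variants.append(cp[:i] + cp[i + 1:])
        # Flip crease i between mountain and valley.
        variants.append(cp[:i] + [(start, end, flip(assignment))] + cp[i + 1:])
    return variants


# Toy CP: the two diagonal creases of a unit square.
toy_cp = [((0.0, 0.0), (1.0, 1.0), "V"), ((0.0, 1.0), (1.0, 0.0), "M")]
variants = single_edit_variants(toy_cp)
print(len(variants))  # 4: two omissions and two flips
```

Not every variant folds into a meaningful shape (flipping a single crease often breaks flat-foldability), so each candidate still has to be folded or inspected and then checked against Rules 1 and 2.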

\textbf{Example:}
\begin{itemize}
    \item Given a CP diagram for a relatively simple \textit{origami boat}:
    \begin{itemize}
        \item A \textbf{suitable} incorrect option could be an \textit{origami hat} (e.g., a traditional paper hat such as a ``samurai helmet'' or a simple party hat). Many simple hats share foundational folds or bases (such as the waterbomb base or a variation of the preliminary fold) with simple boats, or their CPs can be derived by altering or omitting a few creases from a boat's CP.
        \item An \textbf{unsuitable} incorrect option might be a highly complex \textit{origami insect} or a multi-piece \textit{modular origami ball} if the provided CP is for a simple, single-sheet boat. The CP for such complex figures would likely be vastly different and far more intricate, making them implausible alternatives based on the given simple CP.
    \end{itemize}
\end{itemize}

\textbf{Summary for Annotators Creating Incorrect Options:}
For each CP diagram and its corresponding correct folded shape, design three distinct incorrect shape options. Before finalizing them, verify each option against the following three criteria:
\begin{enumerate}
    \item \textbf{Visual Distinguishability:} Is the incorrect option clearly visually different from the correct shape?
    \item \textbf{Conceptual Distinctness:} Is the incorrect option conceptually different from the correct shape, avoiding mere variations of the same theme?
    \item \textbf{Crease Pattern Plausibility:} Is the incorrect option a shape that could plausibly (even if incorrectly) be derived from the provided CP or a closely related CP (e.g., through simplification or common error)?
\end{enumerate}
Adherence to these rules is crucial for creating high-quality and effective multiple-choice questions for the Pattern Prediction evaluation task.


\subsection{Annotation Guidelines for Incorrect Option Generation in Spatial Relationship Prediction Task}
\label{app:human_2}

This section outlines the rules for annotators designing incorrect options for the Spatial Relationship Prediction task. For each Crease Pattern (CP) diagram, questions are posed about the spatial properties of the final folded origami model. Correct answers are generated by an optimized compiler; annotators must manually create three plausible yet incorrect options for each question to form a multiple-choice question (MCQ). The aim is to produce distractors that test a model's nuanced understanding of 3D spatial relationships after folding.

The task comprises three types of questions. Below are specific guidelines for designing incorrect options for each type:

\subsubsection{Type 1: Spatial Pose Localization}
This question type requires predicting the 3D position and/or pose (orientation) that a designated point (or feature) of the original flat paper occupies once the model is fully folded. The pose may be described relative to a global reference frame (e.g., the model resting on a table with a specific part facing upwards).

\textbf{Guidelines for Designing Incorrect Options:}
\begin{itemize}
    \item \textbf{Plausible Positional Errors:}
    \begin{itemize}
        \item Offer coordinates that are slightly offset from the correct 3D position (e.g., off by a small delta along one or more axes, located in an adjacent quadrant, or lying on a wrong but nearby surface).
        \item Suggest a position that would be correct if a key fold were made inaccurately (e.g., a mountain fold treated as a valley, an incorrect fold angle, or slight misalignment of layers).
        \item Propose the final position of a different, perhaps nearby or symmetrically opposite, salient point from the original CP.
    \end{itemize}
    \item \textbf{Plausible Pose Errors (if orientation is part of the question):}
    \begin{itemize}
        \item Provide options with the correct 3D position but an incorrect orientation (e.g., correct $(x,y,z)$ coordinates, but the point/surface faces downwards instead of upwards, or is rotated by $90^{\circ}$).
        \item Offer an orientation that reflects a common simplification (e.g., aligned exactly with a major axis when it is actually slightly tilted).
    \end{itemize}
    \item \textbf{Symmetry-based Errors:} For CPs/models exhibiting symmetry, an incorrect option could be the symmetrical counterpart of the correct position or pose.
    \item \textbf{Reference Frame Confusion:} Offer a position or pose that is correct relative to a local part of the origami model but incorrect within the specified global reference frame, or vice versa.
\end{itemize}

\textbf{Example:}
Suppose a specific vertex `P' on the CP is queried for its final 3D coordinates $(x,y,z)$ and the direction its local paper surface faces (e.g., `upwards'), relative to the table the model rests on. The correct answer (from the compiler) is $(10, 5, 3)$, with the local surface facing `upwards'.
\begin{itemize}
    \item \textbf{Suitable Incorrect Options could be:}
    \begin{itemize}
        \item $(10, 5, 0)$, local surface facing `upwards' (Incorrect $z$-coordinate, placing the point on the table surface although it is elevated).
        \item $(10, 5, 3)$, local surface facing `downwards' (Correct position, but incorrect orientation).
        \item $(-10, 5, 3)$, local surface facing `upwards' (The mirror-image position, if the model is symmetric about the $yz$-plane and the origin lies on that plane).
        \item The final coordinates and pose of an adjacent vertex `Q' from the CP.
    \end{itemize}
    \item \textbf{Unsuitable Incorrect Options:} Random coordinates or orientations with no plausible relation to the model's scale, structure, or folding process.
\end{itemize}
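Several of the error types above are simple transformations of the correct answer, so candidate distractors can be derived directly from the compiler output. The sketch below reproduces the first three suitable options of the example. It assumes that $z$ is the height above the table, that the facing direction is one of two labels, and that the model is symmetric about the $yz$-plane; the function name and the pose encoding are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Pose = Tuple[Vec3, str]  # (position, direction the local surface faces)

# Assumption: orientation is one of two labels relative to the table.
OPPOSITE = {"upwards": "downwards", "downwards": "upwards"}


def pose_distractors(correct: Pose) -> List[Pose]:
    """Derive structured distractors from the compiler's correct pose.

    Assumes z is the height above the table and that the model is
    symmetric about the yz-plane (x = 0)."""
    (x, y, z), facing = correct
    candidates = [
        ((x, y, 0.0), facing),          # wrong height: dropped onto the table
        ((x, y, z), OPPOSITE[facing]),  # correct position, flipped orientation
        ((-x, y, z), facing),           # mirror image across the yz-plane
    ]
    # A candidate can coincide with the answer (e.g. a point already on the
    # table or on the symmetry plane); such candidates are discarded.
    return [c for c in candidates if c != correct]


for option in pose_distractors(((10.0, 5.0, 3.0), "upwards")):
    print(option)
```

For a point lying on the symmetry plane ($x = 0$), the mirrored candidate equals the correct answer and is dropped, so the annotator must fill the remaining slot with another error type, such as the pose of an adjacent vertex.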

\subsubsection{Type 2: Layering Relationship Analysis}
This question type focuses on the internal structure of the folded model, specifically the stacking order of paper layers or the number of layers at a particular region (e.g., identifying the thickest region or counting layers at a specific point).

\textbf{Guidelines for Designing Incorrect Options:}
\begin{itemize}
    \item \textbf{For Number of Layers Questions:}
    \begin{itemize}
        \item Offer layer counts that are slightly off from the correct number (e.g., correct count $\pm 1$ or $\pm 2$ layers).
        \item Propose the layer count of an adjacent or visually similar region in the folded model.
        \item Suggest a count that might result from overlooking some hidden internal layers or, conversely, double-counting some visible folded edges as separate layers.
        \item If the question asks to identify the ``thickest region'' from a set of options, incorrect options should be other regions that are also thick but not maximally so, or regions that appear thick but are not.
    \end{itemize}
    \item \textbf{For Stacking Order Questions:}
    \begin{itemize}
        \item Provide plausible but incorrect permutations of the layer sequence. For example, if the correct top-to-bottom order of layers (referenced by their original CP surface labels like S1, S2, S3) is S1-S3-S2, an incorrect option could be S1-S2-S3 or S2-S1-S3.
        \item Suggest an order that would result if a specific flap were tucked differently during folding (e.g., a flap going over another flap instead of under it).
        \item Offer an incomplete order (e.g., missing one or more layers from the sequence in that region) or an order that incorrectly includes layers not present in that specific stack.
    \end{itemize}
\end{itemize}
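The near-miss and adjacent-region strategies for layer counts can be combined into a single candidate list from which the annotator picks three options. In the sketch below, the helper name, the offset bound of two (taken from the $\pm 1$/$\pm 2$ guideline), and the dictionary of other regions' counts are assumptions for illustration; in practice the counts would come from the compiler.

```python
from typing import Dict, List


def layer_count_distractors(correct: int, other_regions: Dict[str, int],
                            max_offset: int = 2) -> List[int]:
    """Candidate wrong layer counts: near misses (correct +/- 1..max_offset)
    plus the counts of other regions, excluding the correct count and any
    count below 1 (every point of the folded model has at least one layer)."""
    candidates = set()
    for d in range(1, max_offset + 1):
        candidates.update({correct - d, correct + d})
    candidates.update(other_regions.values())
    return sorted(c for c in candidates if c >= 1 and c != correct)


# Crane numbers from the example below: 8 layers in the body, 4 in the wing.
print(layer_count_distractors(8, {"wing": 4}))  # [4, 6, 7, 9, 10]
```

The helper only widens the pool; choosing which three candidates correspond to distinct, plausible counting mistakes remains the annotator's judgment.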

\textbf{Example:}
Question: ``How many layers of paper form the central part of the crane's body?'' Correct answer (from the compiler): 8 layers.
\begin{itemize}
    \item \textbf{Suitable Incorrect Options could be:}
    \begin{itemize}
        \item 6 layers (Plausible underestimation, perhaps missing some internal folds).
        \item 7 layers (Close, but incorrect).
        \item 10 layers (Plausible overestimation, perhaps counting folded edges as separate layers).
        \item 4 layers (Number of layers in the crane's wing, a different region).
    \end{itemize}
\end{itemize}
Question: ``Consider a point X on the wing of a folded paper airplane. Starting from the externally visible top surface at X, what is the order of the original paper surfaces (labeled S1, S2, S3, S4 on the CP) one would pass through if drilling perpendicularly downwards through all layers at X?'' Correct answer (from the compiler): S1, S4, S2.
\begin{itemize}
    \item \textbf{Suitable Incorrect Options could be:}
    \begin{itemize}
        \item S1, S2, S4 (A plausible misremembered or simplified stacking order).
        \item S4, S1, S2 (Incorrect starting layer or internal order).
        \item S1, S4 (Incomplete, missing the bottom layer S2).
    \end{itemize}
\end{itemize}
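The permutation and omission strategies for stacking-order questions can likewise be enumerated mechanically. The sketch below is a hypothetical helper operating on tuples of surface labels listed top to bottom; it lists every other ordering of the same layers and every order missing exactly one layer. Orders that add a layer absent from the stack require knowledge of the specific model and are left to the annotator.

```python
from itertools import permutations
from typing import List, Tuple

Order = Tuple[str, ...]  # surface labels, top to bottom


def stacking_distractors(correct: Order) -> List[Order]:
    """Candidate wrong stacking orders: every other permutation of the same
    layers, followed by every order with exactly one layer missing."""
    reorderings = [p for p in permutations(correct) if p != correct]
    incomplete = [correct[:i] + correct[i + 1:] for i in range(len(correct))]
    # dict.fromkeys removes duplicates (possible if a label repeats)
    # while keeping the original order.
    return list(dict.fromkeys(reorderings + incomplete))


options = stacking_distractors(("S1", "S4", "S2"))
print(len(options))  # 8: five reorderings and three incomplete orders
```

The three options in the airplane example, (S1, S2, S4), (S4, S1, S2), and (S1, S4), all appear in this list; the annotator's task is to choose those that correspond to a plausible folding mistake rather than an arbitrary shuffle.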

\subsubsection{Type 3: Geometric Change Analysis}
This question type involves predicting how specific geometric features (e.g., angles between lines, distances between points, areas of surfaces) change from their state in the flat CP diagram to their state in the final 3D folded model.

\textbf{Guidelines for Designing Incorrect Options:}
\begin{itemize}
    \item \textbf{Value from Original CP:} A very common and effective incorrect option is the original geometric value on the flat CP diagram (e.g., if an angle is $90^{\circ}$ on the CP but becomes $45^{\circ}$ in 3D, then $90^{\circ}$ is a strong distractor). This tests whether the model understands how folding changes geometric properties rather than assuming they keep their CP values.
    \item \textbf{Plausible Estimations or Miscalculations:}
    \begin{itemize}
        \item For angles: Provide common angles (e.g., $30^{\circ}, 45^{\circ}, 60^{\circ}, 90^{\circ}, 180^{\circ}$) that might appear correct upon a cursory visual inspection of the folded form, or angles that result from assuming a simplified 3D configuration (e.g., assuming perpendicularity or parallelism where it doesn't exactly exist).
        \item For distances: Offer distances measured along the paper surface instead of the true Euclidean distance through 3D space (or vice versa, depending on the question's phrasing). Suggest distances that might result from slight errors in visualizing the 3D form, such as ignoring foreshortening or using dimensions from a 2D projection.
        \item For areas: Propose areas that do not account for overlapping paper in the folded state, or the area of a 2D projection rather than the true 3D surface area (if the latter is asked for). Alternatively, offer an area that results from miscalculating how a shape transforms (e.g., halving an area when the true change is larger or smaller).
    \end{itemize}
    \item \textbf{Qualitative Change Errors:} If the question concerns the nature of a change (e.g., ``Does distance X increase, decrease, or stay the same?''), incorrect options could be the opposite type of change, or ``stays the same'' when there is in fact a significant change.
    \item \textbf{Values from Unrelated or Different Parts:} Offer a geometric value (angle, distance, area) that is correct for a different feature or part of the folded model, or for a different but related origami model.
\end{itemize}
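The difference between distance measured along the paper and straight-line distance in 3D can be seen in the simplest case of a single straight crease. Assume, for illustration, that points $A$ and $B$ lie on a common line perpendicular to the crease, at distances $a$ and $b$ from it on opposite sides, and that after folding the two faces meet at a dihedral angle $\theta$ ($\theta = 180^{\circ}$ for unfolded paper). Folding leaves the distance along the paper unchanged, while the Euclidean distance follows from the law of cosines in the plane perpendicular to the crease:
\[
d_{\text{surface}} = a + b, \qquad d_{\text{3D}} = \sqrt{a^{2} + b^{2} - 2ab\cos\theta}.
\]
For instance, with $a = b = 3$\,cm and $\theta = 90^{\circ}$, $d_{\text{surface}} = 6$\,cm whereas $d_{\text{3D}} = 3\sqrt{2} \approx 4.24$\,cm, so $6$\,cm is a natural distractor for a question about the Euclidean distance. The formula reduces to $a + b$ at $\theta = 180^{\circ}$ and to $\lvert a - b\rvert$ when the paper is folded flat ($\theta = 0^{\circ}$).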

\textbf{Example:}
Question: ``Two line segments L1 and L2 are parallel on the CP diagram and are 5 cm apart. In the final folded model, these segments become two adjacent edges of a wing. What is the approximate angle between the segments L1 and L2 in the folded state?'' Correct answer (from the compiler): $60^{\circ}$.
\begin{itemize}
    \item \textbf{Suitable Incorrect Options could be:}
    \begin{itemize}
        \item $0^{\circ}$ (Implies the segments remain parallel, i.e., their relative orientation is unchanged from the CP).
        \item $90^{\circ}$ (A common angle in man-made objects and in some origami steps, hence a plausible guess).
        \item $45^{\circ}$ (Another common angle, plausible visual estimate).
    \end{itemize}
\end{itemize}
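When the compiler reports the folded positions of the segments, the angle in the example and the separation of the distractors from it can be checked numerically. In the sketch below, the direction vectors are hypothetical values chosen to reproduce the $60^{\circ}$ answer, segments are treated as undirected lines (so angles lie between $0^{\circ}$ and $90^{\circ}$), and the $10^{\circ}$ default for \texttt{min\_gap} is an arbitrary assumption standing in for ``clearly incorrect''.

```python
import math
from typing import List, Sequence


def line_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees (0 to 90) between two lines with direction vectors u, v.
    Segments are treated as undirected, so 120 degrees is reported as 60."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so floating-point drift cannot make acos fail.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return min(theta, 180.0 - theta)


def clearly_wrong(correct: float, distractors: List[float],
                  min_gap: float = 10.0) -> bool:
    """True if every distractor differs from the correct angle by at least
    min_gap degrees (the 10-degree default is an arbitrary assumption)."""
    return all(abs(d - correct) >= min_gap for d in distractors)


# Hypothetical folded directions of L1 and L2 that give the example's answer.
correct = line_angle((1.0, 0.0, 0.0), (0.5, math.sqrt(3) / 2, 0.0))
print(round(correct, 6))                          # 60.0
print(clearly_wrong(correct, [0.0, 90.0, 45.0]))  # True
```

Here the closest distractor, $45^{\circ}$, is $15^{\circ}$ from the answer; a candidate such as $55^{\circ}$ would be rejected as too close to serve as a clearly incorrect option.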
Question: ``A defined square region on the CP has an area of $16 \text{ cm}^2$. After folding, this region forms part of a curved surface. What is the approximate surface area of this region in the 3D model?'' Correct answer (from the compiler): $16 \text{ cm}^2$ (assuming the paper is neither stretched nor shrunk, its intrinsic surface area is preserved, although its projected area may change).
\begin{itemize}
    \item \textbf{Suitable Incorrect Options could be:}
    \begin{itemize}
        \item $8 \text{ cm}^2$ (Confuses the surface area with a projected area that is halved).
        \item $12 \text{ cm}^2$ (Less than the original, implying shrinkage or an overlap effect that is not intrinsic to the region itself).
        \item $20 \text{ cm}^2$ (Greater than the original, which is impossible without stretching).
    \end{itemize}
\end{itemize}
\textit{Note:} If the question instead asked about the \emph{projected} area, $16 \text{ cm}^2$ could serve as an incorrect option whenever the projection foreshortens the region.
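The $16 \text{ cm}^2$ answer and the $8 \text{ cm}^2$ distractor can be related explicitly. Folding without stretching maps the paper isometrically onto the folded surface, so the surface area of the region $R$ is preserved, whereas the area of its orthographic projection is scaled by the cosine of its tilt. Assume, for illustration, that $R$ is a single planar facet whose normal makes an angle $\varphi$ with the projection direction:
\[
A_{\text{3D}}(R) = A_{\text{CP}}(R) = 16 \text{ cm}^2, \qquad A_{\text{proj}}(R) = A_{\text{3D}}(R)\,\lvert\cos\varphi\rvert.
\]
A tilt of $\varphi = 60^{\circ}$ gives $A_{\text{proj}}(R) = 16 \cdot \tfrac{1}{2} = 8 \text{ cm}^2$, which matches the first distractor above. For a genuinely curved region, $\lvert\cos\varphi\rvert$ varies over the surface and the projected area is at most $\int_R \lvert\cos\varphi\rvert\,\mathrm{d}A$, which in turn cannot exceed $16 \text{ cm}^2$.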

\textbf{General Summary for Annotators Designing Incorrect Options:}
For each question across these three types, remember the following overarching principles when designing your three incorrect options:
\begin{enumerate}
    \item \textbf{Understand the Query:} First, make sure you understand exactly what the question asks about the folded model and what the compiler-generated correct answer is.
    \item \textbf{Plausibility is Key:} Incorrect options should appear as reasonable possibilities to someone who might have a slight misunderstanding of the folding process, 3D geometry, or spatial reasoning. Avoid options that are trivially wrong, absurd, or completely random.
    \item \textbf{Ensure Clear Incorrectness:} While plausible, each incorrect option must be demonstrably wrong upon careful analysis based on the correct folding sequence and 3D geometry.
    \item \textbf{Introduce Variety in Errors:} The three incorrect options should ideally probe different potential misunderstandings or types of errors (e.g., one based on the original CP value, one on a slight miscalculation, and one on a conceptual error).
    \item \textbf{Maintain Consistency:} Ensure that the format of your incorrect options (e.g., units, precision of numbers, terminology) is consistent with the format of the correct answer.
\end{enumerate}
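Parts of principles 3 and 5 can be checked mechanically before an option set is submitted, while plausibility and variety (principles 2 and 4) still require human judgment. The helper below is a hypothetical sketch written for illustration: it flags option sets that do not contain exactly three distractors, that contain duplicates or the correct answer itself, or that mix formats (approximated here by Python type, so that \texttt{8.0} is flagged against an integer answer of \texttt{8}).

```python
from typing import Any, List, Sequence


def check_option_set(correct: Any, distractors: Sequence[Any]) -> List[str]:
    """Return the problems found in a proposed option set (empty if none).

    Covers only the mechanically checkable parts of the principles:
    exactly three distractors, none equal to the correct answer, no
    duplicates, and the same Python type (a proxy for format) as the answer."""
    problems = []
    if len(distractors) != 3:
        problems.append(f"expected 3 distractors, got {len(distractors)}")
    for i, d in enumerate(distractors):
        if d == correct:
            problems.append(f"distractor {d!r} equals the correct answer")
        if d in distractors[:i]:
            problems.append(f"duplicate distractor {d!r}")
        if type(d) is not type(correct):
            problems.append(f"distractor {d!r} has a different format "
                            f"({type(d).__name__} vs {type(correct).__name__})")
    return problems


# Crane layer-count example (answer: 8 layers).
print(check_option_set(8, [6, 7, 10]))    # []
print(check_option_set(8, [6, 8.0, 10]))  # two problems: equals answer, float vs int
```

An empty result means only that the set passes these formal checks; whether each distractor reflects a plausible misunderstanding must still be judged by the annotator.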
By following these guidelines, you will help create high-quality multiple-choice questions that rigorously and fairly evaluate a model's capabilities in spatial relationship prediction for origami.


\subsection{Human evaluation}
\label{app:human_3}
For the manual evaluation of the first three tasks, we recruited evaluators from two groups: five non-professionals recruited through a crowdsourcing platform, and three experts with extensive experience in origami. All participants were compensated in accordance with the prevailing local minimum hourly wage.