# Recommended EMO-STA Comparison Tables

## Main Text Version

### Continuous Optimization

#### Markdown

| Model | <div align="center">Function<br>minimization</div> | <div align="center">Circle<br>packing</div> | <div align="center">Circle packing<br>rectangles</div> | <div align="center">Heilbronn<br>triangle</div> |
| --- | --- | --- | --- | --- |
| **Format** | Adapt ± std / Single-task ± std | Adapt ± std / Single-task ± std | Adapt ± std / Single-task ± std | Adapt ± std / Single-task ± std |
| **Haiku-4.5** | **.949 ± .05** / .888 ± .05 | **.926 ± .03** / .865 ± .03 | **.865 ± .02** / .832 ± .01 | **.628 ± .05** / .547 ± .03 |
| **Sonnet-4.5** | **.917 ± .02** / .891 ± .05 | **.964 ± .02** / .927 ± .02 | **.890 ± .03** / .840 ± .02 | **.596 ± .05** / .548 ± .04 |
| **Opus-4.5** | **.942 ± .07** / .914 ± .05 | **.926 ± .01** / .912 ± .01 | **.944 ± .01** / .912 ± .01 | **.732 ± .04** / .622 ± .06 |
| **Sonnet-4.6** | **.973 ± .03** / .901 ± .02 | **.997 ± .00** / .969 ± .02 | **.995 ± .00** / .972 ± .01 | **.809 ± .04** / .679 ± .05 |
| **Opus-4.6** | **.973 ± .04** / .901 ± .04 | **.972 ± .02** / .963 ± .01 | **.957 ± .01** / .944 ± .01 | **.844 ± .03** / .744 ± .04 |

#### Suggested caption

Comparison of EMO-STA and single-task optimization on continuous optimization families. Each score cell reports **mean ± std** as **Adapt / Single-task**. The corresponding **Shared / Adapt / Total** iteration budgets are listed in the appendix version of this table, where Total equals the Shared budget plus the per-task adaptation budget multiplied by the number of tasks in the family. Within each cell, **boldface marks the larger of the Adapt and Single-task means**. Where only one completed trial is available, the std is shown as **.00**. A dash indicates that the corresponding model-task combination has not yet been run.

#### LaTeX

```latex
\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.10}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lcccc}
\hline
Model
& \shortstack{Function\\minimization}
& \shortstack{Circle\\packing}
& \shortstack{Circle packing\\rectangles}
& \shortstack{Heilbronn\\triangle} \\
\hline
\textbf{Format}
& \shortstack{Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Adapt $\pm$ std /\\ Single-task $\pm$ std} \\
\hline
\textbf{Haiku-4.5}
& $\mathbf{.949 \pm .05}$ / $.888 \pm .05$
& $\mathbf{.926 \pm .03}$ / $.865 \pm .03$
& $\mathbf{.865 \pm .02}$ / $.832 \pm .01$
& $\mathbf{.628 \pm .05}$ / $.547 \pm .03$ \\
\textbf{Sonnet-4.5}
& $\mathbf{.917 \pm .02}$ / $.891 \pm .05$
& $\mathbf{.964 \pm .02}$ / $.927 \pm .02$
& $\mathbf{.890 \pm .03}$ / $.840 \pm .02$
& $\mathbf{.596 \pm .05}$ / $.548 \pm .04$ \\
\textbf{Opus-4.5}
& $\mathbf{.942 \pm .07}$ / $.914 \pm .05$
& $\mathbf{.926 \pm .01}$ / $.912 \pm .01$
& $\mathbf{.944 \pm .01}$ / $.912 \pm .01$
& $\mathbf{.732 \pm .04}$ / $.622 \pm .06$ \\
\textbf{Sonnet-4.6}
& $\mathbf{.973 \pm .03}$ / $.901 \pm .02$
& $\mathbf{.997 \pm .00}$ / $.969 \pm .02$
& $\mathbf{.995 \pm .00}$ / $.972 \pm .01$
& $\mathbf{.809 \pm .04}$ / $.679 \pm .05$ \\
\textbf{Opus-4.6}
& $\mathbf{.973 \pm .04}$ / $.901 \pm .04$
& $\mathbf{.972 \pm .02}$ / $.963 \pm .01$
& $\mathbf{.957 \pm .01}$ / $.944 \pm .01$
& $\mathbf{.844 \pm .03}$ / $.744 \pm .04$ \\
\hline
\end{tabular}%
}
\caption{Comparison of EMO-STA and single-task optimization on continuous optimization families. Each score cell reports \textit{mean $\pm$ std} as \textit{Adapt / Single-task}. The corresponding \textit{Shared / Adapt / Total} iteration budgets are listed in Table~\ref{tab:emo-sta-appendix-constructive}, where Total equals the Shared budget plus the per-task adaptation budget multiplied by the number of tasks in the family. Within each cell, boldface marks the larger of the \textit{Adapt} and \textit{Single-task} means. Where only one completed trial is available, the std is shown as $.00$. A dash indicates that the corresponding model--task combination has not yet been run.}
\label{tab:emo-sta-main-constructive}
\end{table*}
```
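Because the two-way bold rule for these cells is purely mechanical, it can be delegated to LaTeX rather than applied by hand in every cell. The fragment below is a minimal sketch, not part of the original tables: the macro name `\adaptvs` is hypothetical, and it assumes the `xfp` package for the expandable `\fpeval` (recent LaTeX kernels already provide `\fpeval`, in which case loading `xfp` is redundant but harmless).

```latex
% Preamble. xfp provides the expandable \fpeval.
\usepackage{xfp}

% \adaptvs{<Adapt mean>}{<Adapt std>}{<Single-task mean>}{<Single-task std>}
% Hypothetical helper: typesets "Adapt / Single-task" and bolds the entry
% with the larger mean (a tie bolds Single-task). Means are written without
% a leading zero (e.g. .949), so a 0 is prepended before comparing.
\newcommand{\adaptvs}[4]{%
  \ifnum\fpeval{0#1 > 0#3}=1
    $\mathbf{#1 \pm #2}$ / $#3 \pm #4$%
  \else
    $#1 \pm #2$ / $\mathbf{#3 \pm #4}$%
  \fi}

% Usage inside the tabular, reproducing the Haiku-4.5 row:
% \textbf{Haiku-4.5}
%   & \adaptvs{.949}{.05}{.888}{.05}
%   & \adaptvs{.926}{.03}{.865}{.03}
%   & \adaptvs{.865}{.02}{.832}{.01}
%   & \adaptvs{.628}{.05}{.547}{.03} \\
```

Under this sketch a tie bolds the Single-task entry; since the caption does not say how ties are handled, adjust the comparison if a different convention is wanted.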

### Modeling & Algorithmic Optimization

#### Markdown

| Model | <div align="center">Signal<br>processing</div> | <div align="center">SLDBench-3D</div> | <div align="center">Rust adaptive<br>sort</div> | <div align="center">K-module</div> |
| --- | --- | --- | --- | --- |
| **Format** | Adapt ± std / Single-task ± std | Adapt ± std / Single-task ± std | Adapt ± std / Single-task ± std | Adapt ± std / Single-task ± std |
| **Haiku-4.5** | **.584 ± .06** / .569 ± .01 | **.953 ± .02** / .951 ± .01 | .535 ± .02 / **.539 ± .02** | **.567 ± .04** / .550 ± .03 |
| **Sonnet-4.5** | **.578 ± .02** / .576 ± .01 | **.971 ± .01** / .959 ± .01 | .484 ± .03 / **.528 ± .01** | **.650 ± .02** / .617 ± .05 |
| **Opus-4.5** | **.635 ± .03** / .568 ± .01 | .972 ± .01 / **.973 ± .01** | **.520 ± .05** / .497 ± .02 | **.675 ± .03** / .567 ± .05 |
| **Sonnet-4.6** | **.626 ± .04** / .608 ± .03 | **.968 ± .01** / .955 ± .01 | **.663 ± .01** / .616 ± .03 | **.700 ± .07** / .675 ± .05 |
| **Opus-4.6** | **.707 ± .04** / .649 ± .03 | **.973 ± .01** / .965 ± .02 | **.625 ± .02** / .531 ± .05 | **.800 ± .05** / .758 ± .08 |

#### Suggested caption

Comparison of EMO-STA and single-task optimization on modeling and algorithmic optimization families. Each score cell reports **mean ± std** as **Adapt / Single-task**. The corresponding **Shared / Adapt / Total** iteration budgets are listed in the appendix version of this table, where Total equals the Shared budget plus the per-task adaptation budget multiplied by the number of tasks in the family. Within each cell, **boldface marks the larger of the Adapt and Single-task means**. Where only one completed trial is available, the std is shown as **.00**. A dash indicates that the corresponding model-task combination has not yet been run.

#### LaTeX

```latex
\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.10}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lcccc}
\hline
Model
& \shortstack{Signal\\processing}
& \shortstack{SLDBench-3D}
& \shortstack{Rust adaptive\\sort}
& \shortstack{K-module} \\
\hline
\textbf{Format}
& \shortstack{Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Adapt $\pm$ std /\\ Single-task $\pm$ std} \\
\hline
\textbf{Haiku-4.5}
& $\mathbf{.584 \pm .06}$ / $.569 \pm .01$
& $\mathbf{.953 \pm .02}$ / $.951 \pm .01$
& $.535 \pm .02$ / $\mathbf{.539 \pm .02}$
& $\mathbf{.567 \pm .04}$ / $.550 \pm .03$ \\
\textbf{Sonnet-4.5}
& $\mathbf{.578 \pm .02}$ / $.576 \pm .01$
& $\mathbf{.971 \pm .01}$ / $.959 \pm .01$
& $.484 \pm .03$ / $\mathbf{.528 \pm .01}$
& $\mathbf{.650 \pm .02}$ / $.617 \pm .05$ \\
\textbf{Opus-4.5}
& $\mathbf{.635 \pm .03}$ / $.568 \pm .01$
& $.972 \pm .01$ / $\mathbf{.973 \pm .01}$
& $\mathbf{.520 \pm .05}$ / $.497 \pm .02$
& $\mathbf{.675 \pm .03}$ / $.567 \pm .05$ \\
\textbf{Sonnet-4.6}
& $\mathbf{.626 \pm .04}$ / $.608 \pm .03$
& $\mathbf{.968 \pm .01}$ / $.955 \pm .01$
& $\mathbf{.663 \pm .01}$ / $.616 \pm .03$
& $\mathbf{.700 \pm .07}$ / $.675 \pm .05$ \\
\textbf{Opus-4.6}
& $\mathbf{.707 \pm .04}$ / $.649 \pm .03$
& $\mathbf{.973 \pm .01}$ / $.965 \pm .02$
& $\mathbf{.625 \pm .02}$ / $.531 \pm .05$
& $\mathbf{.800 \pm .05}$ / $.758 \pm .08$ \\
\hline
\end{tabular}%
}
\caption{Comparison of EMO-STA and single-task optimization on modeling and algorithmic optimization families. Each score cell reports \textit{mean $\pm$ std} as \textit{Adapt / Single-task}. The corresponding \textit{Shared / Adapt / Total} iteration budgets are listed in Table~\ref{tab:emo-sta-appendix-modeling}, where Total equals the Shared budget plus the per-task adaptation budget multiplied by the number of tasks in the family. Within each cell, boldface marks the larger of the \textit{Adapt} and \textit{Single-task} means. Where only one completed trial is available, the std is shown as $.00$. A dash indicates that the corresponding model--task combination has not yet been run.}
\label{tab:emo-sta-main-modeling}
\end{table*}
```

## Appendix Version

### Continuous Optimization

#### Markdown

| Model | <div align="center">Function<br>minimization</div> | <div align="center">Circle<br>packing</div> | <div align="center">Circle packing<br>rectangles</div> | <div align="center">Heilbronn<br>triangle</div> |
| --- | --- | --- | --- | --- |
| **Format** | Shared ± std / Adapt ± std / Single-task ± std | Shared ± std / Adapt ± std / Single-task ± std | Shared ± std / Adapt ± std / Single-task ± std | Shared ± std / Adapt ± std / Single-task ± std |
| **Budget (Shared / Adapt / Total)** | 40 / 15 / 100 | 60 / 15 / 120 | 60 / 15 / 120 | 60 / 15 / 120 |
| **Haiku-4.5** | .887 ± .06 / **.949 ± .05** / .888 ± .05 | .902 ± .05 / **.926 ± .03** / .865 ± .03 | .832 ± .01 / **.865 ± .02** / .832 ± .01 | .523 ± .03 / **.628 ± .05** / .547 ± .03 |
| **Sonnet-4.5** | .862 ± .03 / **.917 ± .02** / .891 ± .05 | .938 ± .03 / **.964 ± .02** / .927 ± .02 | .875 ± .04 / **.890 ± .03** / .840 ± .02 | .472 ± .08 / **.596 ± .05** / .548 ± .04 |
| **Opus-4.5** | .877 ± .07 / **.942 ± .07** / .914 ± .05 | .901 ± .01 / **.926 ± .01** / .912 ± .01 | .935 ± .01 / **.944 ± .01** / .912 ± .01 | .608 ± .05 / **.732 ± .04** / .622 ± .06 |
| **Sonnet-4.6** | .946 ± .03 / **.973 ± .03** / .901 ± .02 | .995 ± .00 / **.997 ± .00** / .969 ± .02 | .993 ± .00 / **.995 ± .00** / .972 ± .01 | .711 ± .05 / **.809 ± .04** / .679 ± .05 |
| **Opus-4.6** | .942 ± .04 / **.973 ± .04** / .901 ± .04 | .960 ± .02 / **.972 ± .02** / .963 ± .01 | .941 ± .01 / **.957 ± .01** / .944 ± .01 | .784 ± .03 / **.844 ± .03** / .744 ± .04 |

#### Suggested caption

Comparison of EMO-STA and single-task optimization on continuous optimization families, including the shared (pre-adaptation) score. Each score cell reports **mean ± std** as **Shared / Adapt / Single-task**, and the budget row reports **Shared / Adapt / Total** iterations, where Total equals the Shared budget plus the per-task adaptation budget multiplied by the number of tasks in the family. Within each cell, **boldface marks the larger of the Adapt and Single-task means**; the Shared score is shown for context and is never bolded. Where only one completed trial is available, the std is shown as **.00**. A dash indicates that the corresponding model-task combination has not yet been run.
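As a worked check of the Total column, write $K$ for the number of tasks in a family (a symbol introduced here for illustration; the tables do not list $K$ directly). Solving Total = Shared + $K$ × Adapt with the continuous optimization budget row gives $K = 4$ for all four families. The display assumes `amsmath` for `align*` and `\text`.

```latex
\begin{align*}
\text{Total} &= \text{Shared} + K \cdot \text{Adapt}\\
\text{Function minimization:}\quad 100 &= 40 + 4 \cdot 15\\
\text{Circle packing (both variants), Heilbronn triangle:}\quad 120 &= 60 + 4 \cdot 15
\end{align*}
```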

#### LaTeX

```latex
\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.10}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lcccc}
\hline
Model
& \shortstack{Function\\minimization}
& \shortstack{Circle\\packing}
& \shortstack{Circle packing\\rectangles}
& \shortstack{Heilbronn\\triangle} \\
\hline
\textbf{Format}
& \shortstack{Shared $\pm$ std / Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Shared $\pm$ std / Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Shared $\pm$ std / Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Shared $\pm$ std / Adapt $\pm$ std /\\ Single-task $\pm$ std} \\
\textbf{Budget (Shared / Adapt / Total)}
& 40 / 15 / 100
& 60 / 15 / 120
& 60 / 15 / 120
& 60 / 15 / 120 \\
\hline
\textbf{Haiku-4.5}
& $.887 \pm .06$ / $\mathbf{.949 \pm .05}$ / $.888 \pm .05$
& $.902 \pm .05$ / $\mathbf{.926 \pm .03}$ / $.865 \pm .03$
& $.832 \pm .01$ / $\mathbf{.865 \pm .02}$ / $.832 \pm .01$
& $.523 \pm .03$ / $\mathbf{.628 \pm .05}$ / $.547 \pm .03$ \\
\textbf{Sonnet-4.5}
& $.862 \pm .03$ / $\mathbf{.917 \pm .02}$ / $.891 \pm .05$
& $.938 \pm .03$ / $\mathbf{.964 \pm .02}$ / $.927 \pm .02$
& $.875 \pm .04$ / $\mathbf{.890 \pm .03}$ / $.840 \pm .02$
& $.472 \pm .08$ / $\mathbf{.596 \pm .05}$ / $.548 \pm .04$ \\
\textbf{Opus-4.5}
& $.877 \pm .07$ / $\mathbf{.942 \pm .07}$ / $.914 \pm .05$
& $.901 \pm .01$ / $\mathbf{.926 \pm .01}$ / $.912 \pm .01$
& $.935 \pm .01$ / $\mathbf{.944 \pm .01}$ / $.912 \pm .01$
& $.608 \pm .05$ / $\mathbf{.732 \pm .04}$ / $.622 \pm .06$ \\
\textbf{Sonnet-4.6}
& $.946 \pm .03$ / $\mathbf{.973 \pm .03}$ / $.901 \pm .02$
& $.995 \pm .00$ / $\mathbf{.997 \pm .00}$ / $.969 \pm .02$
& $.993 \pm .00$ / $\mathbf{.995 \pm .00}$ / $.972 \pm .01$
& $.711 \pm .05$ / $\mathbf{.809 \pm .04}$ / $.679 \pm .05$ \\
\textbf{Opus-4.6}
& $.942 \pm .04$ / $\mathbf{.973 \pm .04}$ / $.901 \pm .04$
& $.960 \pm .02$ / $\mathbf{.972 \pm .02}$ / $.963 \pm .01$
& $.941 \pm .01$ / $\mathbf{.957 \pm .01}$ / $.944 \pm .01$
& $.784 \pm .03$ / $\mathbf{.844 \pm .03}$ / $.744 \pm .04$ \\
\hline
\end{tabular}%
}
\caption{Comparison of EMO-STA and single-task optimization on continuous optimization families, including the shared (pre-adaptation) score. Each score cell reports \textit{mean $\pm$ std} as \textit{Shared / Adapt / Single-task}, and the budget row reports \textit{Shared / Adapt / Total} iterations, where Total equals the Shared budget plus the per-task adaptation budget multiplied by the number of tasks in the family. Within each cell, boldface marks the larger of the \textit{Adapt} and \textit{Single-task} means; the \textit{Shared} score is shown for context and is never bolded. Where only one completed trial is available, the std is shown as $.00$. A dash indicates that the corresponding model--task combination has not yet been run.}
\label{tab:emo-sta-appendix-constructive}
\end{table*}
```

### Modeling & Algorithmic Optimization

#### Markdown

| Model | <div align="center">Signal<br>processing</div> | <div align="center">SLDBench-3D</div> | <div align="center">Rust adaptive<br>sort</div> | <div align="center">K-module</div> |
| --- | --- | --- | --- | --- |
| **Format** | Shared ± std / Adapt ± std / Single-task ± std | Shared ± std / Adapt ± std / Single-task ± std | Shared ± std / Adapt ± std / Single-task ± std | Shared ± std / Adapt ± std / Single-task ± std |
| **Budget (Shared / Adapt / Total)** | 60 / 10 / 100 | 60 / 10 / 80 | 60 / 10 / 100 | 40 / 20 / 120 |
| **Haiku-4.5** | .568 ± .04 / **.584 ± .06** / .569 ± .01 | .936 ± .02 / **.953 ± .02** / .951 ± .01 | .509 ± .03 / .535 ± .02 / **.539 ± .02** | .392 ± .05 / **.567 ± .04** / .550 ± .03 |
| **Sonnet-4.5** | .559 ± .02 / **.578 ± .02** / .576 ± .01 | .955 ± .02 / **.971 ± .01** / .959 ± .01 | .458 ± .03 / .484 ± .03 / **.528 ± .01** | .367 ± .02 / **.650 ± .02** / .617 ± .05 |
| **Opus-4.5** | .612 ± .03 / **.635 ± .03** / .568 ± .01 | .959 ± .02 / .972 ± .01 / **.973 ± .01** | .483 ± .05 / **.520 ± .05** / .497 ± .02 | .442 ± .02 / **.675 ± .03** / .567 ± .05 |
| **Sonnet-4.6** | .607 ± .05 / **.626 ± .04** / .608 ± .03 | .959 ± .01 / **.968 ± .01** / .955 ± .01 | .656 ± .01 / **.663 ± .01** / .616 ± .03 | .383 ± .05 / **.700 ± .07** / .675 ± .05 |
| **Opus-4.6** | .653 ± .04 / **.707 ± .04** / .649 ± .03 | .958 ± .02 / **.973 ± .01** / .965 ± .02 | .612 ± .02 / **.625 ± .02** / .531 ± .05 | .450 ± .03 / **.800 ± .05** / .758 ± .08 |

#### Suggested caption

Comparison of EMO-STA and single-task optimization on modeling and algorithmic optimization families, including the shared (pre-adaptation) score. Each score cell reports **mean ± std** as **Shared / Adapt / Single-task**, and the budget row reports **Shared / Adapt / Total** iterations, where Total equals the Shared budget plus the per-task adaptation budget multiplied by the number of tasks in the family. Within each cell, **boldface marks the larger of the Adapt and Single-task means**; the Shared score is shown for context and is never bolded. Where only one completed trial is available, the std is shown as **.00**. A dash indicates that the corresponding model-task combination has not yet been run.
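The same rule, Total = Shared + $K$ × Adapt with $K$ the number of tasks (a symbol introduced here for illustration), can be checked against this budget row. It implies $K = 4$ for every family except SLDBench-3D, whose total of 80 corresponds to $K = 2$; this task count is inferred from the budgets, not stated elsewhere in the tables. The display assumes `amsmath`.

```latex
\begin{align*}
\text{Total} &= \text{Shared} + K \cdot \text{Adapt}\\
\text{Signal processing, Rust adaptive sort:}\quad 100 &= 60 + 4 \cdot 10\\
\text{SLDBench-3D:}\quad 80 &= 60 + 2 \cdot 10\\
\text{K-module:}\quad 120 &= 40 + 4 \cdot 20
\end{align*}
```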

#### LaTeX

```latex
\begin{table*}[t]
\centering
\scriptsize
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.10}
\resizebox{\textwidth}{!}{%
\begin{tabular}{lcccc}
\hline
Model
& \shortstack{Signal\\processing}
& \shortstack{SLDBench-3D}
& \shortstack{Rust adaptive\\sort}
& \shortstack{K-module} \\
\hline
\textbf{Format}
& \shortstack{Shared $\pm$ std / Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Shared $\pm$ std / Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Shared $\pm$ std / Adapt $\pm$ std /\\ Single-task $\pm$ std}
& \shortstack{Shared $\pm$ std / Adapt $\pm$ std /\\ Single-task $\pm$ std} \\
\textbf{Budget (Shared / Adapt / Total)}
& 60 / 10 / 100
& 60 / 10 / 80
& 60 / 10 / 100
& 40 / 20 / 120 \\
\hline
\textbf{Haiku-4.5}
& $.568 \pm .04$ / $\mathbf{.584 \pm .06}$ / $.569 \pm .01$
& $.936 \pm .02$ / $\mathbf{.953 \pm .02}$ / $.951 \pm .01$
& $.509 \pm .03$ / $.535 \pm .02$ / $\mathbf{.539 \pm .02}$
& $.392 \pm .05$ / $\mathbf{.567 \pm .04}$ / $.550 \pm .03$ \\
\textbf{Sonnet-4.5}
& $.559 \pm .02$ / $\mathbf{.578 \pm .02}$ / $.576 \pm .01$
& $.955 \pm .02$ / $\mathbf{.971 \pm .01}$ / $.959 \pm .01$
& $.458 \pm .03$ / $.484 \pm .03$ / $\mathbf{.528 \pm .01}$
& $.367 \pm .02$ / $\mathbf{.650 \pm .02}$ / $.617 \pm .05$ \\
\textbf{Opus-4.5}
& $.612 \pm .03$ / $\mathbf{.635 \pm .03}$ / $.568 \pm .01$
& $.959 \pm .02$ / $.972 \pm .01$ / $\mathbf{.973 \pm .01}$
& $.483 \pm .05$ / $\mathbf{.520 \pm .05}$ / $.497 \pm .02$
& $.442 \pm .02$ / $\mathbf{.675 \pm .03}$ / $.567 \pm .05$ \\
\textbf{Sonnet-4.6}
& $.607 \pm .05$ / $\mathbf{.626 \pm .04}$ / $.608 \pm .03$
& $.959 \pm .01$ / $\mathbf{.968 \pm .01}$ / $.955 \pm .01$
& $.656 \pm .01$ / $\mathbf{.663 \pm .01}$ / $.616 \pm .03$
& $.383 \pm .05$ / $\mathbf{.700 \pm .07}$ / $.675 \pm .05$ \\
\textbf{Opus-4.6}
& $.653 \pm .04$ / $\mathbf{.707 \pm .04}$ / $.649 \pm .03$
& $.958 \pm .02$ / $\mathbf{.973 \pm .01}$ / $.965 \pm .02$
& $.612 \pm .02$ / $\mathbf{.625 \pm .02}$ / $.531 \pm .05$
& $.450 \pm .03$ / $\mathbf{.800 \pm .05}$ / $.758 \pm .08$ \\
\hline
\end{tabular}%
}
\caption{Comparison of EMO-STA and single-task optimization on modeling and algorithmic optimization families, including the shared (pre-adaptation) score. Each score cell reports \textit{mean $\pm$ std} as \textit{Shared / Adapt / Single-task}, and the budget row reports \textit{Shared / Adapt / Total} iterations, where Total equals the Shared budget plus the per-task adaptation budget multiplied by the number of tasks in the family. Within each cell, boldface marks the larger of the \textit{Adapt} and \textit{Single-task} means; the \textit{Shared} score is shown for context and is never bolded. Where only one completed trial is available, the std is shown as $.00$. A dash indicates that the corresponding model--task combination has not yet been run.}
\label{tab:emo-sta-appendix-modeling}
\end{table*}
```

## Adaptation Methods Comparison Tables

### Continuous Optimization

#### Markdown

| Model | <div align="center">Function<br>minimization</div> | <div align="center">Circle<br>packing</div> | <div align="center">Circle packing<br>rectangles</div> | <div align="center">Heilbronn<br>triangle</div> |
| --- | --- | --- | --- | --- |
| **Format** | STA Best-Local ± std / STA Warmstart ± std / STA Best-Shared ± std / Single-task ± std | STA Best-Local ± std / STA Warmstart ± std / STA Best-Shared ± std / Single-task ± std | STA Best-Local ± std / STA Warmstart ± std / STA Best-Shared ± std / Single-task ± std | STA Best-Local ± std / STA Warmstart ± std / STA Best-Shared ± std / Single-task ± std |
| **Haiku-4.5** | **.952 ± .04** / .949 ± .05 / .941 ± .06 / .888 ± .05 | .934 ± .03 / .926 ± .03 / **.940 ± .02** / .865 ± .03 | .861 ± .02 / **.865 ± .02** / .845 ± .01 / .832 ± .01 | **.650 ± .06** / .628 ± .05 / .628 ± .06 / .547 ± .03 |
| **Sonnet-4.5** | **.925 ± .02** / .917 ± .02 / .904 ± .03 / .891 ± .05 | **.965 ± .02** / .964 ± .02 / .947 ± .03 / .927 ± .02 | **.898 ± .03** / .890 ± .03 / .892 ± .04 / .840 ± .02 | **.622 ± .04** / .596 ± .05 / .619 ± .05 / .548 ± .04 |
| **Opus-4.5** | **.969 ± .03** / .942 ± .07 / .941 ± .09 / .914 ± .05 | **.940 ± .01** / .926 ± .01 / .930 ± .01 / .912 ± .01 | **.951 ± .01** / .943 ± .01 / .943 ± .01 / .912 ± .01 | **.741 ± .03** / .732 ± .04 / .704 ± .04 / .622 ± .06 |
| **Sonnet-4.6** | .988 ± .02 / .973 ± .03 / **.991 ± .02** / .901 ± .02 | **.997 ± .00** / **.997 ± .00** / **.997 ± .00** / .957 ± .03 | **.986 ± .01** / .985 ± .01 / .985 ± .01 / .967 ± .02 | .862 ± .04 / .809 ± .04 / **.865 ± .07** / .678 ± .05 |
| **Opus-4.6** | **.945 ± .03** / .943 ± .03 / .932 ± .04 / .895 ± .04 | **.984 ± .01** / .972 ± .02 / .979 ± .02 / .963 ± .01 | **.967 ± .01** / .957 ± .01 / .962 ± .01 / .944 ± .01 | .863 ± .03 / .844 ± .03 / **.877 ± .03** / .744 ± .04 |

#### Suggested caption

Comparison of standard single-task and EMO-STA optimization for continuous optimization families. Each score cell reports **mean ± std** for **STA Best-Local / STA Warmstart / STA Best-Shared / Single-task**. Bold marks the largest mean among the four scores in each cell.

#### LaTeX

```latex
\begin{table*}[t]
\centering
\caption{\small{Comparison of standard single-task and EMO-STA optimization for continuous optimization families. Each score cell reports \textit{mean $\pm$ std}; the first line is \textit{STA Best-Local / STA Warmstart}, and the second line is \textit{STA Best-Shared / Single-task}. Bold marks the largest mean among the four scores in each cell.}}
\label{tab:emo-sta-main-seed-adaptation-constructive}
\footnotesize
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.07}
\setlength{\aboverulesep}{0.5ex}
\setlength{\belowrulesep}{0.5ex}

\providecommand{\mtstscell}[1]{\begin{tabular}[c]{@{}c@{}}#1\end{tabular}}

\providecommand{\mtstsscorecell}[4]{%
\begin{tabular}[c]{@{}c@{}}
$#1$ / $#2$\\[-1pt]
$#3$ / $#4$
\end{tabular}}

\resizebox{\textwidth}{!}{%
\begin{tabular}{lcccc}
\toprule
\multirow{4}{*}{Model}
& \mtstscell{Function}
& \mtstscell{Circle}
& \mtstscell{Circle packing}
& \mtstscell{Heilbronn} \\
& \mtstscell{minimization}
& \mtstscell{packing}
& \mtstscell{rectangles}
& \mtstscell{triangle} \\[-1pt]
& {\scriptsize STA Best-Local / STA Warmstart}
& {\scriptsize STA Best-Local / STA Warmstart}
& {\scriptsize STA Best-Local / STA Warmstart}
& {\scriptsize STA Best-Local / STA Warmstart} \\[-2pt]
& {\scriptsize STA Best-Shared / Single-task}
& {\scriptsize STA Best-Shared / Single-task}
& {\scriptsize STA Best-Shared / Single-task}
& {\scriptsize STA Best-Shared / Single-task} \\
\midrule

\textbf{Haiku-4.5}
& \mtstsscorecell{\mathbf{.952 \pm .04}}{.949 \pm .05}{.941 \pm .06}{.888 \pm .05}
& \mtstsscorecell{.934 \pm .03}{.926 \pm .03}{\mathbf{.940 \pm .02}}{.865 \pm .03}
& \mtstsscorecell{.861 \pm .02}{\mathbf{.865 \pm .02}}{.845 \pm .01}{.832 \pm .01}
& \mtstsscorecell{\mathbf{.650 \pm .06}}{.628 \pm .05}{.628 \pm .06}{.547 \pm .03} \\
\midrule

\textbf{Sonnet-4.5}
& \mtstsscorecell{\mathbf{.925 \pm .02}}{.917 \pm .02}{.904 \pm .03}{.891 \pm .05}
& \mtstsscorecell{\mathbf{.965 \pm .02}}{.964 \pm .02}{.947 \pm .03}{.927 \pm .02}
& \mtstsscorecell{\mathbf{.898 \pm .03}}{.890 \pm .03}{.892 \pm .04}{.840 \pm .02}
& \mtstsscorecell{\mathbf{.622 \pm .04}}{.596 \pm .05}{.619 \pm .05}{.548 \pm .04} \\
\midrule

\textbf{Opus-4.5}
& \mtstsscorecell{\mathbf{.969 \pm .03}}{.942 \pm .07}{.941 \pm .09}{.914 \pm .05}
& \mtstsscorecell{\mathbf{.940 \pm .01}}{.926 \pm .01}{.930 \pm .01}{.912 \pm .01}
& \mtstsscorecell{\mathbf{.951 \pm .01}}{.943 \pm .01}{.943 \pm .01}{.912 \pm .01}
& \mtstsscorecell{\mathbf{.741 \pm .03}}{.732 \pm .04}{.704 \pm .04}{.622 \pm .06} \\
\midrule

\textbf{Sonnet-4.6}
& \mtstsscorecell{.988 \pm .02}{.973 \pm .03}{\mathbf{.991 \pm .02}}{.901 \pm .02}
& \mtstsscorecell{\mathbf{.997 \pm .00}}{\mathbf{.997 \pm .00}}{\mathbf{.997 \pm .00}}{.957 \pm .03}
& \mtstsscorecell{\mathbf{.986 \pm .01}}{.985 \pm .01}{.985 \pm .01}{.967 \pm .02}
& \mtstsscorecell{.862 \pm .04}{.809 \pm .04}{\mathbf{.865 \pm .07}}{.678 \pm .05} \\
\midrule

\textbf{Opus-4.6}
& \mtstsscorecell{\mathbf{.945 \pm .03}}{.943 \pm .03}{.932 \pm .04}{.895 \pm .04}
& \mtstsscorecell{\mathbf{.984 \pm .01}}{.972 \pm .02}{.979 \pm .02}{.963 \pm .01}
& \mtstsscorecell{\mathbf{.967 \pm .01}}{.957 \pm .01}{.962 \pm .01}{.944 \pm .01}
& \mtstsscorecell{.863 \pm .03}{.844 \pm .03}{\mathbf{.877 \pm .03}}{.744 \pm .04} \\

\bottomrule
\end{tabular}%
}
\end{table*}
```
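The adaptation-methods tables use `\toprule`, `\midrule`, `\bottomrule`, `\aboverulesep`, and `\belowrulesep` from `booktabs`, `\multirow` from `multirow`, and `\resizebox` from `graphicx`; the main-text tables also rely on `\resizebox`. None of these are in the LaTeX kernel, so a preamble fragment along these lines is assumed, unless the document class already loads the packages:

```latex
% Packages assumed by the tables in this file.
\usepackage{graphicx} % \resizebox
\usepackage{booktabs} % \toprule, \midrule, \bottomrule, \aboverulesep, \belowrulesep
\usepackage{multirow} % \multirow in the header rows and model groups
```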

### Modeling & Algorithmic Optimization

#### Markdown

| Model | <div align="center">Signal<br>processing</div> | <div align="center">SLDBench-3D</div> | <div align="center">Rust adaptive<br>sort</div> | <div align="center">K-module</div> |
| --- | --- | --- | --- | --- |
| **Format** | STA Best-Local ± std / STA Warmstart ± std / STA Best-Shared ± std / Single-task ± std | STA Best-Local ± std / STA Warmstart ± std / STA Best-Shared ± std / Single-task ± std | STA Best-Local ± std / STA Warmstart ± std / STA Best-Shared ± std / Single-task ± std | STA Best-Local ± std / STA Warmstart ± std / STA Best-Shared ± std / Single-task ± std |
| **Haiku-4.5** | **.600 ± .05** / .584 ± .06 / .597 ± .04 / .569 ± .01 | **.958 ± .02** / .953 ± .02 / .949 ± .02 / .951 ± .01 | .533 ± .02 / .535 ± .02 / .509 ± .03 / **.539 ± .02** | .567 ± .06 / .567 ± .04 / **.575 ± .07** / .550 ± .03 |
| **Sonnet-4.5** | **.587 ± .01** / .578 ± .02 / .582 ± .02 / .576 ± .01 | **.976 ± .01** / .971 ± .01 / .971 ± .02 / .959 ± .01 | .481 ± .03 / .484 ± .03 / .457 ± .03 / **.528 ± .01** | .617 ± .03 / **.650 ± .02** / .567 ± .06 / .617 ± .05 |
| **Opus-4.5** | .620 ± .03 / **.635 ± .03** / .625 ± .02 / .568 ± .01 | **.983 ± .00** / .972 ± .01 / .981 ± .00 / .973 ± .01 | .515 ± .05 / **.520 ± .05** / .483 ± .05 / .497 ± .02 | .617 ± .03 / **.675 ± .03** / .592 ± .03 / .567 ± .05 |
| **Sonnet-4.6** | **.628 ± .04** / .626 ± .04 / .613 ± .05 / .608 ± .03 | .969 ± .01 / .968 ± .01 / **.969 ± .01** / .955 ± .01 | .659 ± .01 / **.663 ± .01** / .656 ± .01 / .616 ± .03 | .617 ± .09 / **.700 ± .07** / .575 ± .03 / .675 ± .05 |
| **Opus-4.6** | .713 ± .05 / .707 ± .04 / **.716 ± .04** / .648 ± .03 | **.975 ± .01** / .973 ± .01 / .967 ± .01 / .964 ± .02 | .616 ± .02 / **.625 ± .02** / .612 ± .02 / .531 ± .05 | .725 ± .02 / **.800 ± .05** / .692 ± .05 / .758 ± .08 |

#### Suggested caption

Comparison of standard single-task and EMO-STA optimization for modeling and algorithmic optimization families. Each score cell reports **mean ± std** for **STA Best-Local / STA Warmstart / STA Best-Shared / Single-task**. Bold marks the largest mean among the four scores in each cell.

#### LaTeX

```latex
\begin{table*}[t]
\centering
\caption{\small{Comparison of standard single-task and EMO-STA optimization for modeling and algorithmic optimization families. Each score cell reports \textit{mean $\pm$ std}; the first line is \textit{STA Best-Local / STA Warmstart}, and the second line is \textit{STA Best-Shared / Single-task}. Bold marks the largest mean among the four scores in each cell.}}
\label{tab:emo-sta-main-seed-adaptation-modeling}
\footnotesize
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.07}
\setlength{\aboverulesep}{0.5ex}
\setlength{\belowrulesep}{0.5ex}

\providecommand{\mtstscell}[1]{\begin{tabular}[c]{@{}c@{}}#1\end{tabular}}

\providecommand{\mtstsscorecell}[4]{%
\begin{tabular}[c]{@{}c@{}}
$#1$ / $#2$\\[-1pt]
$#3$ / $#4$
\end{tabular}}

\resizebox{\textwidth}{!}{%
\begin{tabular}{lcccc}
\toprule
\multirow{4}{*}{Model}
& \mtstscell{Signal}
& \multirow{2}{*}{SLDBench-3D}
& \mtstscell{Rust adaptive}
& \multirow{2}{*}{K-module} \\
& \mtstscell{processing}
&
& \mtstscell{sort}
& \\[-1pt]
& {\scriptsize STA Best-Local / STA Warmstart}
& {\scriptsize STA Best-Local / STA Warmstart}
& {\scriptsize STA Best-Local / STA Warmstart}
& {\scriptsize STA Best-Local / STA Warmstart} \\[-2pt]
& {\scriptsize STA Best-Shared / Single-task}
& {\scriptsize STA Best-Shared / Single-task}
& {\scriptsize STA Best-Shared / Single-task}
& {\scriptsize STA Best-Shared / Single-task} \\
\midrule

\textbf{Haiku-4.5}
& \mtstsscorecell{\mathbf{.600 \pm .05}}{.584 \pm .06}{.597 \pm .04}{.569 \pm .01}
& \mtstsscorecell{\mathbf{.958 \pm .02}}{.953 \pm .02}{.949 \pm .02}{.951 \pm .01}
& \mtstsscorecell{.533 \pm .02}{.535 \pm .02}{.509 \pm .03}{\mathbf{.539 \pm .02}}
& \mtstsscorecell{.567 \pm .06}{.567 \pm .04}{\mathbf{.575 \pm .07}}{.550 \pm .03} \\
\midrule

\textbf{Sonnet-4.5}
& \mtstsscorecell{\mathbf{.587 \pm .01}}{.578 \pm .02}{.582 \pm .02}{.576 \pm .01}
& \mtstsscorecell{\mathbf{.976 \pm .01}}{.971 \pm .01}{.971 \pm .02}{.959 \pm .01}
& \mtstsscorecell{.481 \pm .03}{.484 \pm .03}{.457 \pm .03}{\mathbf{.528 \pm .01}}
& \mtstsscorecell{.617 \pm .03}{\mathbf{.650 \pm .02}}{.567 \pm .06}{.617 \pm .05} \\
\midrule

\textbf{Opus-4.5}
& \mtstsscorecell{.620 \pm .03}{\mathbf{.635 \pm .03}}{.625 \pm .02}{.568 \pm .01}
& \mtstsscorecell{\mathbf{.983 \pm .00}}{.972 \pm .01}{.981 \pm .00}{.973 \pm .01}
& \mtstsscorecell{.515 \pm .05}{\mathbf{.520 \pm .05}}{.483 \pm .05}{.497 \pm .02}
& \mtstsscorecell{.617 \pm .03}{\mathbf{.675 \pm .03}}{.592 \pm .03}{.567 \pm .05} \\
\midrule

\textbf{Sonnet-4.6}
& \mtstsscorecell{\mathbf{.628 \pm .04}}{.626 \pm .04}{.613 \pm .05}{.608 \pm .03}
& \mtstsscorecell{.969 \pm .01}{.968 \pm .01}{\mathbf{.969 \pm .01}}{.955 \pm .01}
& \mtstsscorecell{.659 \pm .01}{\mathbf{.663 \pm .01}}{.656 \pm .01}{.616 \pm .03}
& \mtstsscorecell{.617 \pm .09}{\mathbf{.700 \pm .07}}{.575 \pm .03}{.675 \pm .05} \\
\midrule

\textbf{Opus-4.6}
& \mtstsscorecell{.713 \pm .05}{.707 \pm .04}{\mathbf{.716 \pm .04}}{.648 \pm .03}
& \mtstsscorecell{\mathbf{.975 \pm .01}}{.973 \pm .01}{.967 \pm .01}{.964 \pm .02}
& \mtstsscorecell{.616 \pm .02}{\mathbf{.625 \pm .02}}{.612 \pm .02}{.531 \pm .05}
& \mtstsscorecell{.725 \pm .02}{\mathbf{.800 \pm .05}}{.692 \pm .05}{.758 \pm .08} \\

\bottomrule
\end{tabular}%
}
\end{table*}
```
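Bolding the largest of four rounded means by hand is error-prone, and the markdown and LaTeX versions of a cell can easily drift apart. A minimal sketch of a variant of `\mtstsscorecell` that selects the bold entries automatically is shown below. It assumes the `xfp` package (or a kernel recent enough to provide `\fpeval`), and the macro names `\mtstsautocell`, `\mtstsmaybebold`, and `\mtstsmax` are hypothetical.

```latex
% Preamble. xfp provides the expandable \fpeval (recent kernels already
% include it, in which case this line is redundant but harmless).
\usepackage{xfp}

% Hypothetical helper: typesets "#1 \pm #2" (math mode), in bold when the
% mean #1 equals #3. Means omit the leading zero (e.g. .952), so a 0 is
% prepended to #1 before the numeric comparison.
\newcommand{\mtstsmaybebold}[3]{%
  \ifnum\fpeval{0#1 = #3}=1 \mathbf{#1 \pm #2}\else #1 \pm #2\fi}

% \mtstsautocell{m1}{s1}{m2}{s2}{m3}{s3}{m4}{s4}
% Same two-line layout as \mtstsscorecell (m1 / m2 on the first line,
% m3 / m4 on the second), but every mean equal to the largest of the
% four is set in bold.
\newcommand{\mtstsautocell}[8]{%
  \edef\mtstsmax{\fpeval{max(0#1, 0#3, 0#5, 0#7)}}%
  \begin{tabular}[c]{@{}c@{}}
  $\mtstsmaybebold{#1}{#2}{\mtstsmax}$ / $\mtstsmaybebold{#3}{#4}{\mtstsmax}$\\[-1pt]
  $\mtstsmaybebold{#5}{#6}{\mtstsmax}$ / $\mtstsmaybebold{#7}{#8}{\mtstsmax}$
  \end{tabular}}

% Usage, reproducing the Sonnet-4.6 circle packing cell:
% & \mtstsautocell{.997}{.00}{.997}{.00}{.997}{.00}{.957}{.03}
```

Because ties are detected on the rounded values passed in, the usage line above bolds all three tied .997 entries, matching how the Sonnet-4.6 circle packing cell is typeset. The Sonnet-4.6 SLDBench-3D cell (.969 / .968 / .969 / .955) would likewise receive two bold entries, whereas the table currently bolds only STA Best-Shared; if that choice reflects unrounded means, keep that cell hand-typed.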

## Appendix Adaptation Methods Comparison Tables

These appendix versions expand the compact main adaptation-methods tables by showing one row per method and by including the pre-adaptation shared score.

### Continuous Optimization

#### Suggested caption

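Comparison of standard single-task and EMO-STA optimization for continuous optimization families. Each model is expanded into separate method rows, including the pre-adaptation shared score, and each cell reports **mean ± std**. Within each model and task, bold marks the largest mean among STA Best-Local, STA Warmstart, STA Best-Shared, and Single-task; the pre-adaptation row is shown for context and is not bolded. The budget row reports **Shared / Adapt / Total** iterations, where Total is computed as Shared plus the per-task adaptation budget times the number of tasks in the family.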

#### LaTeX

```latex
\begin{table*}[!h]
\centering
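\caption{\small{Comparison of standard single-task and EMO-STA optimization for continuous optimization families. Each model is expanded into separate method rows, including the pre-adaptation shared score, and each cell reports \textit{mean $\pm$ std}. Within each model and task, bold marks the largest mean among \textit{STA Best-Local}, \textit{STA Warmstart}, \textit{STA Best-Shared}, and \textit{Single-task}; the pre-adaptation row is shown for context and is not bolded. The budget row reports \textit{Shared / Adapt / Total} iterations, where Total is computed as Shared plus the per-task adaptation budget times the number of tasks in the family.}}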
\label{tab:emo-sta-appendix-seed-adaptation-continuous}
\scriptsize
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.08}
\setlength{\aboverulesep}{0.5ex}
\setlength{\belowrulesep}{0.5ex}

\resizebox{\textwidth}{!}{%
\begin{tabular}{llcccc}
\toprule
Model & Method
& Function minimization
& Circle packing
& Circle packing rectangles
& Heilbronn triangle \\
\midrule
\multicolumn{2}{l}{Budget (Shared / Adapt / Total)}
& $40 / 15 / 100$
& $60 / 15 / 120$
& $60 / 15 / 120$
& $60 / 15 / 120$ \\
\midrule

\multirow{5}{*}{\textbf{Haiku-4.5}}
& STA Best-Shared (Before Adaptation) & $.887 \pm .06$ & $.902 \pm .05$ & $.832 \pm .01$ & $.523 \pm .03$ \\
& STA Best-Local & $\mathbf{.952 \pm .04}$ & $.934 \pm .03$ & $.861 \pm .02$ & $\mathbf{.650 \pm .06}$ \\
& STA Warmstart & $.949 \pm .05$ & $.926 \pm .03$ & $\mathbf{.865 \pm .02}$ & $.628 \pm .05$ \\
& STA Best-Shared & $.941 \pm .06$ & $\mathbf{.940 \pm .02}$ & $.845 \pm .01$ & $.628 \pm .06$ \\
& Single-task & $.888 \pm .05$ & $.865 \pm .03$ & $.832 \pm .01$ & $.547 \pm .03$ \\
\midrule

\multirow{5}{*}{\textbf{Sonnet-4.5}}
& STA Best-Shared (Before Adaptation) & $.862 \pm .03$ & $.938 \pm .03$ & $.875 \pm .04$ & $.472 \pm .08$ \\
& STA Best-Local & $\mathbf{.925 \pm .02}$ & $\mathbf{.965 \pm .02}$ & $\mathbf{.898 \pm .03}$ & $\mathbf{.622 \pm .04}$ \\
& STA Warmstart & $.917 \pm .02$ & $.964 \pm .02$ & $.890 \pm .03$ & $.596 \pm .05$ \\
& STA Best-Shared & $.904 \pm .03$ & $.947 \pm .03$ & $.892 \pm .04$ & $.619 \pm .05$ \\
& Single-task & $.891 \pm .05$ & $.927 \pm .02$ & $.840 \pm .02$ & $.548 \pm .04$ \\
\midrule

\multirow{5}{*}{\textbf{Opus-4.5}}
& STA Best-Shared (Before Adaptation) & $.877 \pm .07$ & $.901 \pm .01$ & $.935 \pm .01$ & $.608 \pm .05$ \\
& STA Best-Local & $\mathbf{.969 \pm .03}$ & $\mathbf{.940 \pm .01}$ & $\mathbf{.951 \pm .01}$ & $\mathbf{.741 \pm .03}$ \\
& STA Warmstart & $.942 \pm .07$ & $.926 \pm .01$ & $.943 \pm .01$ & $.732 \pm .04$ \\
& STA Best-Shared & $.941 \pm .09$ & $.930 \pm .01$ & $.943 \pm .01$ & $.704 \pm .04$ \\
& Single-task & $.914 \pm .05$ & $.912 \pm .01$ & $.912 \pm .01$ & $.622 \pm .06$ \\
\midrule

\multirow{5}{*}{\textbf{Sonnet-4.6}}
& STA Best-Shared (Before Adaptation) & $.946 \pm .03$ & $.995 \pm .00$ & $.993 \pm .00$ & $.711 \pm .05$ \\
& STA Best-Local & $.988 \pm .02$ & $\mathbf{.997 \pm .00}$ & $\mathbf{.986 \pm .01}$ & $.862 \pm .04$ \\
& STA Warmstart & $.973 \pm .03$ & $\mathbf{.997 \pm .00}$ & $.985 \pm .01$ & $.809 \pm .04$ \\
& STA Best-Shared & $\mathbf{.991 \pm .02}$ & $\mathbf{.997 \pm .00}$ & $.985 \pm .01$ & $\mathbf{.865 \pm .07}$ \\
& Single-task & $.901 \pm .02$ & $.957 \pm .03$ & $.967 \pm .02$ & $.678 \pm .05$ \\
\midrule

\multirow{5}{*}{\textbf{Opus-4.6}}
& STA Best-Shared (Before Adaptation) & $.942 \pm .04$ & $.960 \pm .02$ & $.941 \pm .01$ & $.784 \pm .03$ \\
& STA Best-Local & $\mathbf{.945 \pm .03}$ & $\mathbf{.984 \pm .01}$ & $\mathbf{.967 \pm .01}$ & $.863 \pm .03$ \\
& STA Warmstart & $.943 \pm .03$ & $.972 \pm .02$ & $.957 \pm .01$ & $.844 \pm .03$ \\
& STA Best-Shared & $.932 \pm .04$ & $.979 \pm .02$ & $.962 \pm .01$ & $\mathbf{.877 \pm .03}$ \\
& Single-task & $.895 \pm .04$ & $.963 \pm .01$ & $.944 \pm .01$ & $.744 \pm .04$ \\

\bottomrule
\end{tabular}%
}
\vspace{-5pt}
\end{table*}
```
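Every column of both tables is consistent with the boldface rule sketched below: within each model block, bold the best score among the four post-adaptation and single-task rows, keep ties, and exclude the pre-adaptation row. For example, for Sonnet-4.6 on circle packing in rectangles, the unbolded pre-adaptation .993 exceeds the bolded Best-Local .986. The function name `bold_methods` and the `EXCLUDED` set are illustrative; the rule is inferred from the tables, not taken from the authors' tooling.

```python
from typing import Dict, List

# Assumption: row labels mirror the table; the pre-adaptation row is reported
# but never competes for boldface.
EXCLUDED = {"STA Best-Shared (Before Adaptation)"}


def bold_methods(scores: Dict[str, float], ndigits: int = 3) -> List[str]:
    """Return the methods whose rounded mean ties for the best eligible score."""
    eligible = {m: round(s, ndigits) for m, s in scores.items() if m not in EXCLUDED}
    best = max(eligible.values())
    # Ties at the reported precision are all bolded.
    return [m for m, s in eligible.items() if s == best]


# Sonnet-4.6, circle packing in rectangles (means from the table above).
cpr = {
    "STA Best-Shared (Before Adaptation)": 0.993,
    "STA Best-Local": 0.986,
    "STA Warmstart": 0.985,
    "STA Best-Shared": 0.985,
    "Single-task": 0.967,
}
print(bold_methods(cpr))  # ['STA Best-Local']

# Sonnet-4.6, circle packing: a three-way tie at .997.
cp = {
    "STA Best-Shared (Before Adaptation)": 0.995,
    "STA Best-Local": 0.997,
    "STA Warmstart": 0.997,
    "STA Best-Shared": 0.997,
    "Single-task": 0.957,
}
print(bold_methods(cp))
```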

### Modeling & Algorithmic Optimization

#### Suggested caption

Comparison of standard single-task and EMO-STA optimization for modeling and algorithmic optimization families. Each model is expanded into separate method rows, including the pre-adaptation shared score. Cells report **mean ± std**. Within each model block, boldface marks the best score per task among the four post-adaptation and single-task rows; the pre-adaptation row is not bolded. The budget row reports **Shared / Adapt / Total** iterations, where Total is computed as Shared plus the per-task adaptation budget times the number of tasks in the family.

#### LaTeX

```latex
\begin{table*}[!h]
\centering
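\caption{\small{Comparison of standard single-task and EMO-STA optimization for modeling and algorithmic optimization families. Each model is expanded into separate method rows, including the pre-adaptation shared score. Cells report mean $\pm$ std. Within each model block, boldface marks the best score per task among the four post-adaptation and single-task rows; the pre-adaptation row is not bolded. The budget row reports \textit{Shared / Adapt / Total} iterations, where Total is computed as Shared plus the per-task adaptation budget times the number of tasks in the family.}}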
\label{tab:emo-sta-appendix-seed-adaptation-modeling}
\scriptsize
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.08}
\setlength{\aboverulesep}{0.5ex}
\setlength{\belowrulesep}{0.5ex}

\resizebox{\textwidth}{!}{%
\begin{tabular}{llcccc}
\toprule
Model & Method
& Signal processing
& SLDBench-3D
& Rust adaptive sort
& K-module \\
\midrule
\multicolumn{2}{l}{Budget (Shared / Adapt / Total)}
& $60 / 10 / 100$
& $60 / 10 / 80$
& $60 / 10 / 100$
& $40 / 20 / 120$ \\
\midrule

\multirow{5}{*}{\textbf{Haiku-4.5}}
& STA Best-Shared (Before Adaptation) & $.568 \pm .04$ & $.936 \pm .02$ & $.509 \pm .03$ & $.392 \pm .05$ \\
& STA Best-Local & $\mathbf{.600 \pm .05}$ & $\mathbf{.958 \pm .02}$ & $.533 \pm .02$ & $.567 \pm .06$ \\
& STA Warmstart & $.584 \pm .06$ & $.953 \pm .02$ & $.535 \pm .02$ & $.567 \pm .04$ \\
& STA Best-Shared & $.597 \pm .04$ & $.949 \pm .02$ & $.509 \pm .03$ & $\mathbf{.575 \pm .07}$ \\
& Single-task & $.569 \pm .01$ & $.951 \pm .01$ & $\mathbf{.539 \pm .02}$ & $.550 \pm .03$ \\
\midrule

\multirow{5}{*}{\textbf{Sonnet-4.5}}
& STA Best-Shared (Before Adaptation) & $.559 \pm .02$ & $.955 \pm .02$ & $.458 \pm .03$ & $.367 \pm .02$ \\
& STA Best-Local & $\mathbf{.587 \pm .01}$ & $\mathbf{.976 \pm .01}$ & $.481 \pm .03$ & $.617 \pm .03$ \\
& STA Warmstart & $.578 \pm .02$ & $.971 \pm .01$ & $.484 \pm .03$ & $\mathbf{.650 \pm .02}$ \\
& STA Best-Shared & $.582 \pm .02$ & $.971 \pm .02$ & $.457 \pm .03$ & $.567 \pm .06$ \\
& Single-task & $.576 \pm .01$ & $.959 \pm .01$ & $\mathbf{.528 \pm .01}$ & $.617 \pm .05$ \\
\midrule

\multirow{5}{*}{\textbf{Opus-4.5}}
& STA Best-Shared (Before Adaptation) & $.612 \pm .03$ & $.959 \pm .02$ & $.483 \pm .05$ & $.442 \pm .02$ \\
& STA Best-Local & $.620 \pm .03$ & $\mathbf{.983 \pm .00}$ & $.515 \pm .05$ & $.617 \pm .03$ \\
& STA Warmstart & $\mathbf{.635 \pm .03}$ & $.972 \pm .01$ & $\mathbf{.520 \pm .05}$ & $\mathbf{.675 \pm .03}$ \\
& STA Best-Shared & $.625 \pm .02$ & $.981 \pm .00$ & $.483 \pm .05$ & $.592 \pm .03$ \\
& Single-task & $.568 \pm .01$ & $.973 \pm .01$ & $.497 \pm .02$ & $.567 \pm .05$ \\
\midrule

\multirow{5}{*}{\textbf{Sonnet-4.6}}
& STA Best-Shared (Before Adaptation) & $.607 \pm .05$ & $.959 \pm .01$ & $.656 \pm .01$ & $.383 \pm .05$ \\
& STA Best-Local & $\mathbf{.628 \pm .04}$ & $\mathbf{.969 \pm .01}$ & $.659 \pm .01$ & $.617 \pm .09$ \\
& STA Warmstart & $.626 \pm .04$ & $.968 \pm .01$ & $\mathbf{.663 \pm .01}$ & $\mathbf{.700 \pm .07}$ \\
& STA Best-Shared & $.613 \pm .05$ & $\mathbf{.969 \pm .01}$ & $.656 \pm .01$ & $.575 \pm .03$ \\
& Single-task & $.608 \pm .03$ & $.955 \pm .01$ & $.616 \pm .03$ & $.675 \pm .05$ \\
\midrule

\multirow{5}{*}{\textbf{Opus-4.6}}
& STA Best-Shared (Before Adaptation) & $.653 \pm .04$ & $.958 \pm .02$ & $.612 \pm .02$ & $.450 \pm .03$ \\
& STA Best-Local & $.713 \pm .05$ & $\mathbf{.975 \pm .01}$ & $.616 \pm .02$ & $.725 \pm .02$ \\
& STA Warmstart & $.707 \pm .04$ & $.973 \pm .01$ & $\mathbf{.625 \pm .02}$ & $\mathbf{.800 \pm .05}$ \\
& STA Best-Shared & $\mathbf{.716 \pm .04}$ & $.967 \pm .01$ & $.612 \pm .02$ & $.692 \pm .05$ \\
& Single-task & $.648 \pm .03$ & $.964 \pm .02$ & $.531 \pm .05$ & $.758 \pm .08$ \\

\bottomrule
\end{tabular}%
}
\vspace{-5pt}
\end{table*}
```
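As a sanity check on the two budget rows, the arithmetic Total = Shared + Adapt × (number of tasks) can be sketched as below. The task counts come from the family descriptions in this document. `FamilyBudget` is a hypothetical helper, and `single_task_per_task` assumes that a matched single-task baseline splits the same family-level total evenly across tasks, which would give 30 iterations per task for a 120-iteration, four-task baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyBudget:
    """Hypothetical helper for one EMO-STA family's iteration budget."""
    shared: int     # iterations in the shared phase (family-averaged objective)
    adapt: int      # iterations per task-specific adaptation run
    num_tasks: int  # number of tasks in the family

    @property
    def total(self) -> int:
        # Total = Shared + Adapt * (number of tasks)
        return self.shared + self.adapt * self.num_tasks

    def single_task_per_task(self) -> int:
        # Assumption: the matched single-task baseline splits the same
        # family-level total evenly across the tasks.
        if self.total % self.num_tasks:
            raise ValueError("total budget does not split evenly across tasks")
        return self.total // self.num_tasks


# (Shared, Adapt, number of tasks) for each family in the two appendix tables.
BUDGETS = {
    "Function minimization": FamilyBudget(40, 15, 4),
    "Circle packing": FamilyBudget(60, 15, 4),
    "Circle packing rectangles": FamilyBudget(60, 15, 4),
    "Heilbronn triangle": FamilyBudget(60, 15, 4),
    "Signal processing": FamilyBudget(60, 10, 4),
    "SLDBench-3D": FamilyBudget(60, 10, 2),
    "Rust adaptive sort": FamilyBudget(60, 10, 4),
    "K-module": FamilyBudget(40, 20, 4),
}

for name, b in BUDGETS.items():
    print(f"{name}: {b.shared} / {b.adapt} / {b.total}, "
          f"matched single-task budget {b.single_task_per_task()} per task")
```

Every computed total reproduces the corresponding budget row; SLDBench-3D is the only two-task family, so its total is 60 + 10 × 2 = 80.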

## Circle Packing OOD Figure

![Circle packing OOD holdout evaluation](figures/circle_packing_s60_a15_b30_ood_holdout_eval.png)

### Suggested caption

Out-of-distribution holdout evaluation for circle packing at the selected EMO-STA budget **60 / 15 / 120** (Shared / Adapt / Total). The x-axis shows held-out circle counts (**N = 21, 23, 25**) plus the average across holdouts. For both **EMO-STA Adapt** and **Single-task**, holdout performance is averaged over the programs obtained for each in-distribution source task. Bars report mean scores across the five models used in the main comparison table.

### LaTeX

```latex
\begin{figure}[!t]
    \centering
    \includegraphics[width=0.78\linewidth]{figures/circle_packing_s60_a15_b30_ood_holdout_eval.pdf}
    \caption{Out-of-distribution holdout evaluation for circle packing at the selected EMO-STA budget $60 / 15 / 120$ (Shared / Adapt / Total). The x-axis shows held-out circle counts ($N=21,23,25$) plus the average across holdouts. For both EMO-STA Adapt and Single-task, holdout performance is averaged over the programs obtained for each in-distribution source task. Bars report mean scores across the five models used in the main comparison table.}
    \label{fig:circle-packing-ood-holdouts}
\end{figure}
```

## How Each Family Was Adapted

Common EMO-STA pattern: each original benchmark was turned into a small family of related subtasks that share one evolving artifact, one evaluator family, and one shared-to-adapted workflow. The shared phase optimizes the average score across subtasks; task-specific runs are then spawned that warmstart from the shared archive and each adapt to one selected task. This summary covers all EMO-STA benchmark families used in the paper except `r_robust_regression` and the original `k_module_problem` variant, which are omitted here for brevity.
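The shared-to-adapted workflow can be sketched as a toy under stated assumptions; this is not the EMO-STA implementation. The "program" is a plain parameter vector, `mutate` and `evolve` are hypothetical stand-ins for LLM-driven evolutionary search, and the shared archive is collapsed to its single best program.

```python
import random
from typing import Callable, Dict, List, Tuple

Program = List[float]               # toy stand-in for an evolving code artifact
Task = Callable[[Program], float]   # task score, higher is better


def mutate(program: Program, rng: random.Random, step: float = 0.1) -> Program:
    # Hypothetical mutation operator; EMO-STA would propose edits with an LLM.
    return [x + rng.gauss(0.0, step) for x in program]


def evolve(start: Program, fitness: Callable[[Program], float],
           iters: int, rng: random.Random) -> Program:
    # Greedy (1+1) hill climbing as a stand-in for evolutionary search:
    # a candidate replaces the incumbent only if it strictly improves fitness.
    best, best_fit = start, fitness(start)
    for _ in range(iters):
        cand = mutate(best, rng)
        cand_fit = fitness(cand)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best


def emo_sta(tasks: Dict[str, Task], seed: Program, shared_iters: int,
            adapt_iters: int, rng_seed: int = 0) -> Tuple[Program, Dict[str, Program]]:
    rng = random.Random(rng_seed)

    def shared_fitness(program: Program) -> float:
        # Shared phase objective: average score across all tasks in the family.
        return sum(task(program) for task in tasks.values()) / len(tasks)

    shared = evolve(seed, shared_fitness, shared_iters, rng)
    # Adaptation phase: one run per task, warm-started from the shared program.
    adapted = {name: evolve(shared, task, adapt_iters, rng)
               for name, task in tasks.items()}
    return shared, adapted


def make_task(target: Program) -> Task:
    # Toy task: negative squared distance to a task-specific optimum.
    return lambda p: -sum((x - t) ** 2 for x, t in zip(p, target))


tasks = {"a": make_task([1.0, 0.0]), "b": make_task([0.0, 1.0])}
shared, adapted = emo_sta(tasks, seed=[0.0, 0.0], shared_iters=200, adapt_iters=100)
```

In this toy, the family-level cost is 200 + 100 × 2 iterations, mirroring the Shared / Adapt / Total accounting used in the tables. Because each adaptation run starts from the shared program and only accepts improvements, every adapted program scores at least as well on its own task as the shared one.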

- **Function minimization:** Adapted the original standalone single-objective `SinCosXY` example into four public 2D objectives: `SinCosXY`, `Ackley`, `Rastrigin`, and `Rosenbrock`. The evolving code must now be a generic derivative-free optimizer that takes `objective_fn` and `bounds` from the evaluator rather than hardcoding one landscape. The benchmark functions were also shifted (translated in input space) so that the exact optima are hidden from the candidate.
- **Signal processing:** Adapted the original single signal-processing benchmark, which evaluated one algorithm across five synthetic signal types, into four EMO-STA tasks: trend+sine, multifrequency, chirp, and step changes. Each task now has fixed length and noise settings and uses the same causal `process_signal(noisy_signal, window_size)` interface. The candidate sees only the noisy input signal, not the clean target, task ID, or generating formula. The fifth signal type, a random walk, was left out of the EMO-STA family.
- **Circle packing:** Adapted the original fixed `n=26` AlphaEvolve-style unit-square packing example into a family over nearby circle counts. The EMO-STA training tasks are `n in {20, 22, 24, 26}`, all using one generic `construct_packing(n)` interface, and there are evaluation-only holdouts at `n in {21, 23, 25}` to test transfer. A key detail is that the evaluator uses known per-task reference totals `target_sum_radii`, and the shared phase optimizes the average normalized target ratio `sum_radii / target_sum_radii` rather than raw summed radii. That normalization is necessary because absolute sums grow with `n`, so averaging raw sums would bias shared evolution toward larger-circle-count tasks instead of encouraging one reusable packing strategy across the family.
- **Circle packing in rectangles:** Added a second circle-packing EMO-STA family that keeps the same shared constructive structure but changes the container geometry. The evolving code still uses a generic `construct_packing(n)` interface, but it now chooses a rectangle width `alpha`, uses height `2 - alpha`, and packs circles into a perimeter-4 rectangle whose aspect ratio is itself part of the optimization problem. The public tasks use `n in {20, 21, 22, 23}`. As in the unit-square family, the evaluator uses known task-specific `target_sum_radii` values and shared evolution optimizes normalized target ratios rather than raw sums of radii. This matters even more here because attainable absolute sums vary with both `n` and geometry, so raw sums do not provide a coherent family-level objective; the known targets put all tasks on a common scale and let the shared phase learn geometry-aware behavior that transfers across nearby `n` values and changing aspect ratios.
- **Heilbronn triangle:** Adapted the Heilbronn triangle benchmark into a four-task EMO-STA family over nearby point counts inside one fixed canonical unit-area triangle. The evolving code uses a generic `construct_points(n)` / `run_heilbronn(n)` interface and must maximize the minimum triangle area induced by all triples of points. The public tasks correspond to `n in {9, 10, 11, 12}`, and the evaluator uses known task-specific `target_min_area` values so that shared evolution optimizes averaged normalized target ratios rather than raw minimum areas. That normalization matters because the attainable best minimum area changes substantially with `n`, so averaging raw areas would bias the shared phase toward easier smaller-`n` tasks instead of encouraging one reusable placement strategy across the family.
- **K-module balanced:** Adapted the original public 4-module, 5-option K-module problem into a harder hidden-family EMO-STA benchmark. The new family uses 6 modules with 6 opaque options each, four hidden target tasks, and a balanced design where each task matches the shared consensus on exactly half of the modules, so the shared optimum is useful but not identical to any one task. Task IDs and target configs are intentionally hidden from prompts and artifacts.
- **Symbolic regression:** Adapted the original generated-per-problem LLM-SRBench symbolic-regression workflow into one narrow EMO-STA family built from the `phys_osc` subset. Instead of generating a separate program, evaluator, and config for each problem, the EMO-STA version uses one shared evaluator and one generic interface over four oscillator equations (`PO11`, `PO17`, `PO30`, `PO37`) with fixed inputs `(x, t, v)` and target `dv_dt`. The evaluator only exposes numerical datasets through that interface and does not reveal the exact ground-truth equations.
- **SLDBench-3D:** Adapted the original 7-task SLDBench benchmark into a 2-task EMO-STA subset containing `vocab_scaling_law` and `data_constrained_scaling_law`. Both tasks are canonicalized to the same 3-column input schema `[model_size_like, diversity_like, total_data_like]`, so the evolving code learns one reusable law and fitter instead of separate task-specific raw schemas. Group-local coefficients are always refit locally during evaluation, so EMO-STA shares the law code and fitting procedure, not fitted coefficients.
- **Rust adaptive sort:** Adapted the original standalone single-evaluator Rust sorting example into four explicit deterministic regimes: random, nearly sorted, reverse sorted, and duplicates. The EMO-STA evaluator compiles each candidate once and then benchmarks it across task-selected datasets, letting one `adaptive_sort` implementation be shared and adapted across regimes. The `partially_sorted` regime from the original benchmark was intentionally excluded from the initial EMO-STA family and kept as a possible holdout/generalization check.
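The normalized shared objective used by the circle-packing, rectangle, and Heilbronn families can be sketched as below, with the Heilbronn minimum triangle area as the raw score (circle packing would substitute `sum_radii` and `target_sum_radii`). Assumptions: the function names are hypothetical, the check that points lie inside the canonical unit-area triangle is omitted, and the target values are placeholders rather than the benchmark's `target_min_area` entries.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    # Half the absolute 2D cross product of (b - a) and (c - a).
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def min_triangle_area(points: Sequence[Point]) -> float:
    """Heilbronn raw score: the smallest area over all triples of points."""
    return min(triangle_area(a, b, c) for a, b, c in combinations(points, 3))


def shared_objective(raw: Dict[int, float], target: Dict[int, float]) -> float:
    """Mean of per-task ratios raw[n] / target[n], putting every n on a common scale."""
    return sum(raw[n] / target[n] for n in raw) / len(raw)


# Placeholder numbers only, NOT the benchmark's target_min_area values.
raw_scores = {9: 0.03, 12: 0.01}
targets = {9: 0.04, 12: 0.02}
print(sum(raw_scores.values()) / len(raw_scores))   # raw mean, dominated by n = 9
print(shared_objective(raw_scores, targets))        # normalized mean
```

With these placeholders, the n = 12 task reaches only half of its target while n = 9 reaches three quarters. The normalized mean weights both shortfalls on the same scale, whereas the raw mean is driven mostly by the larger n = 9 area, which is the bias the family descriptions above warn about.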

## Circle Packing Best-Local OOD Transfer Heatmap

![Circle packing Best-Local OOD transfer heatmap](figures/circle_packing_s60_a15_b30_best_local_ood_transfer_heatmap.png)

### Suggested caption

OOD circle-packing results for **STA Best-Local** at budget **60 / 15 / 120**. Rows are held-out sizes, columns are adaptation source tasks, and cells report mean OOD normalized score across LLMs and seeds.

### LaTeX

```latex
\begin{wrapfigure}{r}{0.39\linewidth}
    \centering
    \vspace{-4.7em}
    \includegraphics[width=\linewidth]{figures/circle_packing_s60_a15_b30_best_local_ood_transfer_heatmap.pdf}
    \caption{\small{OOD circle-packing results for \textit{STA Best-Local} at budget $60 / 15 / 120$. Rows are held-out sizes, columns are adaptation source tasks, and cells report mean OOD normalized score across LLMs and seeds.}}
    \label{fig:circle-packing-best-local-ood-transfer-heatmap}
    \vspace{-0.5em}
\end{wrapfigure}
```
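The heatmap aggregation described in the caption (rows are held-out sizes, columns are adaptation source tasks, cells are mean OOD normalized scores across LLMs and seeds) can be sketched as follows. The record layout and the function name `transfer_heatmap` are hypothetical, and the scores are placeholders, not results from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (llm, seed, source_n, holdout_n, normalized OOD score)
Record = Tuple[str, int, int, int, float]


def transfer_heatmap(records: Iterable[Record]) -> Dict[Tuple[int, int], float]:
    """Mean OOD score per (holdout size, source task) cell, pooled over LLMs and seeds."""
    cells: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for _llm, _seed, source_n, holdout_n, score in records:
        cells[(holdout_n, source_n)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


# Placeholder scores for illustration only, not results from the paper.
records = [
    ("llm-a", 0, 20, 21, 0.90),
    ("llm-a", 1, 20, 21, 0.80),
    ("llm-b", 0, 20, 21, 0.70),
    ("llm-b", 0, 26, 25, 0.95),
]
heatmap = transfer_heatmap(records)
```

This sketch pools all (LLM, seed) runs in a cell. That matches a mean of per-LLM means only when every LLM contributes the same number of seeds; with unbalanced seeds, averaging per LLM first would weight models equally instead.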

## OOD Budget-Sweep Holdout Figures

![Circle packing OOD budget-sweep holdout evaluation](figures/circle_packing_ood_b30_by_holdout_seed_adaptation_methods.png)

### Circle Packing Caption

OOD holdout evaluation for circle packing across EMO-STA budget allocations with the single-task baseline fixed at 120 total iterations. The x-axis shows held-out task sizes plus the average across holdouts. The peach bars show the fixed single-task baseline, green colors denote the **Shared / Per-task adaptation / Total** budget allocation, and hatch patterns denote the STA adaptation variant. Bars report mean OOD normalized score across LLMs.

### Circle Packing LaTeX

```latex
\begin{figure}[!t]
    \centering
    \includegraphics[width=0.82\linewidth]{figures/circle_packing_ood_b30_by_holdout_seed_adaptation_methods.pdf}
    \caption{OOD holdout evaluation for circle packing across EMO-STA budget allocations with the single-task baseline fixed at 120 total iterations. The x-axis shows held-out task sizes plus the average across holdouts. The peach bars show the fixed single-task baseline, green colors denote the \textit{Shared / Per-task adaptation / Total} budget allocation, and hatch patterns denote the STA adaptation variant. Bars report mean OOD normalized score across LLMs.}
    \label{fig:circle-packing-ood-b30-holdout-seed-adaptation}
\end{figure}
```

![Circle packing rectangle OOD budget-sweep holdout evaluation](figures/circle_packing_rectangle_ood_b30_by_holdout_seed_adaptation_methods.png)

### Circle Packing Rectangle Caption

OOD holdout evaluation for circle packing in rectangles across EMO-STA budget allocations with the single-task baseline fixed at 120 total iterations. The x-axis shows held-out task sizes plus the average across holdouts. The peach bars show the fixed single-task baseline, green colors denote the **Shared / Per-task adaptation / Total** budget allocation, and hatch patterns denote the STA adaptation variant. Bars report mean OOD normalized score across LLMs.

### Circle Packing Rectangle LaTeX

```latex
\begin{figure}[!t]
    \centering
    \includegraphics[width=0.82\linewidth]{figures/circle_packing_rectangle_ood_b30_by_holdout_seed_adaptation_methods.pdf}
    \caption{OOD holdout evaluation for circle packing in rectangles across EMO-STA budget allocations with the single-task baseline fixed at 120 total iterations. The x-axis shows held-out task sizes plus the average across holdouts. The peach bars show the fixed single-task baseline, green colors denote the \textit{Shared / Per-task adaptation / Total} budget allocation, and hatch patterns denote the STA adaptation variant. Bars report mean OOD normalized score across LLMs.}
    \label{fig:circle-packing-rectangle-ood-b30-holdout-seed-adaptation}
\end{figure}
```

![Heilbronn triangle OOD budget-sweep holdout evaluation](figures/heilbronn_triangle_ood_b30_by_holdout_seed_adaptation_methods.png)

### Heilbronn Triangle Caption

OOD holdout evaluation for the Heilbronn triangle task across EMO-STA budget allocations with the single-task baseline fixed at 120 total iterations. The x-axis shows held-out task sizes plus the average across holdouts. The peach bars show the fixed single-task baseline, green colors denote the **Shared / Per-task adaptation / Total** budget allocation, and hatch patterns denote the STA adaptation variant. Bars report mean OOD normalized score across LLMs.

### Heilbronn Triangle LaTeX

```latex
\begin{figure}[!t]
    \centering
    \includegraphics[width=0.82\linewidth]{figures/heilbronn_triangle_ood_b30_by_holdout_seed_adaptation_methods.pdf}
    \caption{OOD holdout evaluation for the Heilbronn triangle task across EMO-STA budget allocations with the single-task baseline fixed at 120 total iterations. The x-axis shows held-out task sizes plus the average across holdouts. The peach bars show the fixed single-task baseline, green colors denote the \textit{Shared / Per-task adaptation / Total} budget allocation, and hatch patterns denote the STA adaptation variant. Bars report mean OOD normalized score across LLMs.}
    \label{fig:heilbronn-triangle-ood-b30-holdout-seed-adaptation}
\end{figure}
```

## Heilbronn Triangle Public-Task Budget Sweep

![Heilbronn triangle public-task budget sweep](figures/heilbronn_budget_sweep_s60_adaptation_methods.png)

### Heilbronn Triangle Public-Task Budget Sweep Caption

Public-task budget sweep for the Heilbronn triangle family with the shared budget fixed at 60 iterations. The x-axis reports **Shared / Per-task adaptation / Total** iterations. For each total budget, the single-task baseline receives that total divided evenly across the tasks, so both methods use the same family-level iteration budget. Bars report mean normalized score across LLMs and seeds for **STA Warmstart**, **STA Best-Local**, **STA Best-Shared**, and direct single-task optimization.

### Brief Explanation

This figure is the public-task counterpart to the OOD budget-sweep figure. It shows that increasing the total budget improves the direct single-task baseline, but the EMO-STA variants remain consistently stronger across the sweep. The gap is largest at lower budgets, where the shared phase provides useful geometric structure before task-specific adaptation; at higher budgets, single-task optimization improves, but it still does not close the gap to the adapted shared solutions.

### Heilbronn Triangle Public-Task Budget Sweep LaTeX

```latex
\begin{figure}[!t]
    \centering
    \includegraphics[width=0.82\linewidth]{figures/heilbronn_budget_sweep_s60_adaptation_methods.pdf}
    \caption{Public-task budget sweep for the Heilbronn triangle family with the shared budget fixed at 60 iterations. The x-axis reports \textit{Shared / Per-task adaptation / Total} iterations. For each total budget, the single-task baseline receives that total divided evenly across the tasks, so both methods use the same family-level iteration budget. Bars report mean normalized score across LLMs and seeds for \textit{STA Warmstart}, \textit{STA Best-Local}, \textit{STA Best-Shared}, and direct single-task optimization.}
    \label{fig:heilbronn-public-task-budget-sweep-s60}
\end{figure}
```
