\subsection{Hyperparameter Selection} In the paper and in the preceding sections of this appendix, we describe the final hyperparameters used to run the experiments.
We arrived at these values through an informal hyperparameter search, and many hyperparameters never changed from their initial values.

\subsubsection{PDDLGym Domains}
For the PDDLGym domains, the simulation budget was set to 100K queries for computational reasons, because the PDDLGym simulator is slower than the domain-specific simulators.
The macro-learning budgets for the PDDLGym domains were set to be comparable to the number of simulator queries needed to solve a single problem instance using greedy best-first search with the goal-count heuristic and primitive actions.
The number of PDDLGym macros was chosen to be uniform across the various domains.
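To make the budget rule concrete, the following is a minimal Python sketch of greedy best-first search with the goal-count heuristic over primitive actions, instrumented to count simulator queries. It assumes a hypothetical simulator exposed as a function \texttt{step(state, action)}, states represented as frozensets of ground atoms, and one query charged per \texttt{step} call. It is not the PDDLGym API or our released code. Under these assumptions, the macro-learning budget would be set near the query count this search returns for a single problem instance.

```python
import heapq
import itertools
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

State = FrozenSet[str]  # assumption: a state is a set of ground atoms


def goal_count(state: State, goal: State) -> int:
    """Goal-count heuristic: number of goal atoms not yet true in `state`."""
    return len(goal - state)


def gbfs(
    start: State,
    goal: State,
    actions: Sequence[str],
    step: Callable[[State, str], State],
    budget: int,
) -> Tuple[Optional[List[str]], int]:
    """Greedy best-first search over primitive actions.

    Each call to `step` counts as one simulator query. Returns the plan
    (or None if the budget runs out or the reachable space is exhausted)
    together with the number of queries used.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(goal_count(start, goal), next(tie), start, [])]
    visited = {start}
    queries = 0
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if goal <= state:  # every goal atom holds
            return plan, queries
        for action in actions:
            if queries >= budget:
                return None, queries
            queries += 1
            nxt = step(state, action)
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(
                    frontier, (goal_count(nxt, goal), next(tie), nxt, plan + [action])
                )
    return None, queries


# Hypothetical toy domain: three switches, each flipped by its own action.
ACTIONS = ["toggle_0", "toggle_1", "toggle_2"]
GOAL = frozenset({"on_0", "on_1", "on_2"})


def toggle_step(state: State, action: str) -> State:
    """Hypothetical simulator: flip the switch named by `action`."""
    atom = "on_" + action.split("_")[1]
    return state - {atom} if atom in state else state | {atom}


if __name__ == "__main__":
    plan, used = gbfs(frozenset(), GOAL, ACTIONS, toggle_step, budget=100_000)
    print(plan, used)  # ['toggle_0', 'toggle_1', 'toggle_2'] 9
```

Charging every \texttt{step} call, including those that reach already-visited states, matches the view that the budget measures simulator usage rather than search-tree size.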

We ran informal experiments with different numbers of macros to check that the approach was not overly sensitive to this hyperparameter, and found no significant change in performance when adding more macros, as long as the macros' effect size remained low.
We found that tuning the number of macros separately for each PDDLGym domain could improve results, but we felt that keeping the number of macros fixed gave a more principled evaluation of our approach.

\subsubsection{15-Puzzle}
The simulation budget for 15-puzzle was set to 500K queries, although the full budget was never needed: every problem was solved after generating fewer than 500K states.
The number of macros for 15-puzzle was set higher than for PDDLGym, to compensate for the fact that the domain-specific simulator macros are tied to specific tiles, rather than lifted like the PDDLGym macros.
The numbers of random and focused macros were equal to each other, to ensure a fair comparison.

\subsubsection{Rubik's Cube}
We increased the Rubik's Cube simulation budget to 2M queries to see whether the primitive-action planner could solve any problems when given more planning time.
The macro-learning budget for Rubik's Cube was set to 1M queries to test whether the combined cost of learning macros and planning was low enough to justify learning macros for a single problem instance.
The numbers of focused and random macros for Rubik's Cube were chosen to match the number of expert macro-actions, which was itself chosen so that the expert macros could efficiently solve the Rubik's Cube.

\subsection{Computational Resources}
The experiments in this paper were run on a cluster of Linux machines running either Red Hat 7.7 or Debian 10, with varying hardware specifications. However, a single seed of each experiment can run within 30 minutes (and usually significantly less) on a MacBook Pro running macOS Mojave (10.14.6) with a 2\,GHz Intel Core i5 processor and 16\,GB of RAM. No GPUs were used for any of the experiments.

\subsection{Random Seeds} Random seeds were used to generate the problem instances, macro-actions, and planning results.
We have attempted to make results as reproducible as possible by fixing random seeds.
The commands listed in the \textsl{README} file should reproduce our results exactly.
As noted in the preceding sections of this appendix, we have also saved and included the generated problem instances in the linked code repository, to allow for maximum portability.
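As a minimal illustration of how a fixed seed pins down a problem instance, the sketch below scrambles a 15-puzzle by a seeded random walk of the blank tile starting from the solved configuration. The function \texttt{generate\_instance}, the default walk length, and the random-walk scrambling scheme are hypothetical choices made for illustration, not necessarily how our instances were generated; the point is only that drawing from a private \texttt{random.Random(seed)} generator makes the output depend on the seed alone.

```python
import random
from typing import List, Tuple

# Hypothetical convention: 0 denotes the blank, which sits at index 0 when solved.
SOLVED = tuple(range(16))


def blank_moves(blank: int) -> List[int]:
    """Board indices the blank can swap with on a 4x4 grid."""
    row, col = divmod(blank, 4)
    moves = []
    if row > 0:
        moves.append(blank - 4)
    if row < 3:
        moves.append(blank + 4)
    if col > 0:
        moves.append(blank - 1)
    if col < 3:
        moves.append(blank + 1)
    return moves


def generate_instance(seed: int, walk_length: int = 100) -> Tuple[int, ...]:
    """Hypothetical generator: scramble the solved board with a seeded random walk."""
    rng = random.Random(seed)  # private RNG, independent of global random state
    tiles = list(SOLVED)
    blank = tiles.index(0)
    for _ in range(walk_length):
        target = rng.choice(blank_moves(blank))
        tiles[blank], tiles[target] = tiles[target], tiles[blank]
        blank = target
    return tuple(tiles)


if __name__ == "__main__":
    # The same seed always yields the same instance.
    print(generate_instance(seed=0) == generate_instance(seed=0))
```

Because the walk starts from the solved configuration and only makes legal moves, every instance produced this way is solvable; saving the generated instances, as we do, additionally guards against differences in random-number generators across platforms or library versions.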
