\section{Discussion and Limitations}\label{sec:discussion-limitations}

The STaPLe algorithm guides a largely autonomous self-improvement process, apart from a few hyperparameters that must be set, which are discussed in Appendix~\ref{appendix:hypers}. As a result, the algorithm requires no human supervision beyond the labels in the curated, publicly available mining corpus.

While the distribution of principles is mined relative to the mining corpus' task distribution, the STaPLe algorithm itself is task-agnostic, and can be applied to any distribution of datasets for which a reliable gold reference exists, or to paired preference data. However, designing a task-aware version of the STaPLe algorithm may reveal further insights into the model's task-dependent self-correction mechanisms while inducing a curriculum. As noted earlier, we focus on the two-turn self-correction setting, but extending to further refinement attempts could yield interesting insights into the compositional nature of principles; it would also produce more diverse trajectories (combinatorially many are possible), even over a condensed set of principles.
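
As a rough illustration of this combinatorial growth, consider the simplifying assumption (introduced here only for illustration, with notation local to this paragraph) that exactly one principle from a condensed set of size $K$ is invoked at each of $T$ refinement attempts. The number of distinct principle sequences is then
\[
  N_{\text{rep}}(K, T) = K^{T}
  \qquad\text{or}\qquad
  N_{\text{no-rep}}(K, T) = \frac{K!}{(K-T)!} \quad (T \le K),
\]
depending on whether a principle may be reused within a trajectory. For example, with $K = 10$ and a single refinement attempt ($T = 1$), there are $10$ possible sequences, whereas $T = 3$ gives $10^{3} = 1000$ sequences with reuse and $10 \cdot 9 \cdot 8 = 720$ without.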

A core aim of alignment research is to balance human-readability with machine-readability. The STaPLe algorithm achieves this by discovering principles that are useful to the model for self-correction, while compressing them into a smaller set via clustering for a human reader to analyze. We believe that this work, and the notion of LM self-improvement more broadly, is in keeping with the theme of the Bitter Lesson \citep{sutton2019bitter} when facilitated in a relatively autonomous fashion.
Specifically, we aim to limit the influence of human-driven priors and constraints on the algorithm; this is further reflected in our clustering technique and in our ablation in Appendix~\ref{appendix:bayesian-hypers}, which seeks to fully automate this as well.
At the same time, we acknowledge the value of human oversight of the alignment process; as such, we believe that human-in-the-loop analysis of the principles, as a post-processing mechanism following the E-step of each iteration, is valuable for avoiding misaligned or potentially harmful principles.