These are the supplementary materials for the submitted paper: Value-as-Return: A Two-Stage Framework to Align on the Optimal Score Function.

Here we provide a copy of the appendix.

Evaluation scripts and pre-trained checkpoints are available at the following anonymous webpage:
https://osf.io/7a4fh/?view_only=f951fa68c7ef43a19a44ee38b11847f6
The page includes instructions for environment setup, exact command lines for evaluation, and expected outputs for sanity checks.