I have one reward function with its design idea and code as follows.
Design Idea: {design_idea}
Code: {reward_function}
We trained a RL policy using the provided reward function code and tracked the values of the individual components in the reward function as well as global policy metrics such as success rates and episode lengths after every {epoch_freq} epochs and the maximum, mean, minimum values encountered:
{trained_results}
Analysis tips for trained results:
{trained_result_analysis_tip}