I have {nums} existing reward functions, with their design ideas and code given below.
Additionally, we trained an RL policy with each of the provided reward functions. After every {epoch_freq} epochs, we tracked the values of the individual reward components as well as global policy metrics such as success rate and episode length, together with the maximum, mean, and minimum values encountered.

{reward_func_group}
Analysis tips for trained results:
{trained_result_analysis_tip}

Please create a new reward function whose form is totally different from the given ones. Try generating code with a different structure, flow, or algorithm.
Here is some advice that may be helpful:
Select the observation components that are most relevant to the task for reward calculation, adopting or replacing components from the previous reward functions;
Design a hierarchical reward system (if necessary) in which the agent first completes one subtask and then proceeds to the next;
Use nested rewards, such as applying an exponential function to the distance between two objects or using cosine similarity.
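As a rough illustration of the last two tips, here is a minimal sketch only. All function names, argument names, and the threshold value are hypothetical and do not come from the task environment; adapt them to the actual observation components.

```python
import math


def distance_reward(pos_a, pos_b, temperature=1.0):
    """Nested reward: exponential of the negative Euclidean distance.

    Returns a value in (0, 1]; it is 1.0 when the two positions coincide.
    `temperature` is an assumed shaping parameter.
    """
    dist = math.dist(pos_a, pos_b)
    return math.exp(-temperature * dist)


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity of two vectors, in roughly [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def hierarchical_reward(hand_pos, object_pos, goal_pos, reach_threshold=0.05):
    """Two-stage reward: reach the object first, then move it to the goal.

    Stage 1 yields values below 1; stage 2 yields values in (1, 2],
    so progressing to the second subtask is always worth more.
    """
    if math.dist(hand_pos, object_pos) > reach_threshold:
        # Stage 1: only the reaching term is active.
        return distance_reward(hand_pos, object_pos)
    # Stage 2: constant bonus for having reached, plus the placing term.
    return 1.0 + distance_reward(object_pos, goal_pos)
```

The stage bonus keeps the two subtasks ordered, while the exponential keeps each term bounded; whether these particular choices suit the task is something the training results should decide.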

Remember that the new reward function should achieve a higher task score than the existing ones.