wandb: Currently logged in as: 804703098. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /home/user/zhangyang/PycharmProjects/Nips2024-ITPC-v2/Nips2024-ITPC-v2/onpolicy/scripts/results/MPE/simple_tag_tr/rmappotrsyn/exp_train_continue_tag_base_klcp_s2r2_v1/wandb/run-20240402_151554-cwhpsmm1
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run MPE_333
wandb: ⭐️ View project at https://wandb.ai/804703098/Continue_Tag_Base_v1
wandb: 🚀 View run at https://wandb.ai/804703098/Continue_Tag_Base_v1/runs/cwhpsmm1/workspace
choose to use gpu...
idv policy and team policy use same initial params!

 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 0/10 episodes, total num timesteps 200/2000, FPS 244.

team_policy eval average step individual rewards of agent0: 0.004827808317681708
team_policy eval average team episode rewards of agent0: 0.0
team_policy eval idv catch total num of agent0: 3
team_policy eval team catch total num: 0
team_policy eval average step individual rewards of agent1: 0.23641626072443223
team_policy eval average team episode rewards of agent1: 0.0
team_policy eval idv catch total num of agent1: 12
team_policy eval team catch total num: 0
team_policy eval average step individual rewards of agent2: 0.09049421485419297
team_policy eval average team episode rewards of agent2: 0.0
team_policy eval idv catch total num of agent2: 7
team_policy eval team catch total num: 0
team_policy eval average step individual rewards of agent3: -0.005399786200470165
team_policy eval average team episode rewards of agent3: 0.0
team_policy eval idv catch total num of agent3: 2
team_policy eval team catch total num: 0
team_policy eval average step individual rewards of agent4: -0.016873727648448574
team_policy eval average team episode rewards of agent4: 0.0
team_policy eval idv catch total num of agent4: 3
team_policy eval team catch total num: 0
idv_policy eval average step individual rewards of agent0: -0.009033322035541476
idv_policy eval average team episode rewards of agent0: 2.5
idv_policy eval idv catch total num of agent0: 3
idv_policy eval team catch total num: 1
idv_policy eval average step individual rewards of agent1: -0.0793001854553991
idv_policy eval average team episode rewards of agent1: 2.5
idv_policy eval idv catch total num of agent1: 1
idv_policy eval team catch total num: 1
idv_policy eval average step individual rewards of agent2: -0.10264097758776913
idv_policy eval average team episode rewards of agent2: 2.5
idv_policy eval idv catch total num of agent2: 0
idv_policy eval team catch total num: 1
idv_policy eval average step individual rewards of agent3: 0.008530957362729743
idv_policy eval average team episode rewards of agent3: 2.5
idv_policy eval idv catch total num of agent3: 4
idv_policy eval team catch total num: 1
idv_policy eval average step individual rewards of agent4: -0.029192614077585513
idv_policy eval average team episode rewards of agent4: 2.5
idv_policy eval idv catch total num of agent4: 2
idv_policy eval team catch total num: 1

 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 1/10 episodes, total num timesteps 400/2000, FPS 238.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 2/10 episodes, total num timesteps 600/2000, FPS 259.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 3/10 episodes, total num timesteps 800/2000, FPS 267.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 4/10 episodes, total num timesteps 1000/2000, FPS 271.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 5/10 episodes, total num timesteps 1200/2000, FPS 273.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 6/10 episodes, total num timesteps 1400/2000, FPS 277.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 7/10 episodes, total num timesteps 1600/2000, FPS 278.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 8/10 episodes, total num timesteps 1800/2000, FPS 283.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 9/10 episodes, total num timesteps 2000/2000, FPS 286.

wandb: - 0.006 MB of 0.006 MB uploadedwandb: \ 0.006 MB of 0.650 MB uploadedwandb: | 0.650 MB of 0.650 MB uploadedwandb: 
wandb: Run history:
wandb:                                       Aa_idv_actor_loss ▇▇▇█▁▅▆▃▂▆
wandb:                                          Ab_policy_loss ▄▅▄▅▁▄▅▆▃█
wandb:                                     Ac_idv_ppo_loss_abs ▁▆▆▇▆▇▇▇▂█
wandb:                                         Ad_idv_ppo_prop ▁▇▇█▆█▇▇▁█
wandb:                                                  Ae_eta ▂▆▇█▄▄▄▆▁▄
wandb:                                    Af_noclip_proportion ▅██▇▁▅▄▅▅▂
wandb:                                    Ag_update_proportion ▆▆▄▁▃▄█▃▂▅
wandb:                                          Ah_update_loss ▁▇▇█▇█▇█▂█
wandb:                                         Ai_idv_epsilon' █▇▆▆▅▄▃▃▂▁
wandb:                                            Aj_idv_sigma ▁▂▂▃▂▃▃▂▆█
wandb:              Ak_idv_clip(sigma, 1-epislon', 1+epislon') ▁▃▃▄▃▄▄▃▆█
wandb:                                Al_idv_noclip_proportion ███▇▇▇▇█▃▁
wandb:                       Am_idv_(sigma*A)update_proportion ▄▄▆█▆▅▁▆▇▄
wandb:                             An_idv_(sigma*A)update_loss █▂▂▂▂▁▁▂█▁
wandb:                                     Ao_idv_entropy_prop █▂▂▁▃▁▂▂█▁
wandb:                                         Ap_dist_entropy ▁▂▂▁▅▃▃█▅▇
wandb:                                          Aq_idv_kl_prop ▁▁▁▁▁▁▁▁▁▁
wandb:                                          Ar_idv_kl_coef ▁▁▁▁▁▁▁▁▁▁
wandb:                                          As_idv_kl_loss ▁▃▃▃▃▅▅▄█▆
wandb:                                    At_idv_cross_entropy ▁▁▁▁▁▁▁▁▁▁
wandb:                                           Au_value_loss ▅█▁▁▁▁▁▁▁▁
wandb:                                           Av_advantages ▄▄█▇█▆██▆▁
wandb:                                       Aw_idv_actor_norm ▃▂▂▅▅▂▆▂▁█
wandb:                                      Ax_idv_critic_norm █▂▁▁▁▁▁▁▁▁
wandb:                                     Ba_idv_org_min_prop ▇█▅▅▅▃▆▅▁▆
wandb:                                     Bb_idv_org_max_prop ▄▃▄▁▃▇█▃▆▄
wandb:                                     Bc_idv_org_org_prop ▁▁▁▁▁▁▁▁▁▁
wandb:                                     Bd_idv_new_min_prop ▁▄▆█▃▅▃▄▆▄
wandb:                                     Be_idv_new_max_prop █▃▂▁▆▃▂▅▄▃
wandb:                                      Ta_team_actor_loss ▆█▇█▅▆▆▁▅▃
wandb:                                     Tb_team_policy_loss ▁▅▅▅▇▆▄▆▇█
wandb:                                    Tc_team_ppo_loss_abs ▇▅▇▇▄█▄█▁█
wandb:                                        Td_team_ppo_prop █▆▇█▄█▄▅▁▆
wandb:                                        Te_team_epsilon^ ▁▁▁▁▁▁▁▁▁▁
wandb:                                          Tf_team_sigma^ ▄▂▇▁▂▂▄▇█▅
wandb:          Tg_team_clip(sigma^, 1-epislon^', 1+epislon^') █▅▆▂▁▄▆█▇▇
wandb:                               Th_team_noclip_proportion █▇▆▆▆▅▄▅▁▁
wandb:                     Ti_team_(sigma^*A)update_proportion █▇▆▆▇▆▄▅▁▁
wandb:                           Tj_team_(sigma^*A)update_loss █▆▅▆▆▅▇▄▃▁
wandb:                                    Tk_team_entropy_prop ▁▃▂▁▅▁▅▄█▃
wandb:                                    Tl_team_dist_entropy ▁▂▂▂▅▃▃█▅▇
wandb:                                         Tm_team_kl_prop ▁▁▁▁▁▁▁▁▁▁
wandb:                                         Tn_team_kl_coef ▁▁▁▁▁▁▁▁▁▁
wandb:                                         To_team_kl_loss ▁▂▃▃▃▄▅▄██
wandb:                                   Tp_team_cross_entropy ▁▁▁▁▁▁▁▁▁▁
wandb:                                      Tq_team_value_loss ▄█▁▁▁▁▁▁▁▁
wandb:                                      Tr_team_advantages ▇▇▆▄▇▆█▅▁▆
wandb:                                      Ts_team_actor_norm ▁▃▄▂█▂▄▅▇▆
wandb:                                     Tt_team_critic_norm █▂▁▁▁▁▁▁▁▁
wandb:                     agent0/average_episode_team_rewards ▁▁▁▁▁▁▁▁█▁
wandb:                  agent0/average_step_individual_rewards ▆▄▄▄▅▁▂▇█▃
wandb:     agent0/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent0/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent0/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent0/idv_policy_eval_team_catch_total_num ▁
wandb:    agent0/team_policy_eval_average_episode_team_rewards ▁
wandb: agent0/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent0/team_policy_eval_idv_catch_total_num ▁
wandb:            agent0/team_policy_eval_team_catch_total_num ▁
wandb:                     agent1/average_episode_team_rewards ▁▁▁▁▁▁▁▁█▁
wandb:                  agent1/average_step_individual_rewards ▂▁█▄▂▁▂▃▃▃
wandb:     agent1/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent1/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent1/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent1/idv_policy_eval_team_catch_total_num ▁
wandb:    agent1/team_policy_eval_average_episode_team_rewards ▁
wandb: agent1/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent1/team_policy_eval_idv_catch_total_num ▁
wandb:            agent1/team_policy_eval_team_catch_total_num ▁
wandb:                     agent2/average_episode_team_rewards ▁▁▁▁▁▁▁▁█▁
wandb:                  agent2/average_step_individual_rewards ▃█▂▂▇▃▄▁▄▃
wandb:     agent2/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent2/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent2/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent2/idv_policy_eval_team_catch_total_num ▁
wandb:    agent2/team_policy_eval_average_episode_team_rewards ▁
wandb: agent2/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent2/team_policy_eval_idv_catch_total_num ▁
wandb:            agent2/team_policy_eval_team_catch_total_num ▁
wandb:                     agent3/average_episode_team_rewards ▁▁▁▁▁▁▁▁█▁
wandb:                  agent3/average_step_individual_rewards █▂▄▂▃▅▁▁▁▃
wandb:     agent3/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent3/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent3/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent3/idv_policy_eval_team_catch_total_num ▁
wandb:    agent3/team_policy_eval_average_episode_team_rewards ▁
wandb: agent3/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent3/team_policy_eval_idv_catch_total_num ▁
wandb:            agent3/team_policy_eval_team_catch_total_num ▁
wandb:                     agent4/average_episode_team_rewards ▁▁▁▁▁▁▁▁█▁
wandb:                  agent4/average_step_individual_rewards ▂▄▃▃▃▃▃▁█▃
wandb:     agent4/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent4/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent4/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent4/idv_policy_eval_team_catch_total_num ▁
wandb:    agent4/team_policy_eval_average_episode_team_rewards ▁
wandb: agent4/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent4/team_policy_eval_idv_catch_total_num ▁
wandb:            agent4/team_policy_eval_team_catch_total_num ▁
wandb: 
wandb: Run summary:
wandb:                                       Aa_idv_actor_loss -0.30464
wandb:                                          Ab_policy_loss 0.01115
wandb:                                     Ac_idv_ppo_loss_abs 0.89174
wandb:                                         Ad_idv_ppo_prop 0.73833
wandb:                                                  Ae_eta 1.0
wandb:                                    Af_noclip_proportion 0.9517
wandb:                                    Ag_update_proportion 0.502
wandb:                                          Ah_update_loss 0.60897
wandb:                                         Ai_idv_epsilon' 2.99775
wandb:                                            Aj_idv_sigma 1.18207
wandb:              Ak_idv_clip(sigma, 1-epislon', 1+epislon') 1.10638
wandb:                                Al_idv_noclip_proportion 0.9901
wandb:                       Am_idv_(sigma*A)update_proportion 0.4805
wandb:                             An_idv_(sigma*A)update_loss -0.70694
wandb:                                     Ao_idv_entropy_prop 0.26167
wandb:                                         Ap_dist_entropy 3.16112
wandb:                                          Aq_idv_kl_prop 0.0
wandb:                                          Ar_idv_kl_coef 0.0
wandb:                                          As_idv_kl_loss 0.01818
wandb:                                    At_idv_cross_entropy 0.0
wandb:                                           Au_value_loss 0.3003
wandb:                                           Av_advantages -0.0
wandb:                                       Aw_idv_actor_norm 1.53232
wandb:                                      Ax_idv_critic_norm 2.87389
wandb:                                     Ba_idv_org_min_prop 0.264
wandb:                                     Bb_idv_org_max_prop 0.238
wandb:                                     Bc_idv_org_org_prop 0.0
wandb:                                     Bd_idv_new_min_prop 0.2212
wandb:                                     Be_idv_new_max_prop 0.2593
wandb:                                      Ta_team_actor_loss -0.30885
wandb:                                     Tb_team_policy_loss 0.00717
wandb:                                    Tc_team_ppo_loss_abs 0.82022
wandb:                                        Td_team_ppo_prop 0.72189
wandb:                                        Te_team_epsilon^ 0.2
wandb:                                          Tf_team_sigma^ 1.00265
wandb:          Tg_team_clip(sigma^, 1-epislon^', 1+epislon^') 0.99652
wandb:                               Th_team_noclip_proportion 0.7382
wandb:                     Ti_team_(sigma^*A)update_proportion 0.8629
wandb:                           Tj_team_(sigma^*A)update_loss -0.06385
wandb:                                    Tk_team_entropy_prop 0.27811
wandb:                                    Tl_team_dist_entropy 3.16339
wandb:                                         Tm_team_kl_prop 0.0
wandb:                                         Tn_team_kl_coef 1.0
wandb:                                         To_team_kl_loss 0.01588
wandb:                                   Tp_team_cross_entropy 0.0
wandb:                                      Tq_team_value_loss 0.08695
wandb:                                      Tr_team_advantages -0.0
wandb:                                      Ts_team_actor_norm 0.7144
wandb:                                     Tt_team_critic_norm 1.09469
wandb:                     agent0/average_episode_team_rewards 0.0
wandb:                  agent0/average_step_individual_rewards -0.04138
wandb:     agent0/idv_policy_eval_average_episode_team_rewards 2.5
wandb:  agent0/idv_policy_eval_average_step_individual_rewards -0.00903
wandb:              agent0/idv_policy_eval_idv_catch_total_num 3
wandb:             agent0/idv_policy_eval_team_catch_total_num 1
wandb:    agent0/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent0/team_policy_eval_average_step_individual_rewards 0.00483
wandb:             agent0/team_policy_eval_idv_catch_total_num 3
wandb:            agent0/team_policy_eval_team_catch_total_num 0
wandb:                     agent1/average_episode_team_rewards 0.0
wandb:                  agent1/average_step_individual_rewards -0.02963
wandb:     agent1/idv_policy_eval_average_episode_team_rewards 2.5
wandb:  agent1/idv_policy_eval_average_step_individual_rewards -0.0793
wandb:              agent1/idv_policy_eval_idv_catch_total_num 1
wandb:             agent1/idv_policy_eval_team_catch_total_num 1
wandb:    agent1/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent1/team_policy_eval_average_step_individual_rewards 0.23642
wandb:             agent1/team_policy_eval_idv_catch_total_num 12
wandb:            agent1/team_policy_eval_team_catch_total_num 0
wandb:                     agent2/average_episode_team_rewards 0.0
wandb:                  agent2/average_step_individual_rewards -0.05593
wandb:     agent2/idv_policy_eval_average_episode_team_rewards 2.5
wandb:  agent2/idv_policy_eval_average_step_individual_rewards -0.10264
wandb:              agent2/idv_policy_eval_idv_catch_total_num 0
wandb:             agent2/idv_policy_eval_team_catch_total_num 1
wandb:    agent2/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent2/team_policy_eval_average_step_individual_rewards 0.09049
wandb:             agent2/team_policy_eval_idv_catch_total_num 7
wandb:            agent2/team_policy_eval_team_catch_total_num 0
wandb:                     agent3/average_episode_team_rewards 0.0
wandb:                  agent3/average_step_individual_rewards 0.01152
wandb:     agent3/idv_policy_eval_average_episode_team_rewards 2.5
wandb:  agent3/idv_policy_eval_average_step_individual_rewards 0.00853
wandb:              agent3/idv_policy_eval_idv_catch_total_num 4
wandb:             agent3/idv_policy_eval_team_catch_total_num 1
wandb:    agent3/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent3/team_policy_eval_average_step_individual_rewards -0.0054
wandb:             agent3/team_policy_eval_idv_catch_total_num 2
wandb:            agent3/team_policy_eval_team_catch_total_num 0
wandb:                     agent4/average_episode_team_rewards 0.0
wandb:                  agent4/average_step_individual_rewards -0.00935
wandb:     agent4/idv_policy_eval_average_episode_team_rewards 2.5
wandb:  agent4/idv_policy_eval_average_step_individual_rewards -0.02919
wandb:              agent4/idv_policy_eval_idv_catch_total_num 2
wandb:             agent4/idv_policy_eval_team_catch_total_num 1
wandb:    agent4/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent4/team_policy_eval_average_step_individual_rewards -0.01687
wandb:             agent4/team_policy_eval_idv_catch_total_num 3
wandb:            agent4/team_policy_eval_team_catch_total_num 0
wandb: 
wandb: 🚀 View run MPE_333 at: https://wandb.ai/804703098/Continue_Tag_Base_v1/runs/cwhpsmm1/workspace
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 4 other file(s)
wandb: Find logs at: ./results/MPE/simple_tag_tr/rmappotrsyn/exp_train_continue_tag_base_klcp_s2r2_v1/wandb/run-20240402_151554-cwhpsmm1/logs
Traceback (most recent call last):
  File "train/train_mpe_trsyn.py", line 244, in <module>
    main(sys.argv[1:])
  File "train/train_mpe_trsyn.py", line 229, in main
    runner.run()
  File "/home/user/zhangyang/PycharmProjects/Nips2024-ITPC-v2/Nips2024-ITPC-v2/onpolicy/runner/shared/mpe_runner_trsyn.py", line 118, in run
    d = {"average_team_rewards_" + str(self.all_args.seed) + "_KL_loss_" + self.all_args.idv_use_kl_loss: average_team_rewards}
TypeError: can only concatenate str (not "bool") to str
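
The run finishes all 10 updates and syncs to W&B, then fails at line 118 of `mpe_runner_trsyn.py`. The seed is already wrapped in `str()`, but `self.all_args.idv_use_kl_loss` is a `bool`, and Python will not `+` a `bool` onto a `str`. Below is a minimal sketch of one possible fix, assuming the only intent is to embed the flag's value in the key text. It builds the same key with an f-string; wrapping the flag in `str()` would work equally well. The helper `make_team_reward_record` and the argument values are hypothetical stand-ins for the runner's `self.all_args` and `average_team_rewards`.

```python
from types import SimpleNamespace


def make_team_reward_record(all_args, average_team_rewards):
    """Build the per-seed reward dict without str + bool concatenation.

    Hypothetical helper: mirrors the key format used at line 118 of
    mpe_runner_trsyn.py, but formats every field through an f-string,
    so bool (or int) arguments are converted to text automatically.
    """
    key = f"average_team_rewards_{all_args.seed}_KL_loss_{all_args.idv_use_kl_loss}"
    return {key: average_team_rewards}


# Hypothetical stand-in for the parsed command-line arguments.
args = SimpleNamespace(seed=1, idv_use_kl_loss=True)
d = make_team_reward_record(args, 0.0)
print(d)  # {'average_team_rewards_1_KL_loss_True': 0.0}
```

Applied in place, the equivalent one-line change would be to replace `+ self.all_args.idv_use_kl_loss` with `+ str(self.all_args.idv_use_kl_loss)` inside `run()`.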
