wandb: Currently logged in as: 804703098. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /home/user/zhangyang/PycharmProjects/Nips2024-ITPC-v2/Nips2024-ITPC-v2/onpolicy/scripts/results/MPE/simple_tag_tr/rmappotrsyn/exp_train_continue_tag_base_klcp_s2r2_v1/wandb/run-20240402_151534-vhab5493
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run MPE_1
wandb: ⭐️ View project at https://wandb.ai/804703098/Continue_Tag_Base_v1
wandb: 🚀 View run at https://wandb.ai/804703098/Continue_Tag_Base_v1/runs/vhab5493/workspace
choose to use gpu...
idv policy and team policy use same initial params!

 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 0/10 episodes, total num timesteps 200/2000, FPS 243.

team_policy eval average step individual rewards of agent0: 0.03529775670904867
team_policy eval average team episode rewards of agent0: 0.0
team_policy eval idv catch total num of agent0: 4
team_policy eval team catch total num: 0
team_policy eval average step individual rewards of agent1: 0.03646768410161049
team_policy eval average team episode rewards of agent1: 0.0
team_policy eval idv catch total num of agent1: 5
team_policy eval team catch total num: 0
team_policy eval average step individual rewards of agent2: 0.11134021871327639
team_policy eval average team episode rewards of agent2: 0.0
team_policy eval idv catch total num of agent2: 8
team_policy eval team catch total num: 0
team_policy eval average step individual rewards of agent3: -0.029929121481642965
team_policy eval average team episode rewards of agent3: 0.0
team_policy eval idv catch total num of agent3: 2
team_policy eval team catch total num: 0
team_policy eval average step individual rewards of agent4: -0.05005451960224046
team_policy eval average team episode rewards of agent4: 0.0
team_policy eval idv catch total num of agent4: 1
team_policy eval team catch total num: 0
idv_policy eval average step individual rewards of agent0: -0.11212452970641168
idv_policy eval average team episode rewards of agent0: 0.0
idv_policy eval idv catch total num of agent0: 0
idv_policy eval team catch total num: 0
idv_policy eval average step individual rewards of agent1: -0.08209839840054235
idv_policy eval average team episode rewards of agent1: 0.0
idv_policy eval idv catch total num of agent1: 0
idv_policy eval team catch total num: 0
idv_policy eval average step individual rewards of agent2: -0.032594040305114126
idv_policy eval average team episode rewards of agent2: 0.0
idv_policy eval idv catch total num of agent2: 3
idv_policy eval team catch total num: 0
idv_policy eval average step individual rewards of agent3: -0.05591139243081258
idv_policy eval average team episode rewards of agent3: 0.0
idv_policy eval idv catch total num of agent3: 1
idv_policy eval team catch total num: 0
idv_policy eval average step individual rewards of agent4: 0.0016484195037158767
idv_policy eval average team episode rewards of agent4: 0.0
idv_policy eval idv catch total num of agent4: 3
idv_policy eval team catch total num: 0

 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 1/10 episodes, total num timesteps 400/2000, FPS 233.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 2/10 episodes, total num timesteps 600/2000, FPS 256.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 3/10 episodes, total num timesteps 800/2000, FPS 265.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 4/10 episodes, total num timesteps 1000/2000, FPS 276.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 5/10 episodes, total num timesteps 1200/2000, FPS 282.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 6/10 episodes, total num timesteps 1400/2000, FPS 285.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 7/10 episodes, total num timesteps 1600/2000, FPS 285.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 8/10 episodes, total num timesteps 1800/2000, FPS 287.


 Scenario simple_tag_tr Algo rmappotrsyn Exp exp_train_continue_tag_base_klcp_s2r2_v1 updates 9/10 episodes, total num timesteps 2000/2000, FPS 288.

wandb: 0.650 MB of 0.650 MB uploaded
wandb: 
wandb: Run history:
wandb:                                       Aa_idv_actor_loss ▅▇▆▅▄▁▁██▃
wandb:                                          Ab_policy_loss ▃▄▅▃▃▁▁██▄
wandb:                                     Ac_idv_ppo_loss_abs ▁▆▆▆▇▇▆▆█▇
wandb:                                         Ad_idv_ppo_prop ▁▇▇▇▇▇▆▇█▇
wandb:                                                  Ae_eta ▅▅▄▆▅█▃▄▁▆
wandb:                                    Af_noclip_proportion ▇▇██▇▅▆▇▁▅
wandb:                                    Ag_update_proportion ▂█▆▄▅▁▆▅▆▂
wandb:                                          Ah_update_loss ▁▇████▇█▇▇
wandb:                                         Ai_idv_epsilon' █▇▆▆▅▄▃▃▂▁
wandb:                                            Aj_idv_sigma ▁▂▃▃▄▄▃▅█▆
wandb:              Ak_idv_clip(sigma, 1-epislon', 1+epislon') ▁▂▃▃▄▅▄▆█▇
wandb:                                Al_idv_noclip_proportion ██▆▆▃▂▃▃▁▅
wandb:                       Am_idv_(sigma*A)update_proportion █▁▄▆▄█▃▅▂▇
wandb:                             An_idv_(sigma*A)update_loss █▃▂▂▂▂▃▂▁▂
wandb:                                     Ao_idv_entropy_prop █▂▂▂▂▂▃▂▁▂
wandb:                                         Ap_dist_entropy ▂▁▄▂▃▅▆▅▅█
wandb:                                          Aq_idv_kl_prop ▁▁▁▁▁▁▁▁▁▁
wandb:                                          Ar_idv_kl_coef ▁▁▁▁▁▁▁▁▁▁
wandb:                                          As_idv_kl_loss ▁▂▃▃▄▃▄▆█▇
wandb:                                    At_idv_cross_entropy ▁▁▁▁▁▁▁▁▁▁
wandb:                                           Au_value_loss ██▁▁▁▁▁▁▁▁
wandb:                                           Av_advantages ▃▃▄▄▅█▄▁▄▆
wandb:                                       Aw_idv_actor_norm ▃▃▂▁▅▅▁▃█▆
wandb:                                      Ax_idv_critic_norm █▂▁▁▁▁▁▁▁▁
wandb:                                     Ba_idv_org_min_prop ▁▂▂▆▄▁▃▆▇█
wandb:                                     Bb_idv_org_max_prop ▆██▄▅▆▇▃▃▁
wandb:                                     Bc_idv_org_org_prop ▁▁▁▁▁▁▁▁▁▁
wandb:                                     Bd_idv_new_min_prop ▁▆▅▆▇▇▆▇▇█
wandb:                                     Be_idv_new_max_prop █▂▄▃▂▂▃▂▁▂
wandb:                                      Ta_team_actor_loss ▇███▅▇▆▁▆▁
wandb:                                     Tb_team_policy_loss ▁▂▃▄▃▄▅▄█▇
wandb:                                    Tc_team_ppo_loss_abs ▂▆▂▇▄▁▃▄▇█
wandb:                                        Td_team_ppo_prop ▃▇▃█▅▁▃▃▇▇
wandb:                                        Te_team_epsilon^ ▁▁▁▁▁▁▁▁▁▁
wandb:                                          Tf_team_sigma^ ▃▃▂▄▃▃▄▄█▁
wandb:          Tg_team_clip(sigma^, 1-epislon^', 1+epislon^') █▇▇▆▅▆▅▂▂▁
wandb:                               Th_team_noclip_proportion █▇▅▄▃▃▄▁▁▂
wandb:                     Ti_team_(sigma^*A)update_proportion █▇▆▅▄▄▅▁▁▂
wandb:                           Tj_team_(sigma^*A)update_loss ▇▆█▇▁▄▆▇▄█
wandb:                                    Tk_team_entropy_prop ▆▂▆▁▄█▆▆▂▂
wandb:                                    Tl_team_dist_entropy ▁▁▂▂▃▃▄▆▆█
wandb:                                         Tm_team_kl_prop ▁▁▁▁▁▁▁▁▁▁
wandb:                                         Tn_team_kl_coef ▁▁▁▁▁▁▁▁▁▁
wandb:                                         To_team_kl_loss ▁▂▂▃▄▄▃▆█▆
wandb:                                   Tp_team_cross_entropy ▁▁▁▁▁▁▁▁▁▁
wandb:                                      Tq_team_value_loss ██▁▁▁▁▁▁▁▁
wandb:                                      Tr_team_advantages █▃▅▇█▁▅▅▅▁
wandb:                                      Ts_team_actor_norm ▁▂▃▂▇▆▆▅█▄
wandb:                                     Tt_team_critic_norm █▁▁▁▁▁▁▁▁▁
wandb:                     agent0/average_episode_team_rewards ▁▁▁█▁▁▁▁▁▁
wandb:                  agent0/average_step_individual_rewards ▁▄▅█▂▄▆▄█▅
wandb:     agent0/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent0/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent0/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent0/idv_policy_eval_team_catch_total_num ▁
wandb:    agent0/team_policy_eval_average_episode_team_rewards ▁
wandb: agent0/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent0/team_policy_eval_idv_catch_total_num ▁
wandb:            agent0/team_policy_eval_team_catch_total_num ▁
wandb:                     agent1/average_episode_team_rewards ▁▁▁█▁▁▁▁▁▁
wandb:                  agent1/average_step_individual_rewards ▃▃▃█▃▃▁▂▃▅
wandb:     agent1/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent1/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent1/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent1/idv_policy_eval_team_catch_total_num ▁
wandb:    agent1/team_policy_eval_average_episode_team_rewards ▁
wandb: agent1/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent1/team_policy_eval_idv_catch_total_num ▁
wandb:            agent1/team_policy_eval_team_catch_total_num ▁
wandb:                     agent2/average_episode_team_rewards ▁▁▁█▁▁▁▁▁▁
wandb:                  agent2/average_step_individual_rewards ▁▄▂█▃▂▁▇▃▄
wandb:     agent2/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent2/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent2/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent2/idv_policy_eval_team_catch_total_num ▁
wandb:    agent2/team_policy_eval_average_episode_team_rewards ▁
wandb: agent2/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent2/team_policy_eval_idv_catch_total_num ▁
wandb:            agent2/team_policy_eval_team_catch_total_num ▁
wandb:                     agent3/average_episode_team_rewards ▁▁▁█▁▁▁▁▁▁
wandb:                  agent3/average_step_individual_rewards ▃▁▂▂█▂▃▅▄▆
wandb:     agent3/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent3/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent3/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent3/idv_policy_eval_team_catch_total_num ▁
wandb:    agent3/team_policy_eval_average_episode_team_rewards ▁
wandb: agent3/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent3/team_policy_eval_idv_catch_total_num ▁
wandb:            agent3/team_policy_eval_team_catch_total_num ▁
wandb:                     agent4/average_episode_team_rewards ▁▁▁█▁▁▁▁▁▁
wandb:                  agent4/average_step_individual_rewards ▄▂▂▅▂▁▇▂▇█
wandb:     agent4/idv_policy_eval_average_episode_team_rewards ▁
wandb:  agent4/idv_policy_eval_average_step_individual_rewards ▁
wandb:              agent4/idv_policy_eval_idv_catch_total_num ▁
wandb:             agent4/idv_policy_eval_team_catch_total_num ▁
wandb:    agent4/team_policy_eval_average_episode_team_rewards ▁
wandb: agent4/team_policy_eval_average_step_individual_rewards ▁
wandb:             agent4/team_policy_eval_idv_catch_total_num ▁
wandb:            agent4/team_policy_eval_team_catch_total_num ▁
wandb: 
wandb: Run summary:
wandb:                                       Aa_idv_actor_loss -0.31433
wandb:                                          Ab_policy_loss -0.00237
wandb:                                     Ac_idv_ppo_loss_abs 0.91095
wandb:                                         Ad_idv_ppo_prop 0.74488
wandb:                                                  Ae_eta 1.00232
wandb:                                    Af_noclip_proportion 0.9556
wandb:                                    Ag_update_proportion 0.4811
wandb:                                          Ah_update_loss 0.63667
wandb:                                         Ai_idv_epsilon' 2.99775
wandb:                                            Aj_idv_sigma 1.16702
wandb:              Ak_idv_clip(sigma, 1-epislon', 1+epislon') 1.13828
wandb:                                Al_idv_noclip_proportion 0.9963
wandb:                       Am_idv_(sigma*A)update_proportion 0.5067
wandb:                             An_idv_(sigma*A)update_loss -0.717
wandb:                                     Ao_idv_entropy_prop 0.25512
wandb:                                         Ap_dist_entropy 3.1228
wandb:                                          Aq_idv_kl_prop 0.0
wandb:                                          Ar_idv_kl_coef 0.0
wandb:                                          As_idv_kl_loss 0.05383
wandb:                                    At_idv_cross_entropy 0.0
wandb:                                           Au_value_loss 0.14617
wandb:                                           Av_advantages 0.0
wandb:                                       Aw_idv_actor_norm 1.02293
wandb:                                      Ax_idv_critic_norm 1.7848
wandb:                                     Ba_idv_org_min_prop 0.3418
wandb:                                     Bb_idv_org_max_prop 0.1393
wandb:                                     Bc_idv_org_org_prop 0.0
wandb:                                     Bd_idv_new_min_prop 0.2904
wandb:                                     Be_idv_new_max_prop 0.2163
wandb:                                      Ta_team_actor_loss -0.31434
wandb:                                     Tb_team_policy_loss 0.0212
wandb:                                    Tc_team_ppo_loss_abs 0.30601
wandb:                                        Td_team_ppo_prop 0.47698
wandb:                                        Te_team_epsilon^ 0.2
wandb:                                          Tf_team_sigma^ 0.98414
wandb:          Tg_team_clip(sigma^, 1-epislon^', 1+epislon^') 0.94893
wandb:                               Th_team_noclip_proportion 0.5756
wandb:                     Ti_team_(sigma^*A)update_proportion 0.7175
wandb:                           Tj_team_(sigma^*A)update_loss 0.00147
wandb:                                    Tk_team_entropy_prop 0.52302
wandb:                                    Tl_team_dist_entropy 3.35888
wandb:                                         Tm_team_kl_prop 0.0
wandb:                                         Tn_team_kl_coef 1.0
wandb:                                         To_team_kl_loss 0.03878
wandb:                                   Tp_team_cross_entropy 0.0
wandb:                                      Tq_team_value_loss 0.05451
wandb:                                      Tr_team_advantages -0.0
wandb:                                      Ts_team_actor_norm 0.49732
wandb:                                     Tt_team_critic_norm 0.81772
wandb:                     agent0/average_episode_team_rewards 0.0
wandb:                  agent0/average_step_individual_rewards 0.03702
wandb:     agent0/idv_policy_eval_average_episode_team_rewards 0.0
wandb:  agent0/idv_policy_eval_average_step_individual_rewards -0.11212
wandb:              agent0/idv_policy_eval_idv_catch_total_num 0
wandb:             agent0/idv_policy_eval_team_catch_total_num 0
wandb:    agent0/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent0/team_policy_eval_average_step_individual_rewards 0.0353
wandb:             agent0/team_policy_eval_idv_catch_total_num 4
wandb:            agent0/team_policy_eval_team_catch_total_num 0
wandb:                     agent1/average_episode_team_rewards 0.0
wandb:                  agent1/average_step_individual_rewards 0.06608
wandb:     agent1/idv_policy_eval_average_episode_team_rewards 0.0
wandb:  agent1/idv_policy_eval_average_step_individual_rewards -0.0821
wandb:              agent1/idv_policy_eval_idv_catch_total_num 0
wandb:             agent1/idv_policy_eval_team_catch_total_num 0
wandb:    agent1/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent1/team_policy_eval_average_step_individual_rewards 0.03647
wandb:             agent1/team_policy_eval_idv_catch_total_num 5
wandb:            agent1/team_policy_eval_team_catch_total_num 0
wandb:                     agent2/average_episode_team_rewards 0.0
wandb:                  agent2/average_step_individual_rewards -0.01897
wandb:     agent2/idv_policy_eval_average_episode_team_rewards 0.0
wandb:  agent2/idv_policy_eval_average_step_individual_rewards -0.03259
wandb:              agent2/idv_policy_eval_idv_catch_total_num 3
wandb:             agent2/idv_policy_eval_team_catch_total_num 0
wandb:    agent2/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent2/team_policy_eval_average_step_individual_rewards 0.11134
wandb:             agent2/team_policy_eval_idv_catch_total_num 8
wandb:            agent2/team_policy_eval_team_catch_total_num 0
wandb:                     agent3/average_episode_team_rewards 0.0
wandb:                  agent3/average_step_individual_rewards 0.06579
wandb:     agent3/idv_policy_eval_average_episode_team_rewards 0.0
wandb:  agent3/idv_policy_eval_average_step_individual_rewards -0.05591
wandb:              agent3/idv_policy_eval_idv_catch_total_num 1
wandb:             agent3/idv_policy_eval_team_catch_total_num 0
wandb:    agent3/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent3/team_policy_eval_average_step_individual_rewards -0.02993
wandb:             agent3/team_policy_eval_idv_catch_total_num 2
wandb:            agent3/team_policy_eval_team_catch_total_num 0
wandb:                     agent4/average_episode_team_rewards 0.0
wandb:                  agent4/average_step_individual_rewards 0.06292
wandb:     agent4/idv_policy_eval_average_episode_team_rewards 0.0
wandb:  agent4/idv_policy_eval_average_step_individual_rewards 0.00165
wandb:              agent4/idv_policy_eval_idv_catch_total_num 3
wandb:             agent4/idv_policy_eval_team_catch_total_num 0
wandb:    agent4/team_policy_eval_average_episode_team_rewards 0.0
wandb: agent4/team_policy_eval_average_step_individual_rewards -0.05005
wandb:             agent4/team_policy_eval_idv_catch_total_num 1
wandb:            agent4/team_policy_eval_team_catch_total_num 0
wandb: 
wandb: 🚀 View run MPE_1 at: https://wandb.ai/804703098/Continue_Tag_Base_v1/runs/vhab5493/workspace
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 4 other file(s)
wandb: Find logs at: ./results/MPE/simple_tag_tr/rmappotrsyn/exp_train_continue_tag_base_klcp_s2r2_v1/wandb/run-20240402_151534-vhab5493/logs
Traceback (most recent call last):
  File "train/train_mpe_trsyn.py", line 244, in <module>
    main(sys.argv[1:])
  File "train/train_mpe_trsyn.py", line 229, in main
    runner.run()
  File "/home/user/zhangyang/PycharmProjects/Nips2024-ITPC-v2/Nips2024-ITPC-v2/onpolicy/runner/shared/mpe_runner_trsyn.py", line 118, in run
    d = {"average_team_rewards_" + str(self.all_args.seed) + "_KL_loss_" + self.all_args.idv_use_kl_loss: average_team_rewards}
TypeError: can only concatenate str (not "bool") to str
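
The run itself finishes and syncs to wandb; the crash happens afterward, at line 118 of mpe_runner_trsyn.py, where the dictionary key is built by string concatenation. `self.all_args.seed` is wrapped in `str()`, but `self.all_args.idv_use_kl_loss` is not, and the TypeError shows it is a bool, so `str + bool` raises. Below is a minimal sketch of the failure and a likely fix, assuming `idv_use_kl_loss` is a boolean command-line flag as the traceback implies; the stand-in values are for illustration only:

    # Stand-ins for the attributes referenced in the traceback (assumed values).
    seed = 2                    # stands in for self.all_args.seed
    idv_use_kl_loss = True      # stands in for self.all_args.idv_use_kl_loss (a bool)
    average_team_rewards = 0.0

    # Original line 118: concatenating a bool onto a str raises
    # "TypeError: can only concatenate str (not 'bool') to str".
    # d = {"average_team_rewards_" + str(seed) + "_KL_loss_" + idv_use_kl_loss: average_team_rewards}

    # Fix: cast the flag to str explicitly before concatenating...
    d = {"average_team_rewards_" + str(seed) + "_KL_loss_" + str(idv_use_kl_loss): average_team_rewards}

    # ...or, equivalently, build the key with an f-string, which coerces
    # every interpolated field to text.
    d = {f"average_team_rewards_{seed}_KL_loss_{idv_use_kl_loss}": average_team_rewards}

    print(d)  # {'average_team_rewards_2_KL_loss_True': 0.0}

Either form produces keys such as "average_team_rewards_2_KL_loss_True"; the f-string variant is the more idiomatic choice, since it removes the entire class of str-plus-non-str concatenation errors from this line.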
