Logging to experiments/invertedPendulum/nov2/IPO01w350e3_seed2231
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8410069346427917
Validation loss = 0.7140445709228516
Validation loss = 0.6468721628189087
Validation loss = 0.6279726028442383
Validation loss = 0.6114574670791626
Validation loss = 0.5925952196121216
Validation loss = 0.5987734198570251
Validation loss = 0.5897319912910461
Validation loss = 0.5566436052322388
Validation loss = 0.5478923320770264
Validation loss = 0.5333073139190674
Validation loss = 0.537347674369812
Validation loss = 0.5530933141708374
Validation loss = 0.5233882069587708
Validation loss = 0.5299667716026306
Validation loss = 0.5302127599716187
Validation loss = 0.5736712217330933
Validation loss = 0.5486568212509155
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.8375306129455566
Validation loss = 0.7197250723838806
Validation loss = 0.6543489098548889
Validation loss = 0.6295110583305359
Validation loss = 0.6124050617218018
Validation loss = 0.6019819378852844
Validation loss = 0.5779719352722168
Validation loss = 0.5771917104721069
Validation loss = 0.5599179863929749
Validation loss = 0.5460910797119141
Validation loss = 0.5459673404693604
Validation loss = 0.5287884473800659
Validation loss = 0.5228350758552551
Validation loss = 0.5201936960220337
Validation loss = 0.521621823310852
Validation loss = 0.5132372379302979
Validation loss = 0.5083727836608887
Validation loss = 0.517405092716217
Validation loss = 0.5171782970428467
Validation loss = 0.5289379358291626
Validation loss = 0.49774184823036194
Validation loss = 0.5125580430030823
Validation loss = 0.49743956327438354
Validation loss = 0.5061745047569275
Validation loss = 0.5159248113632202
Validation loss = 0.509726345539093
Validation loss = 0.4838004410266876
Validation loss = 0.48944026231765747
Validation loss = 0.48997294902801514
Validation loss = 0.4961009621620178
Validation loss = 0.4838230013847351
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.8362807035446167
Validation loss = 0.7014708518981934
Validation loss = 0.6551313996315002
Validation loss = 0.6234367489814758
Validation loss = 0.6076383590698242
Validation loss = 0.5913510918617249
Validation loss = 0.5819637179374695
Validation loss = 0.56565260887146
Validation loss = 0.5609256625175476
Validation loss = 0.5595636963844299
Validation loss = 0.5296231508255005
Validation loss = 0.5329968333244324
Validation loss = 0.5581730008125305
Validation loss = 0.5347281098365784
Validation loss = 0.5444867014884949
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.8516337871551514
Validation loss = 0.7157714366912842
Validation loss = 0.6702171564102173
Validation loss = 0.6470255851745605
Validation loss = 0.6254081726074219
Validation loss = 0.6043092012405396
Validation loss = 0.5851443409919739
Validation loss = 0.5717436075210571
Validation loss = 0.5833799839019775
Validation loss = 0.548686146736145
Validation loss = 0.5336923003196716
Validation loss = 0.5584120750427246
Validation loss = 0.5538434982299805
Validation loss = 0.559844434261322
Validation loss = 0.5402188897132874
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.8361001014709473
Validation loss = 0.7227615118026733
Validation loss = 0.6624580025672913
Validation loss = 0.6515805721282959
Validation loss = 0.6196280717849731
Validation loss = 0.6236230731010437
Validation loss = 0.5836524963378906
Validation loss = 0.5791385769844055
Validation loss = 0.5636374950408936
Validation loss = 0.5492320656776428
Validation loss = 0.5562112331390381
Validation loss = 0.5402703881263733
Validation loss = 0.5256752371788025
Validation loss = 0.5319731831550598
Validation loss = 0.5411970615386963
Validation loss = 0.5275280475616455
Validation loss = 0.5207018852233887
Validation loss = 0.5215726494789124
Validation loss = 0.5205730199813843
Validation loss = 0.5152285099029541
Validation loss = 0.5143386125564575
Validation loss = 0.5148476362228394
Validation loss = 0.518108069896698
Validation loss = 0.5063337683677673
Validation loss = 0.5057046413421631
Validation loss = 0.49949729442596436
Validation loss = 0.5008326172828674
Validation loss = 0.5102755427360535
Validation loss = 0.5030717849731445
Validation loss = 0.5135324001312256
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -136     |
| Iteration     | 0        |
| MaximumReturn | -75.4    |
| MinimumReturn | -167     |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6684143543243408
Validation loss = 0.5747942328453064
Validation loss = 0.5341530442237854
Validation loss = 0.5198129415512085
Validation loss = 0.502516508102417
Validation loss = 0.5125699639320374
Validation loss = 0.5046445727348328
Validation loss = 0.5011900663375854
Validation loss = 0.5030666589736938
Validation loss = 0.49337801337242126
Validation loss = 0.4817739725112915
Validation loss = 0.48022621870040894
Validation loss = 0.49227115511894226
Validation loss = 0.4743519127368927
Validation loss = 0.4534018635749817
Validation loss = 0.4865419566631317
Validation loss = 0.46377837657928467
Validation loss = 0.47892090678215027
Validation loss = 0.4739166796207428
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.8140007257461548
Validation loss = 0.5814897418022156
Validation loss = 0.5524870753288269
Validation loss = 0.522528886795044
Validation loss = 0.5156513452529907
Validation loss = 0.5088255405426025
Validation loss = 0.5082744359970093
Validation loss = 0.5047276020050049
Validation loss = 0.4942176640033722
Validation loss = 0.5004751682281494
Validation loss = 0.5021347999572754
Validation loss = 0.4879242777824402
Validation loss = 0.4906829595565796
Validation loss = 0.47576895356178284
Validation loss = 0.4685914218425751
Validation loss = 0.4819263219833374
Validation loss = 0.46256181597709656
Validation loss = 0.45524120330810547
Validation loss = 0.47070571780204773
Validation loss = 0.4531354606151581
Validation loss = 0.46367698907852173
Validation loss = 0.4599512219429016
Validation loss = 0.46337273716926575
Validation loss = 0.44466477632522583
Validation loss = 0.4352756440639496
Validation loss = 0.4516177177429199
Validation loss = 0.4614280164241791
Validation loss = 0.42654314637184143
Validation loss = 0.4683413803577423
Validation loss = 0.4472123086452484
Validation loss = 0.44071778655052185
Validation loss = 0.46175846457481384
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6813116669654846
Validation loss = 0.570298433303833
Validation loss = 0.5386635065078735
Validation loss = 0.5246737599372864
Validation loss = 0.5131407380104065
Validation loss = 0.5029091835021973
Validation loss = 0.4988389015197754
Validation loss = 0.49615827202796936
Validation loss = 0.4899919331073761
Validation loss = 0.4868758022785187
Validation loss = 0.4803491532802582
Validation loss = 0.4777770936489105
Validation loss = 0.475375235080719
Validation loss = 0.4713113009929657
Validation loss = 0.47721123695373535
Validation loss = 0.4691678285598755
Validation loss = 0.472191721200943
Validation loss = 0.49068719148635864
Validation loss = 0.46591973304748535
Validation loss = 0.4698309600353241
Validation loss = 0.4996642470359802
Validation loss = 0.4677039384841919
Validation loss = 0.45213252305984497
Validation loss = 0.464523047208786
Validation loss = 0.4570515751838684
Validation loss = 0.44167768955230713
Validation loss = 0.46207916736602783
Validation loss = 0.48126691579818726
Validation loss = 0.4727252423763275
Validation loss = 0.4594409465789795
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.6904160380363464
Validation loss = 0.5783180594444275
Validation loss = 0.5489897727966309
Validation loss = 0.5478417277336121
Validation loss = 0.5061764121055603
Validation loss = 0.5067541003227234
Validation loss = 0.4969979226589203
Validation loss = 0.5013135075569153
Validation loss = 0.4981582462787628
Validation loss = 0.47874748706817627
Validation loss = 0.48796287178993225
Validation loss = 0.49036675691604614
Validation loss = 0.47895675897598267
Validation loss = 0.469921350479126
Validation loss = 0.4662741720676422
Validation loss = 0.46420708298683167
Validation loss = 0.47159966826438904
Validation loss = 0.4663833677768707
Validation loss = 0.4656428098678589
Validation loss = 0.4528105556964874
Validation loss = 0.44562312960624695
Validation loss = 0.45701131224632263
Validation loss = 0.4716632664203644
Validation loss = 0.44637078046798706
Validation loss = 0.45463335514068604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7543879151344299
Validation loss = 0.5828708410263062
Validation loss = 0.544206976890564
Validation loss = 0.5501485466957092
Validation loss = 0.5357178449630737
Validation loss = 0.5259759426116943
Validation loss = 0.5117083191871643
Validation loss = 0.50411456823349
Validation loss = 0.511364758014679
Validation loss = 0.4985366463661194
Validation loss = 0.5009123682975769
Validation loss = 0.5020121932029724
Validation loss = 0.5053524374961853
Validation loss = 0.4716527462005615
Validation loss = 0.46896296739578247
Validation loss = 0.47248709201812744
Validation loss = 0.4621986150741577
Validation loss = 0.4713810086250305
Validation loss = 0.45601630210876465
Validation loss = 0.4652993083000183
Validation loss = 0.47506582736968994
Validation loss = 0.45951324701309204
Validation loss = 0.43736493587493896
Validation loss = 0.46431246399879456
Validation loss = 0.4582490921020508
Validation loss = 0.4637272357940674
Validation loss = 0.46461114287376404
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.0196078431372549
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.038461538461538464
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.05660377358490566
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.07407407407407407
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.12727272727272726
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.16071428571428573
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.17543859649122806
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.1896551724137931
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.22033898305084745
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.23333333333333334
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.26229508196721313
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.27419354838709675
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.30158730158730157
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3125
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3230769230769231
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.3484848484848485
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.3582089552238806
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.38235294117647056
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4057971014492754
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.42857142857142855
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4507042253521127
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4583333333333333
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4657534246575342
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47297297297297297
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5066666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -128     |
| Iteration     | 1        |
| MaximumReturn | -66.1    |
| MinimumReturn | -162     |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5845643281936646
Validation loss = 0.48867595195770264
Validation loss = 0.4694831371307373
Validation loss = 0.47598138451576233
Validation loss = 0.47229206562042236
Validation loss = 0.46680641174316406
Validation loss = 0.47168731689453125
Validation loss = 0.4765743017196655
Validation loss = 0.48354437947273254
Validation loss = 0.4696965217590332
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5879331827163696
Validation loss = 0.4778762757778168
Validation loss = 0.4649944305419922
Validation loss = 0.4671896696090698
Validation loss = 0.4574764370918274
Validation loss = 0.45828768610954285
Validation loss = 0.4568948447704315
Validation loss = 0.45798537135124207
Validation loss = 0.46883514523506165
Validation loss = 0.4659709334373474
Validation loss = 0.47112882137298584
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6019400358200073
Validation loss = 0.48068463802337646
Validation loss = 0.465516060590744
Validation loss = 0.4624136686325073
Validation loss = 0.47777625918388367
Validation loss = 0.4728987216949463
Validation loss = 0.46868178248405457
Validation loss = 0.4765101373195648
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.579481303691864
Validation loss = 0.47050827741622925
Validation loss = 0.469261109828949
Validation loss = 0.46386003494262695
Validation loss = 0.4617736339569092
Validation loss = 0.45516443252563477
Validation loss = 0.47411710023880005
Validation loss = 0.4671823978424072
Validation loss = 0.4831061065196991
Validation loss = 0.4763997197151184
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5649501085281372
Validation loss = 0.48080509901046753
Validation loss = 0.46601223945617676
Validation loss = 0.4821646809577942
Validation loss = 0.4650554060935974
Validation loss = 0.463653564453125
Validation loss = 0.48282569646835327
Validation loss = 0.4733303189277649
Validation loss = 0.48339733481407166
Validation loss = 0.4656411409378052
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5263157894736842
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5324675324675324
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5256410256410257
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5189873417721519
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5185185185185185
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5121951219512195
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5060240963855421
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49411764705882355
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4942528735632184
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48863636363636365
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48314606741573035
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4777777777777778
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4725274725274725
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4673913043478261
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46236559139784944
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.46808510638297873
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47368421052631576
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4791666666666667
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4948453608247423
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5102040816326531
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5050505050505051
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -83      |
| Iteration     | 2        |
| MaximumReturn | -0.78    |
| MinimumReturn | -176     |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5206802487373352
Validation loss = 0.44734761118888855
Validation loss = 0.4335271120071411
Validation loss = 0.44613519310951233
Validation loss = 0.4326126277446747
Validation loss = 0.42838624119758606
Validation loss = 0.4398244321346283
Validation loss = 0.4357479512691498
Validation loss = 0.4309622049331665
Validation loss = 0.44111594557762146
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5390432476997375
Validation loss = 0.43824395537376404
Validation loss = 0.43447378277778625
Validation loss = 0.4328247308731079
Validation loss = 0.4341147243976593
Validation loss = 0.4290759563446045
Validation loss = 0.42840731143951416
Validation loss = 0.4356609880924225
Validation loss = 0.4356067180633545
Validation loss = 0.4294113218784332
Validation loss = 0.42496779561042786
Validation loss = 0.4279750883579254
Validation loss = 0.43836864829063416
Validation loss = 0.4393210709095001
Validation loss = 0.44227418303489685
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5062880516052246
Validation loss = 0.44869399070739746
Validation loss = 0.43637657165527344
Validation loss = 0.4394507110118866
Validation loss = 0.43167009949684143
Validation loss = 0.4329775869846344
Validation loss = 0.4477609097957611
Validation loss = 0.44492030143737793
Validation loss = 0.4264545738697052
Validation loss = 0.4354289472103119
Validation loss = 0.4423488676548004
Validation loss = 0.44201207160949707
Validation loss = 0.43353474140167236
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5272703170776367
Validation loss = 0.4395826756954193
Validation loss = 0.43023881316185
Validation loss = 0.4495619237422943
Validation loss = 0.43706217408180237
Validation loss = 0.42948588728904724
Validation loss = 0.4442652761936188
Validation loss = 0.4260736405849457
Validation loss = 0.4410509169101715
Validation loss = 0.42918458580970764
Validation loss = 0.4373093843460083
Validation loss = 0.4384739398956299
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5647512078285217
Validation loss = 0.4466742277145386
Validation loss = 0.43400025367736816
Validation loss = 0.43375054001808167
Validation loss = 0.4356434643268585
Validation loss = 0.42943310737609863
Validation loss = 0.4289380609989166
Validation loss = 0.43841230869293213
Validation loss = 0.4282163381576538
Validation loss = 0.4312051832675934
Validation loss = 0.43508467078208923
Validation loss = 0.4353446662425995
Validation loss = 0.4424358606338501
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49504950495049505
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5048543689320388
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5047619047619047
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5094339622641509
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.514018691588785
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5185185185185185
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5229357798165137
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5272727272727272
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5315315315315315
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5357142857142857
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5398230088495575
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.543859649122807
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5478260869565217
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5517241379310345
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5555555555555556
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.559322033898305
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5546218487394958
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5583333333333333
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5619834710743802
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5655737704918032
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5691056910569106
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5725806451612904
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.576
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -138     |
| Iteration     | 3        |
| MaximumReturn | -87.4    |
| MinimumReturn | -161     |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5522644519805908
Validation loss = 0.4303678572177887
Validation loss = 0.42566922307014465
Validation loss = 0.41294071078300476
Validation loss = 0.40055984258651733
Validation loss = 0.40830478072166443
Validation loss = 0.4064316153526306
Validation loss = 0.4209787845611572
Validation loss = 0.4179629683494568
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5263344049453735
Validation loss = 0.4169054925441742
Validation loss = 0.4120709300041199
Validation loss = 0.4175225496292114
Validation loss = 0.41184473037719727
Validation loss = 0.41786718368530273
Validation loss = 0.409432977437973
Validation loss = 0.4275311827659607
Validation loss = 0.4209763705730438
Validation loss = 0.4188348352909088
Validation loss = 0.4269350469112396
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5346487760543823
Validation loss = 0.45157119631767273
Validation loss = 0.4239981770515442
Validation loss = 0.4224286675453186
Validation loss = 0.41730087995529175
Validation loss = 0.42328158020973206
Validation loss = 0.42154303193092346
Validation loss = 0.41945892572402954
Validation loss = 0.43123865127563477
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5456398725509644
Validation loss = 0.4332526922225952
Validation loss = 0.4287044405937195
Validation loss = 0.41390883922576904
Validation loss = 0.42397502064704895
Validation loss = 0.40443000197410583
Validation loss = 0.4143683612346649
Validation loss = 0.4266209602355957
Validation loss = 0.41421252489089966
Validation loss = 0.42244473099708557
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5333766937255859
Validation loss = 0.43275928497314453
Validation loss = 0.4116854667663574
Validation loss = 0.4181194305419922
Validation loss = 0.41497892141342163
Validation loss = 0.4117164611816406
Validation loss = 0.40351149439811707
Validation loss = 0.4028518497943878
Validation loss = 0.41756486892700195
Validation loss = 0.41674381494522095
Validation loss = 0.4112563133239746
Validation loss = 0.4183453321456909
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5873015873015873
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5826771653543307
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5859375
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5891472868217055
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5923076923076923
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5954198473282443
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5909090909090909
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5939849624060151
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5970149253731343
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6029411764705882
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6131386861313869
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6159420289855072
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6187050359712231
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6214285714285714
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.624113475177305
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6267605633802817
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6293706293706294
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6319444444444444
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6275862068965518
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6301369863013698
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6326530612244898
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6283783783783784
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6308724832214765
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6333333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -67.3    |
| Iteration     | 4        |
| MaximumReturn | -0.496   |
| MinimumReturn | -129     |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4656493067741394
Validation loss = 0.3982637822628021
Validation loss = 0.3983706831932068
Validation loss = 0.41351261734962463
Validation loss = 0.402818500995636
Validation loss = 0.4012070298194885
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.48371171951293945
Validation loss = 0.39914458990097046
Validation loss = 0.4085417687892914
Validation loss = 0.4059232771396637
Validation loss = 0.4106934666633606
Validation loss = 0.41610804200172424
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.45267415046691895
Validation loss = 0.4109315872192383
Validation loss = 0.4128510057926178
Validation loss = 0.413152277469635
Validation loss = 0.40614646673202515
Validation loss = 0.4092671275138855
Validation loss = 0.4180254340171814
Validation loss = 0.4160410761833191
Validation loss = 0.43241381645202637
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4607180058956146
Validation loss = 0.40636810660362244
Validation loss = 0.40850916504859924
Validation loss = 0.4135293960571289
Validation loss = 0.4121796488761902
Validation loss = 0.4068174958229065
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.487241268157959
Validation loss = 0.40594181418418884
Validation loss = 0.407993882894516
Validation loss = 0.39971357583999634
Validation loss = 0.41525277495384216
Validation loss = 0.4092160761356354
Validation loss = 0.41514891386032104
Validation loss = 0.41291776299476624
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6357615894039735
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.631578947368421
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6274509803921569
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6233766233766234
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6193548387096774
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6153846153846154
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6178343949044586
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6139240506329114
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.610062893081761
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6125
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6149068322981367
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6111111111111112
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6073619631901841
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6097560975609756
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6060606060606061
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6024096385542169
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5988023952095808
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5952380952380952
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5976331360946746
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6023391812865497
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5988372093023255
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5953757225433526
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5919540229885057
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5885714285714285
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -70.8    |
| Iteration     | 5        |
| MaximumReturn | -0.338   |
| MinimumReturn | -168     |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4140613079071045
Validation loss = 0.401444673538208
Validation loss = 0.40453845262527466
Validation loss = 0.4000628590583801
Validation loss = 0.4065023362636566
Validation loss = 0.42263731360435486
Validation loss = 0.4095988869667053
Validation loss = 0.40198636054992676
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42488422989845276
Validation loss = 0.41301098465919495
Validation loss = 0.4075492024421692
Validation loss = 0.41169866919517517
Validation loss = 0.4163430333137512
Validation loss = 0.41223639249801636
Validation loss = 0.4243766665458679
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.43838420510292053
Validation loss = 0.4112939238548279
Validation loss = 0.410352885723114
Validation loss = 0.4149383008480072
Validation loss = 0.4213939309120178
Validation loss = 0.4237847924232483
Validation loss = 0.4211832880973816
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4227530360221863
Validation loss = 0.4115835130214691
Validation loss = 0.41136497259140015
Validation loss = 0.4058637022972107
Validation loss = 0.4060189127922058
Validation loss = 0.4163725972175598
Validation loss = 0.41774457693099976
Validation loss = 0.422333300113678
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4381214678287506
Validation loss = 0.4103280007839203
Validation loss = 0.41809242963790894
Validation loss = 0.41515812277793884
Validation loss = 0.4135722219944
Validation loss = 0.4165080189704895
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5965909090909091
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5932203389830508
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5898876404494382
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5865921787709497
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5833333333333334
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.580110497237569
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5824175824175825
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5792349726775956
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5760869565217391
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5783783783783784
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5806451612903226
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5775401069518716
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5797872340425532
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5767195767195767
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5736842105263158
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5863874345549738
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5885416666666666
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5854922279792746
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5927835051546392
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5948717948717949
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5969387755102041
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5939086294416244
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5959595959595959
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5979899497487438
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.595
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15.2    |
| Iteration     | 6        |
| MaximumReturn | -0.118   |
| MinimumReturn | -62.4    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4210469722747803
Validation loss = 0.40425530076026917
Validation loss = 0.4057495594024658
Validation loss = 0.4083939790725708
Validation loss = 0.40797197818756104
Validation loss = 0.4066901206970215
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42711982131004333
Validation loss = 0.41124093532562256
Validation loss = 0.4163210391998291
Validation loss = 0.41751059889793396
Validation loss = 0.41324183344841003
Validation loss = 0.4182378351688385
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4311164915561676
Validation loss = 0.41997966170310974
Validation loss = 0.4290941059589386
Validation loss = 0.424345463514328
Validation loss = 0.4272030293941498
Validation loss = 0.4290705919265747
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4135807454586029
Validation loss = 0.41379085183143616
Validation loss = 0.414709210395813
Validation loss = 0.42096784710884094
Validation loss = 0.42384400963783264
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.41845521330833435
Validation loss = 0.40589383244514465
Validation loss = 0.4042096436023712
Validation loss = 0.42059651017189026
Validation loss = 0.423298716545105
Validation loss = 0.4166651964187622
Validation loss = 0.41758671402931213
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5920398009950248
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.594059405940594
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5911330049261084
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5882352941176471
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5853658536585366
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5825242718446602
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5797101449275363
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5769230769230769
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5741626794258373
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5761904761904761
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5734597156398105
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5754716981132075
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5727699530516432
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5747663551401869
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5813953488372093
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5833333333333334
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5852534562211982
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.591743119266055
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.593607305936073
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5954545454545455
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5927601809954751
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5945945945945946
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5919282511210763
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.59375
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5911111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -47.9    |
| Iteration     | 7        |
| MaximumReturn | -0.645   |
| MinimumReturn | -87.7    |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.406503289937973
Validation loss = 0.4015503227710724
Validation loss = 0.41762852668762207
Validation loss = 0.4074084162712097
Validation loss = 0.4151262938976288
Validation loss = 0.4053443372249603
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4159735143184662
Validation loss = 0.40576645731925964
Validation loss = 0.4066286087036133
Validation loss = 0.4138656258583069
Validation loss = 0.4144885838031769
Validation loss = 0.42060190439224243
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.423720121383667
Validation loss = 0.4133995473384857
Validation loss = 0.4203750789165497
Validation loss = 0.4159674644470215
Validation loss = 0.4154300391674042
Validation loss = 0.4332747757434845
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4158315062522888
Validation loss = 0.4079310894012451
Validation loss = 0.4133619964122772
Validation loss = 0.4107077121734619
Validation loss = 0.4150323271751404
Validation loss = 0.4147915244102478
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.41850534081459045
Validation loss = 0.4094505310058594
Validation loss = 0.41217952966690063
Validation loss = 0.4218275845050812
Validation loss = 0.4142870604991913
Validation loss = 0.41425973176956177
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.588495575221239
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5859030837004405
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5833333333333334
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5807860262008734
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5826086956521739
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5800865800865801
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5862068965517241
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5836909871244635
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5811965811965812
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5914893617021276
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6016949152542372
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6033755274261603
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6134453781512605
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6108786610878661
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6083333333333333
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6099585062240664
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6115702479338843
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6131687242798354
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6352459016393442
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6326530612244898
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6300813008130082
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6275303643724697
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.625
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6224899598393574
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.62
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -31.3    |
| Iteration     | 8        |
| MaximumReturn | -0.181   |
| MinimumReturn | -81.3    |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41830989718437195
Validation loss = 0.39952677488327026
Validation loss = 0.4007893204689026
Validation loss = 0.39850354194641113
Validation loss = 0.40508735179901123
Validation loss = 0.4059772491455078
Validation loss = 0.412435919046402
Validation loss = 0.4068676233291626
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4051799774169922
Validation loss = 0.403633177280426
Validation loss = 0.40628254413604736
Validation loss = 0.4136204123497009
Validation loss = 0.41357511281967163
Validation loss = 0.40632355213165283
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.41460707783699036
Validation loss = 0.41649553179740906
Validation loss = 0.41355717182159424
Validation loss = 0.4096970558166504
Validation loss = 0.4200863242149353
Validation loss = 0.42132797837257385
Validation loss = 0.4171983599662781
Validation loss = 0.4185013771057129
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41663989424705505
Validation loss = 0.4026244282722473
Validation loss = 0.40346208214759827
Validation loss = 0.4057731032371521
Validation loss = 0.41343846917152405
Validation loss = 0.40614116191864014
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.41626065969467163
Validation loss = 0.4030231237411499
Validation loss = 0.40070080757141113
Validation loss = 0.4084494411945343
Validation loss = 0.40815091133117676
Validation loss = 0.41122889518737793
Validation loss = 0.4061877131462097
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6175298804780877
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6150793650793651
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6126482213438735
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.610236220472441
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6078431372549019
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.60546875
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.603112840466926
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6007751937984496
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5984555984555985
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5961538461538461
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5977011494252874
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5954198473282443
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5931558935361216
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5909090909090909
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5886792452830188
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5864661654135338
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5842696629213483
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.582089552238806
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5799256505576208
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5777777777777777
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5756457564575646
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5735294117647058
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5714285714285714
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5693430656934306
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5672727272727273
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.23    |
| Iteration     | 9        |
| MaximumReturn | -0.066   |
| MinimumReturn | -43.1    |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40746456384658813
Validation loss = 0.40476617217063904
Validation loss = 0.4099756181240082
Validation loss = 0.4072667360305786
Validation loss = 0.4047154486179352
Validation loss = 0.40348154306411743
Validation loss = 0.41822588443756104
Validation loss = 0.4122362434864044
Validation loss = 0.41289177536964417
Validation loss = 0.4178541898727417
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4032515287399292
Validation loss = 0.4073503911495209
Validation loss = 0.4021073579788208
Validation loss = 0.407762736082077
Validation loss = 0.39916568994522095
Validation loss = 0.41628679633140564
Validation loss = 0.4081425368785858
Validation loss = 0.4107450246810913
Validation loss = 0.41249263286590576
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.41906362771987915
Validation loss = 0.4164213538169861
Validation loss = 0.41530367732048035
Validation loss = 0.4192386865615845
Validation loss = 0.4188283681869507
Validation loss = 0.41624438762664795
Validation loss = 0.41991209983825684
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4019196629524231
Validation loss = 0.40709322690963745
Validation loss = 0.4039943814277649
Validation loss = 0.4093453288078308
Validation loss = 0.4125971794128418
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4191727638244629
Validation loss = 0.40176233649253845
Validation loss = 0.4043009281158447
Validation loss = 0.4093969762325287
Validation loss = 0.41022491455078125
Validation loss = 0.41372978687286377
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5652173913043478
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5631768953068592
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5611510791366906
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5591397849462365
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5571428571428572
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5551601423487544
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5531914893617021
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5512367491166078
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5492957746478874
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5473684210526316
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.548951048951049
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5470383275261324
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5451388888888888
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5432525951557093
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5413793103448276
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5395189003436426
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5376712328767124
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5358361774744027
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5340136054421769
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5322033898305085
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5304054054054054
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5286195286195287
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5268456375838926
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5250836120401338
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5233333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.831   |
| Iteration     | 10       |
| MaximumReturn | -0.0416  |
| MinimumReturn | -7.59    |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4177495837211609
Validation loss = 0.41115841269493103
Validation loss = 0.4166695177555084
Validation loss = 0.4152260422706604
Validation loss = 0.41943255066871643
Validation loss = 0.4231530725955963
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4120849668979645
Validation loss = 0.41014447808265686
Validation loss = 0.4062984585762024
Validation loss = 0.4193100929260254
Validation loss = 0.41876739263534546
Validation loss = 0.4135766923427582
Validation loss = 0.41721686720848083
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.42572030425071716
Validation loss = 0.43242502212524414
Validation loss = 0.42272013425827026
Validation loss = 0.4216993451118469
Validation loss = 0.4218093454837799
Validation loss = 0.427692174911499
Validation loss = 0.42209258675575256
Validation loss = 0.4271635115146637
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4021165370941162
Validation loss = 0.4016891121864319
Validation loss = 0.40391749143600464
Validation loss = 0.40733498334884644
Validation loss = 0.41017475724220276
Validation loss = 0.410134494304657
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4148171544075012
Validation loss = 0.41234683990478516
Validation loss = 0.4119115471839905
Validation loss = 0.4145887792110443
Validation loss = 0.4099400043487549
Validation loss = 0.41542762517929077
Validation loss = 0.4174385964870453
Validation loss = 0.42093324661254883
Validation loss = 0.4180857539176941
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.521594684385382
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5198675496688742
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5181518151815182
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5164473684210527
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5147540983606558
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5130718954248366
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5146579804560261
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5162337662337663
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5145631067961165
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5129032258064516
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5112540192926045
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5096153846153846
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5079872204472844
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5063694267515924
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5047619047619047
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5031645569620253
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.501577287066246
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49843260188087773
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.496875
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4953271028037383
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4937888198757764
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49226006191950467
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49074074074074076
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.49538461538461537
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.13    |
| Iteration     | 11       |
| MaximumReturn | -0.0434  |
| MinimumReturn | -55.8    |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41527342796325684
Validation loss = 0.4229082465171814
Validation loss = 0.4229992926120758
Validation loss = 0.4197627604007721
Validation loss = 0.41666775941848755
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42218804359436035
Validation loss = 0.4144861102104187
Validation loss = 0.41290226578712463
Validation loss = 0.4187794625759125
Validation loss = 0.42011961340904236
Validation loss = 0.4210616946220398
Validation loss = 0.4216544032096863
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4310566782951355
Validation loss = 0.4295540452003479
Validation loss = 0.42518526315689087
Validation loss = 0.43142253160476685
Validation loss = 0.4284408688545227
Validation loss = 0.433878093957901
Validation loss = 0.43176761269569397
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4110051691532135
Validation loss = 0.4136522710323334
Validation loss = 0.4082523286342621
Validation loss = 0.41663044691085815
Validation loss = 0.41360044479370117
Validation loss = 0.4218735694885254
Validation loss = 0.421210914850235
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.42490047216415405
Validation loss = 0.41929227113723755
Validation loss = 0.4221353530883789
Validation loss = 0.42367759346961975
Validation loss = 0.42458072304725647
Validation loss = 0.4237431585788727
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.49693251533742333
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5045871559633027
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5060975609756098
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5075987841945289
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.509090909090909
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5105740181268882
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5090361445783133
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5075075075075075
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5059880239520959
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5074626865671642
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5059523809523809
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5074183976261127
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5088757396449705
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5132743362831859
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5117647058823529
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5102639296187683
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5087719298245614
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5102040816326531
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5087209302325582
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5101449275362319
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5086705202312138
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5072046109510087
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5086206896551724
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5071633237822349
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5057142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.5    |
| Iteration     | 12       |
| MaximumReturn | -0.0486  |
| MinimumReturn | -72.5    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4133903384208679
Validation loss = 0.417978972196579
Validation loss = 0.42002028226852417
Validation loss = 0.416899174451828
Validation loss = 0.424540638923645
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41294991970062256
Validation loss = 0.4113459885120392
Validation loss = 0.4111284911632538
Validation loss = 0.413959801197052
Validation loss = 0.42487525939941406
Validation loss = 0.4244667887687683
Validation loss = 0.41887184977531433
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.42272332310676575
Validation loss = 0.42411917448043823
Validation loss = 0.42113035917282104
Validation loss = 0.4276473820209503
Validation loss = 0.43158066272735596
Validation loss = 0.43159258365631104
Validation loss = 0.4273008406162262
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4084918797016144
Validation loss = 0.40947654843330383
Validation loss = 0.41891375184059143
Validation loss = 0.4129434823989868
Validation loss = 0.41387560963630676
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4220792055130005
Validation loss = 0.4200373589992523
Validation loss = 0.42349183559417725
Validation loss = 0.41730785369873047
Validation loss = 0.4226122796535492
Validation loss = 0.42552801966667175
Validation loss = 0.42912712693214417
Validation loss = 0.43233126401901245
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5042735042735043
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5028409090909091
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5014164305949008
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49859154929577465
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49719101123595505
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5014005602240896
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4986072423398329
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49722222222222223
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49584487534626037
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.494475138121547
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4931129476584022
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4945054945054945
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4931506849315068
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4918032786885246
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.49318801089918257
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4945652173913043
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4932249322493225
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4918918918918919
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49056603773584906
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.489247311827957
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4879356568364611
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48663101604278075
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48533333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.39    |
| Iteration     | 13       |
| MaximumReturn | -0.0238  |
| MinimumReturn | -59.3    |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4129268229007721
Validation loss = 0.414590448141098
Validation loss = 0.42270079255104065
Validation loss = 0.42033305764198303
Validation loss = 0.4286140501499176
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41883718967437744
Validation loss = 0.4142260253429413
Validation loss = 0.4164522588253021
Validation loss = 0.42292264103889465
Validation loss = 0.42587170004844666
Validation loss = 0.4179963171482086
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.43189260363578796
Validation loss = 0.43261662125587463
Validation loss = 0.42484983801841736
Validation loss = 0.436307430267334
Validation loss = 0.429313063621521
Validation loss = 0.43208837509155273
Validation loss = 0.43771615624427795
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4128073453903198
Validation loss = 0.4178071916103363
Validation loss = 0.4135688245296478
Validation loss = 0.4201827943325043
Validation loss = 0.4154082238674164
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4238971471786499
Validation loss = 0.4209628403186798
Validation loss = 0.42165234684944153
Validation loss = 0.4262031316757202
Validation loss = 0.4315743148326874
Validation loss = 0.4299544095993042
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48404255319148937
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4827586206896552
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48412698412698413
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48548812664907653
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4842105263157895
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4881889763779528
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4895287958115183
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4908616187989556
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4895833333333333
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4883116883116883
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4896373056994819
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4883720930232558
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48711340206185566
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48586118251928023
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4846153846153846
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4859335038363171
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4846938775510204
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48346055979643765
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48223350253807107
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4810126582278481
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.48737373737373735
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48614609571788414
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48743718592964824
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48872180451127817
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4875
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.66    |
| Iteration     | 14       |
| MaximumReturn | -0.0609  |
| MinimumReturn | -44.7    |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4168359935283661
Validation loss = 0.4153871536254883
Validation loss = 0.42080238461494446
Validation loss = 0.4239153265953064
Validation loss = 0.42761465907096863
Validation loss = 0.42350688576698303
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41907480359077454
Validation loss = 0.42415687441825867
Validation loss = 0.41492384672164917
Validation loss = 0.4181695282459259
Validation loss = 0.4185121953487396
Validation loss = 0.42841893434524536
Validation loss = 0.42684677243232727
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4282119870185852
Validation loss = 0.4350225627422333
Validation loss = 0.43046677112579346
Validation loss = 0.430231511592865
Validation loss = 0.4331098794937134
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4144071340560913
Validation loss = 0.4124675989151001
Validation loss = 0.41194528341293335
Validation loss = 0.4140089750289917
Validation loss = 0.41961318254470825
Validation loss = 0.41463395953178406
Validation loss = 0.4179009199142456
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4212679862976074
Validation loss = 0.429876446723938
Validation loss = 0.42518508434295654
Validation loss = 0.4222308397293091
Validation loss = 0.42867761850357056
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.486284289276808
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48756218905472637
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4913151364764268
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4900990099009901
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4888888888888889
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4876847290640394
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4864864864864865
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4852941176470588
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4841075794621027
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48292682926829267
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48175182481751827
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48058252427184467
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4794188861985472
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4806763285024155
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4795180722891566
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47836538461538464
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47721822541966424
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47607655502392343
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47494033412887826
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4738095238095238
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4750593824228028
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47393364928909953
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4728132387706856
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4716981132075472
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47058823529411764
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.31    |
| Iteration     | 15       |
| MaximumReturn | -0.0468  |
| MinimumReturn | -86      |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42011839151382446
Validation loss = 0.418614000082016
Validation loss = 0.41722530126571655
Validation loss = 0.4268666207790375
Validation loss = 0.42454957962036133
Validation loss = 0.4277779757976532
Validation loss = 0.43113210797309875
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.417373925447464
Validation loss = 0.4165010154247284
Validation loss = 0.41830259561538696
Validation loss = 0.42521077394485474
Validation loss = 0.42342624068260193
Validation loss = 0.4186493456363678
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.42538532614707947
Validation loss = 0.43048548698425293
Validation loss = 0.428467333316803
Validation loss = 0.436336487531662
Validation loss = 0.43703946471214294
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.409207820892334
Validation loss = 0.412813276052475
Validation loss = 0.4181831479072571
Validation loss = 0.41601690649986267
Validation loss = 0.4204474091529846
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4236152768135071
Validation loss = 0.4336056113243103
Validation loss = 0.41685304045677185
Validation loss = 0.41820594668388367
Validation loss = 0.4248492121696472
Validation loss = 0.42542919516563416
Validation loss = 0.42494791746139526
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.47183098591549294
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4707259953161593
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4719626168224299
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47086247086247085
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4697674418604651
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.46867749419953597
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4722222222222222
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.47113163972286376
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47465437788018433
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4735632183908046
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.47706422018348627
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4759725400457666
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4817351598173516
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.48519362186788156
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4909090909090909
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4897959183673469
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48868778280542985
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.49209932279909707
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49099099099099097
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4898876404494382
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48878923766816146
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48769574944071586
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48660714285714285
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48775055679287305
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4911111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15.3    |
| Iteration     | 16       |
| MaximumReturn | -0.0616  |
| MinimumReturn | -57.5    |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4209310710430145
Validation loss = 0.42067238688468933
Validation loss = 0.4167419373989105
Validation loss = 0.42128148674964905
Validation loss = 0.4289344549179077
Validation loss = 0.4282798767089844
Validation loss = 0.43276458978652954
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4191974103450775
Validation loss = 0.4227754771709442
Validation loss = 0.417447566986084
Validation loss = 0.4180721938610077
Validation loss = 0.41946929693222046
Validation loss = 0.4174898862838745
Validation loss = 0.4158048927783966
Validation loss = 0.41925933957099915
Validation loss = 0.4246312975883484
Validation loss = 0.4167934060096741
Validation loss = 0.4253866970539093
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4229249358177185
Validation loss = 0.42578306794166565
Validation loss = 0.43177172541618347
Validation loss = 0.4275149703025818
Validation loss = 0.42934450507164
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41035181283950806
Validation loss = 0.41158732771873474
Validation loss = 0.4150826036930084
Validation loss = 0.4147379994392395
Validation loss = 0.41098397970199585
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4302522838115692
Validation loss = 0.42069438099861145
Validation loss = 0.4266619384288788
Validation loss = 0.4277372360229492
Validation loss = 0.42611292004585266
Validation loss = 0.42624932527542114
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.49889135254988914
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.497787610619469
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4988962472406181
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5022026431718062
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010989010989011
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4989059080962801
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4989106753812636
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49891540130151846
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5010799136069114
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5064655172413793
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5053763440860215
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5064377682403434
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5053533190578159
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5042735042735043
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5031982942430704
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.502127659574468
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010615711252654
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5021186440677966
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010570824524313
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5021097046413502
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010526315789474
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.8    |
| Iteration     | 17       |
| MaximumReturn | -0.174   |
| MinimumReturn | -132     |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42653608322143555
Validation loss = 0.4251208007335663
Validation loss = 0.42775118350982666
Validation loss = 0.4307522475719452
Validation loss = 0.4257151782512665
Validation loss = 0.4306114912033081
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4218813180923462
Validation loss = 0.42517101764678955
Validation loss = 0.4253373444080353
Validation loss = 0.4294416308403015
Validation loss = 0.4293447434902191
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.41865506768226624
Validation loss = 0.4244686961174011
Validation loss = 0.4251979887485504
Validation loss = 0.4267059564590454
Validation loss = 0.4283607602119446
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4103112816810608
Validation loss = 0.4106055200099945
Validation loss = 0.41358667612075806
Validation loss = 0.413749098777771
Validation loss = 0.41913464665412903
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4252629578113556
Validation loss = 0.4243408441543579
Validation loss = 0.42340365052223206
Validation loss = 0.42454323172569275
Validation loss = 0.42804116010665894
Validation loss = 0.4243161380290985
Validation loss = 0.429280549287796
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5031446540880503
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.502092050209205
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010438413361169
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5031185031185031
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5020746887966805
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010351966873706
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5020661157024794
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5010309278350515
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4989733059548255
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4979508196721312
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49693251533742333
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4959183673469388
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49490835030549896
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.49796747967479676
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4969574036511156
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4959514170040486
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.494949494949495
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5010060362173038
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.503006012024048
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.502
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.3    |
| Iteration     | 18       |
| MaximumReturn | -0.125   |
| MinimumReturn | -73      |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.43305855989456177
Validation loss = 0.4272023141384125
Validation loss = 0.4298814535140991
Validation loss = 0.4324778914451599
Validation loss = 0.4346614480018616
Validation loss = 0.44314688444137573
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4231063723564148
Validation loss = 0.4252660870552063
Validation loss = 0.42859727144241333
Validation loss = 0.4313774108886719
Validation loss = 0.42555153369903564
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4274986684322357
Validation loss = 0.4296913146972656
Validation loss = 0.4349706172943115
Validation loss = 0.4322119951248169
Validation loss = 0.4388105571269989
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41670018434524536
Validation loss = 0.41260814666748047
Validation loss = 0.42027702927589417
Validation loss = 0.42134618759155273
Validation loss = 0.42644202709198
Validation loss = 0.4228249490261078
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.42557308077812195
Validation loss = 0.4257414937019348
Validation loss = 0.4347425103187561
Validation loss = 0.4351627826690674
Validation loss = 0.43168219923973083
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.500998003992016
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4990059642147117
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.498015873015873
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.497029702970297
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4980237154150198
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4970414201183432
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49606299212598426
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4950884086444008
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49411764705882355
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4931506849315068
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4921875
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49122807017543857
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.490272373540856
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4893203883495146
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4883720930232558
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4874274661508704
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4864864864864865
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48554913294797686
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.49038461538461536
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4894433781190019
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4885057471264368
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4875717017208413
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4866412213740458
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4857142857142857
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.7    |
| Iteration     | 19       |
| MaximumReturn | -0.0643  |
| MinimumReturn | -85.8    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42927026748657227
Validation loss = 0.4360128343105316
Validation loss = 0.4343821704387665
Validation loss = 0.43382301926612854
Validation loss = 0.43723660707473755
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42332127690315247
Validation loss = 0.42612552642822266
Validation loss = 0.4288656711578369
Validation loss = 0.43401890993118286
Validation loss = 0.4293731451034546
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4294135570526123
Validation loss = 0.42771467566490173
Validation loss = 0.4313429892063141
Validation loss = 0.4366770386695862
Validation loss = 0.4344213008880615
Validation loss = 0.43610134720802307
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42033839225769043
Validation loss = 0.4199240505695343
Validation loss = 0.41982656717300415
Validation loss = 0.42063310742378235
Validation loss = 0.4362688958644867
Validation loss = 0.4236956238746643
Validation loss = 0.43053796887397766
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.42838427424430847
Validation loss = 0.43323248624801636
Validation loss = 0.4338659942150116
Validation loss = 0.4355308711528778
Validation loss = 0.4356513023376465
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4847908745247148
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4838709677419355
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48295454545454547
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4820415879017013
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.4867924528301887
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4896421845574388
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.4924812030075188
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4915572232645403
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.49250936329588013
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.497196261682243
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4962686567164179
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49534450651769085
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4944237918215613
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.49907235621521334
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5018518518518519
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5027726432532348
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.503690036900369
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5027624309392266
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5018382352941176
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5027522935779817
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5018315018315018
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5045703839122486
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5036496350364964
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5027322404371585
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5036363636363637
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27.6    |
| Iteration     | 20       |
| MaximumReturn | -0.0395  |
| MinimumReturn | -91.9    |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.44137558341026306
Validation loss = 0.43761587142944336
Validation loss = 0.4415811002254486
Validation loss = 0.43808385729789734
Validation loss = 0.44157955050468445
Validation loss = 0.44370827078819275
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42818397283554077
Validation loss = 0.42523276805877686
Validation loss = 0.42739376425743103
Validation loss = 0.4398649334907532
Validation loss = 0.4314836859703064
Validation loss = 0.43360090255737305
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4307178258895874
Validation loss = 0.43311935663223267
Validation loss = 0.4363277554512024
Validation loss = 0.43603673577308655
Validation loss = 0.4418092966079712
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42070940136909485
Validation loss = 0.42368510365486145
Validation loss = 0.42778241634368896
Validation loss = 0.4323503077030182
Validation loss = 0.431378573179245
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.42772725224494934
Validation loss = 0.435796856880188
Validation loss = 0.4309089779853821
Validation loss = 0.4335154891014099
Validation loss = 0.4342867136001587
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5027223230490018
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5054347826086957
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5063291139240507
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5054151624548736
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5045045045045045
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5035971223021583
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5026929982046678
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5017921146953405
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5026833631484794
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5053571428571428
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5044563279857398
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.50355871886121
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5026642984014209
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.50177304964539
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5008849557522124
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4991181657848324
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4982394366197183
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4991212653778559
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4982456140350877
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4973730297723292
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4965034965034965
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4956369982547993
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49477351916376305
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49391304347826087
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.7    |
| Iteration     | 21       |
| MaximumReturn | -0.103   |
| MinimumReturn | -85.6    |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.44218504428863525
Validation loss = 0.4439733922481537
Validation loss = 0.44144555926322937
Validation loss = 0.4438098967075348
Validation loss = 0.4439798593521118
Validation loss = 0.44503989815711975
Validation loss = 0.4484642446041107
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4359447658061981
Validation loss = 0.4327436685562134
Validation loss = 0.43410053849220276
Validation loss = 0.43756139278411865
Validation loss = 0.43549075722694397
Validation loss = 0.4398758113384247
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.43931815028190613
Validation loss = 0.4367905557155609
Validation loss = 0.44144681096076965
Validation loss = 0.4409872889518738
Validation loss = 0.44122856855392456
Validation loss = 0.44741109013557434
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4238031208515167
Validation loss = 0.4322936534881592
Validation loss = 0.432877779006958
Validation loss = 0.4306146800518036
Validation loss = 0.43427774310112
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4342208206653595
Validation loss = 0.43610623478889465
Validation loss = 0.44003236293792725
Validation loss = 0.4420888423919678
Validation loss = 0.4354572296142578
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4930555555555556
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49220103986135183
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4913494809688581
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4905008635578584
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4896551724137931
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4905335628227194
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4896907216494845
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4888507718696398
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.488013698630137
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.48717948717948717
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4880546075085324
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.4889267461669506
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4880952380952381
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4872665534804754
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4864406779661017
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4856175972927242
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4847972972972973
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4839797639123103
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.48653198653198654
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.48739495798319327
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.4949664429530201
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49413735343383586
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.49665551839464883
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.4958263772954925
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.495
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27      |
| Iteration     | 22       |
| MaximumReturn | -0.151   |
| MinimumReturn | -107     |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4436776041984558
Validation loss = 0.4433199465274811
Validation loss = 0.4409945607185364
Validation loss = 0.44915252923965454
Validation loss = 0.4499853551387787
Validation loss = 0.4560892581939697
Validation loss = 0.4516317844390869
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4373950958251953
Validation loss = 0.43336158990859985
Validation loss = 0.438228040933609
Validation loss = 0.4374516010284424
Validation loss = 0.4373281002044678
Validation loss = 0.45028796792030334
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4398462772369385
Validation loss = 0.43921416997909546
Validation loss = 0.4438599944114685
Validation loss = 0.4417402744293213
Validation loss = 0.4447297155857086
Validation loss = 0.44582515954971313
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.44056597352027893
Validation loss = 0.42827993631362915
Validation loss = 0.4315732419490814
Validation loss = 0.4366362988948822
Validation loss = 0.4420646131038666
Validation loss = 0.4417828917503357
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4338395595550537
Validation loss = 0.43619227409362793
Validation loss = 0.43349742889404297
Validation loss = 0.436168909072876
Validation loss = 0.44009310007095337
Validation loss = 0.43893152475357056
Validation loss = 0.4468749463558197
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49417637271214643
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.49335548172757476
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.49585406301824214
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.49834437086092714
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5074380165289256
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5066006600660066
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5090609555189456
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5098684210526315
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5090311986863711
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5114754098360655
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5106382978723404
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5098039215686274
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5089722675367048
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5130293159609121
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5154471544715448
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5146103896103896
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.513776337115073
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5129449838187702
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5137318255250404
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5193548387096775
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5185185185185185
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5176848874598071
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5168539325842697
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5160256410256411
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5168
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23.4    |
| Iteration     | 23       |
| MaximumReturn | -0.209   |
| MinimumReturn | -96.2    |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.45027676224708557
Validation loss = 0.4550565183162689
Validation loss = 0.4502270221710205
Validation loss = 0.4498476982116699
Validation loss = 0.4586154520511627
Validation loss = 0.45909953117370605
Validation loss = 0.46222835779190063
Validation loss = 0.46125364303588867
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4381473958492279
Validation loss = 0.44101768732070923
Validation loss = 0.43797627091407776
Validation loss = 0.44043201208114624
Validation loss = 0.443142831325531
Validation loss = 0.45058196783065796
Validation loss = 0.44741836190223694
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4402746260166168
Validation loss = 0.4429206848144531
Validation loss = 0.44560742378234863
Validation loss = 0.44976964592933655
Validation loss = 0.4504240155220032
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.43883413076400757
Validation loss = 0.4368976056575775
Validation loss = 0.44082051515579224
Validation loss = 0.4483293890953064
Validation loss = 0.4371175765991211
Validation loss = 0.4402402937412262
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4414982199668884
Validation loss = 0.45403581857681274
Validation loss = 0.4407389163970947
Validation loss = 0.4439593255519867
Validation loss = 0.4455853998661041
Validation loss = 0.4457154870033264
Validation loss = 0.4536570906639099
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5159744408945687
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5151515151515151
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5159235668789809
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5166931637519873
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5206349206349207
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5229793977812995
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5316455696202531
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5355450236966824
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5362776025236593
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5354330708661418
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5408805031446541
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5416012558869702
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5407523510971787
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5446009389671361
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.54375
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5475819032761311
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5514018691588785
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5552099533437014
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5559006211180124
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5550387596899224
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5572755417956656
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5610510046367851
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5617283950617284
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5608628659476117
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.56
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -44.8    |
| Iteration     | 24       |
| MaximumReturn | -0.139   |
| MinimumReturn | -110     |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4520483613014221
Validation loss = 0.45117291808128357
Validation loss = 0.45880943536758423
Validation loss = 0.4558739960193634
Validation loss = 0.46271395683288574
Validation loss = 0.4603610336780548
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4385092258453369
Validation loss = 0.4406837224960327
Validation loss = 0.4424721300601959
Validation loss = 0.4433538317680359
Validation loss = 0.4412950873374939
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4474077820777893
Validation loss = 0.4447658956050873
Validation loss = 0.4469606280326843
Validation loss = 0.44741639494895935
Validation loss = 0.4545647203922272
Validation loss = 0.465034157037735
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.439864844083786
Validation loss = 0.43824100494384766
Validation loss = 0.4373384118080139
Validation loss = 0.43722090125083923
Validation loss = 0.44139325618743896
Validation loss = 0.44066739082336426
Validation loss = 0.44822391867637634
Validation loss = 0.44946756958961487
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4440215528011322
Validation loss = 0.4425261914730072
Validation loss = 0.44978851079940796
Validation loss = 0.45022398233413696
Validation loss = 0.4486313760280609
Validation loss = 0.4521288275718689
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5591397849462365
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.558282208588957
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.55895865237366
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5581039755351682
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.5694656488549619
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5685975609756098
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.5722983257229832
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5714285714285714
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5705614567526556
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5727272727272728
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.573373676248109
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.5800604229607251
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.5822021116138764
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.5903614457831325
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5909774436090226
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5915915915915916
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.5922038980509745
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.594311377245509
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.593423019431988
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5925373134328358
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.5976154992548435
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6011904761904762
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.600297176820208
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.599406528189911
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5985185185185186
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27.3    |
| Iteration     | 25       |
| MaximumReturn | -0.0779  |
| MinimumReturn | -89.8    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4596901834011078
Validation loss = 0.46056297421455383
Validation loss = 0.4565638303756714
Validation loss = 0.46056410670280457
Validation loss = 0.45810654759407043
Validation loss = 0.4636780619621277
Validation loss = 0.46398621797561646
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4393838047981262
Validation loss = 0.444388210773468
Validation loss = 0.4404016435146332
Validation loss = 0.44272544980049133
Validation loss = 0.4444698989391327
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.44844844937324524
Validation loss = 0.4504864811897278
Validation loss = 0.44879406690597534
Validation loss = 0.4553898870944977
Validation loss = 0.44817325472831726
Validation loss = 0.452976793050766
Validation loss = 0.45873212814331055
Validation loss = 0.45220041275024414
Validation loss = 0.45830461382865906
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4457434415817261
Validation loss = 0.444948673248291
Validation loss = 0.44457319378852844
Validation loss = 0.45035994052886963
Validation loss = 0.4496496915817261
Validation loss = 0.4507291913032532
Validation loss = 0.4552967846393585
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.44769421219825745
Validation loss = 0.44378578662872314
Validation loss = 0.44341757893562317
Validation loss = 0.4491753578186035
Validation loss = 0.4523167610168457
Validation loss = 0.45291897654533386
Validation loss = 0.4504702389240265
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.5976331360946746
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6011816838995568
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6076696165191741
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6067746686303387
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.611764705882353
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6123348017621145
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6114369501466276
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6120058565153733
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6111111111111112
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6102189781021898
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.60932944606414
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6084425036390102
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6090116279069767
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6081277213352685
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6072463768115942
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6063675832127352
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6054913294797688
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6118326118326118
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6152737752161384
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6143884892086331
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6149425287356322
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6140602582496413
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6160458452722063
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6208869814020028
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.62
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.9    |
| Iteration     | 26       |
| MaximumReturn | -0.0702  |
| MinimumReturn | -93.3    |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.46331658959388733
Validation loss = 0.4621981382369995
Validation loss = 0.46253105998039246
Validation loss = 0.47276726365089417
Validation loss = 0.46433597803115845
Validation loss = 0.46352124214172363
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4406837522983551
Validation loss = 0.4439973533153534
Validation loss = 0.4441375732421875
Validation loss = 0.4471712112426758
Validation loss = 0.4471711218357086
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4568418562412262
Validation loss = 0.45531293749809265
Validation loss = 0.459575891494751
Validation loss = 0.4617154002189636
Validation loss = 0.4599759876728058
Validation loss = 0.46339163184165955
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.45310255885124207
Validation loss = 0.4491818845272064
Validation loss = 0.45434749126434326
Validation loss = 0.4533098340034485
Validation loss = 0.45147904753685
Validation loss = 0.45227593183517456
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.45017415285110474
Validation loss = 0.4507429003715515
Validation loss = 0.45396888256073
Validation loss = 0.4552774727344513
Validation loss = 0.45601993799209595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6191155492154066
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6182336182336182
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6173541963015647
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6164772727272727
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.624113475177305
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.623229461756374
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6223479490806223
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6214689265536724
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6220028208744711
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6211267605633802
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.620253164556962
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6193820224719101
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6213183730715287
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6218487394957983
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6237762237762238
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6256983240223464
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6290097629009763
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.628133704735376
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.627260083449235
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6263888888888889
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6255201109570042
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6246537396121884
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.623789764868603
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6229281767955801
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.623448275862069
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.1    |
| Iteration     | 27       |
| MaximumReturn | -0.025   |
| MinimumReturn | -98.5    |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4645858705043793
Validation loss = 0.46917280554771423
Validation loss = 0.4668422043323517
Validation loss = 0.47156253457069397
Validation loss = 0.47032299637794495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.44414958357810974
Validation loss = 0.4474882185459137
Validation loss = 0.44820475578308105
Validation loss = 0.44951844215393066
Validation loss = 0.44745683670043945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46357035636901855
Validation loss = 0.45362749695777893
Validation loss = 0.46061840653419495
Validation loss = 0.46292415261268616
Validation loss = 0.46630987524986267
Validation loss = 0.46843981742858887
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46070417761802673
Validation loss = 0.45353999733924866
Validation loss = 0.45007529854774475
Validation loss = 0.4535812437534332
Validation loss = 0.4599173069000244
Validation loss = 0.4568904638290405
Validation loss = 0.45994237065315247
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.45363929867744446
Validation loss = 0.45658406615257263
Validation loss = 0.4554988443851471
Validation loss = 0.45756927132606506
Validation loss = 0.46367117762565613
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6225895316804407
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6217331499312242
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6208791208791209
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6268861454046639
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.626027397260274
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6292749658002736
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6284153005464481
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6275579809004093
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6335149863760218
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.638095238095238
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6372282608695652
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6377204884667571
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6436314363143631
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.652232746955345
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6554054054054054
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6626180836707153
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6617250673854448
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.6689098250336474
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.6693548387096774
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.6751677852348993
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6796246648793566
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6813922356091031
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.68048128342246
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6795727636849133
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6786666666666666
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15.6    |
| Iteration     | 28       |
| MaximumReturn | -0.0716  |
| MinimumReturn | -74.9    |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4707815647125244
Validation loss = 0.46630868315696716
Validation loss = 0.46941128373146057
Validation loss = 0.4667729139328003
Validation loss = 0.4711720645427704
Validation loss = 0.46883705258369446
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.44697943329811096
Validation loss = 0.448568731546402
Validation loss = 0.44957420229911804
Validation loss = 0.4500206708908081
Validation loss = 0.4551713168621063
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4597039818763733
Validation loss = 0.4610200822353363
Validation loss = 0.46383485198020935
Validation loss = 0.46630942821502686
Validation loss = 0.4667768180370331
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4572523534297943
Validation loss = 0.45823973417282104
Validation loss = 0.45921677350997925
Validation loss = 0.4589557349681854
Validation loss = 0.46312248706817627
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4587315320968628
Validation loss = 0.45671895146369934
Validation loss = 0.4620003402233124
Validation loss = 0.45879635214805603
Validation loss = 0.4618690013885498
Validation loss = 0.4629936218261719
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6830892143808256
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6861702127659575
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6879150066401063
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6870026525198939
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.686092715231788
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6851851851851852
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6869220607661823
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6860158311345647
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6903820816864296
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.6947368421052632
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6938239159001314
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.6955380577427821
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.6985583224115334
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6976439790575916
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6967320261437908
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.695822454308094
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6949152542372882
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.69921875
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.6983094928478544
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7038961038961039
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7042801556420234
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7033678756476683
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.703751617076326
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7118863049095607
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.712258064516129
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.27    |
| Iteration     | 29       |
| MaximumReturn | -0.151   |
| MinimumReturn | -122     |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.47061553597450256
Validation loss = 0.4750674366950989
Validation loss = 0.46756088733673096
Validation loss = 0.47228720784187317
Validation loss = 0.4738420844078064
Validation loss = 0.47704216837882996
Validation loss = 0.4754278063774109
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4462375342845917
Validation loss = 0.4516928493976593
Validation loss = 0.453399658203125
Validation loss = 0.4543820321559906
Validation loss = 0.45464175939559937
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.467818021774292
Validation loss = 0.4631936252117157
Validation loss = 0.4629382789134979
Validation loss = 0.4750319719314575
Validation loss = 0.4674428701400757
Validation loss = 0.4678053557872772
Validation loss = 0.474518358707428
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46122828125953674
Validation loss = 0.45780226588249207
Validation loss = 0.4649128317832947
Validation loss = 0.4672257602214813
Validation loss = 0.46535658836364746
Validation loss = 0.46887123584747314
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.46281692385673523
Validation loss = 0.4601235091686249
Validation loss = 0.46291327476501465
Validation loss = 0.4641267657279968
Validation loss = 0.466067910194397
Validation loss = 0.4696769714355469
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7152061855670103
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7181467181467182
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7172236503856041
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7175866495507061
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7166666666666667
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7208706786171575
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7199488491048593
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7279693486590039
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7270408163265306
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7261146496815286
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7251908396946565
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7242693773824651
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7271573604060914
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7262357414448669
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7253164556962025
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.729456384323641
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7323232323232324
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7313997477931904
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7304785894206549
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7333333333333333
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7349246231155779
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7340025094102886
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7330827067669173
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7359198998748435
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.73625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.2    |
| Iteration     | 30       |
| MaximumReturn | -0.0664  |
| MinimumReturn | -83.7    |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4750954508781433
Validation loss = 0.47564342617988586
Validation loss = 0.4779459536075592
Validation loss = 0.4774719476699829
Validation loss = 0.4841059446334839
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4570484161376953
Validation loss = 0.4624910354614258
Validation loss = 0.4528149366378784
Validation loss = 0.4550952911376953
Validation loss = 0.45790064334869385
Validation loss = 0.4604796767234802
Validation loss = 0.45979559421539307
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.468401700258255
Validation loss = 0.4707551598548889
Validation loss = 0.477968692779541
Validation loss = 0.4757689833641052
Validation loss = 0.4736456573009491
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46436119079589844
Validation loss = 0.46858829259872437
Validation loss = 0.46815598011016846
Validation loss = 0.4706805646419525
Validation loss = 0.46930184960365295
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.46633484959602356
Validation loss = 0.4666212499141693
Validation loss = 0.47095462679862976
Validation loss = 0.4700618088245392
Validation loss = 0.4767133891582489
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.735330836454432
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7344139650872819
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7334993773349938
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7350746268656716
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7341614906832298
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7332506203473945
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.734820322180917
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7388613861386139
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7379480840543882
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.745679012345679
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7447595561035758
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7438423645320197
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7429274292742928
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.742014742014742
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7411042944785277
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7401960784313726
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7392900856793145
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7420537897310513
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7411477411477412
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7426829268292683
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7429963459196103
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7420924574209246
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.74726609963548
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7463592233009708
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.7527272727272727
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.07    |
| Iteration     | 31       |
| MaximumReturn | -0.109   |
| MinimumReturn | -67.6    |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.47842133045196533
Validation loss = 0.4764469265937805
Validation loss = 0.4763317406177521
Validation loss = 0.4803615212440491
Validation loss = 0.4833250641822815
Validation loss = 0.4836263656616211
Validation loss = 0.4903407394886017
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46156665682792664
Validation loss = 0.46042177081108093
Validation loss = 0.46063610911369324
Validation loss = 0.46468624472618103
Validation loss = 0.46415671706199646
Validation loss = 0.46440714597702026
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4732699692249298
Validation loss = 0.4765988886356354
Validation loss = 0.48254597187042236
Validation loss = 0.4799673557281494
Validation loss = 0.48169204592704773
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46779781579971313
Validation loss = 0.46736666560173035
Validation loss = 0.4696717858314514
Validation loss = 0.47539106011390686
Validation loss = 0.4728832244873047
Validation loss = 0.4720137417316437
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47395235300064087
Validation loss = 0.4713772237300873
Validation loss = 0.4707232713699341
Validation loss = 0.4706113636493683
Validation loss = 0.47494766116142273
Validation loss = 0.47744444012641907
Validation loss = 0.4796953797340393
Validation loss = 0.4795810282230377
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.7518159806295399
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.75453446191052
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7572463768115942
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.7623642943305187
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7662650602409639
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7677496991576414
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.7716346153846154
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.773109243697479
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.7745803357314148
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.7820359281437126
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.7906698564593302
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.7933094384707288
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.7935560859188544
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.799761620977354
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8011904761904762
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8049940546967895
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8064133016627079
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8125741399762753
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8127962085308057
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.8189349112426035
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8191489361702128
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.820543093270366
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8231132075471698
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.823321554770318
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8258823529411765
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -112     |
| Iteration     | 32       |
| MaximumReturn | -56.5    |
| MinimumReturn | -156     |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4816093146800995
Validation loss = 0.4831833243370056
Validation loss = 0.48364073038101196
Validation loss = 0.4821665287017822
Validation loss = 0.48989468812942505
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46277719736099243
Validation loss = 0.4608638286590576
Validation loss = 0.4690304100513458
Validation loss = 0.4648365378379822
Validation loss = 0.4680554270744324
Validation loss = 0.4753943383693695
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.47292637825012207
Validation loss = 0.47458919882774353
Validation loss = 0.4768894612789154
Validation loss = 0.4803793728351593
Validation loss = 0.477157860994339
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46748587489128113
Validation loss = 0.48170167207717896
Validation loss = 0.47324684262275696
Validation loss = 0.47544336318969727
Validation loss = 0.4782758355140686
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4752325117588043
Validation loss = 0.4785151779651642
Validation loss = 0.47792747616767883
Validation loss = 0.480995237827301
Validation loss = 0.48173585534095764
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8249118683901293
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8262910798122066
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8253223915592028
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8243559718969555
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8257309941520468
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8294392523364486
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8331388564760793
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8344988344988346
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8370197904540163
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.8441860465116279
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8455284552845529
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8468677494199536
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8493626882966396
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.8576388888888888
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8578034682080925
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8614318706697459
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8650519031141869
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8675115207373272
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8665132336018412
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8689655172413793
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8714121699196326
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8738532110091743
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8751431844215349
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8775743707093822
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8788571428571429
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -62.7    |
| Iteration     | 33       |
| MaximumReturn | -0.182   |
| MinimumReturn | -122     |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4886767268180847
Validation loss = 0.48104697465896606
Validation loss = 0.4822213649749756
Validation loss = 0.48978862166404724
Validation loss = 0.48937639594078064
Validation loss = 0.4885901212692261
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46867460012435913
Validation loss = 0.466960608959198
Validation loss = 0.4683249592781067
Validation loss = 0.4727656841278076
Validation loss = 0.473626047372818
Validation loss = 0.4703576862812042
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4768948256969452
Validation loss = 0.47742050886154175
Validation loss = 0.48252835869789124
Validation loss = 0.48273009061813354
Validation loss = 0.4839608669281006
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47250333428382874
Validation loss = 0.4738464951515198
Validation loss = 0.47806403040885925
Validation loss = 0.47611355781555176
Validation loss = 0.48062342405319214
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47452011704444885
Validation loss = 0.47713911533355713
Validation loss = 0.48258307576179504
Validation loss = 0.48098260164260864
Validation loss = 0.4867668151855469
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8789954337899544
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8779931584948689
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.876993166287016
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.875995449374289
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.875
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8740068104426788
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.8809523809523809
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8810872027180068
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8857466063348416
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8858757062146893
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8848758465011287
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8838782412626832
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8873873873873874
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8886389201349831
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8876404494382022
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8911335578002245
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8901345291479821
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8891377379619261
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8881431767337807
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8871508379888268
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8872767857142857
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.8918617614269788
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8953229398663697
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.896551724137931
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8955555555555555
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -25.4    |
| Iteration     | 34       |
| MaximumReturn | -0.155   |
| MinimumReturn | -103     |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4902070462703705
Validation loss = 0.48342132568359375
Validation loss = 0.4881897270679474
Validation loss = 0.4930330216884613
Validation loss = 0.49028560519218445
Validation loss = 0.4899527132511139
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4727310836315155
Validation loss = 0.46953830122947693
Validation loss = 0.46863606572151184
Validation loss = 0.4746156334877014
Validation loss = 0.47550538182258606
Validation loss = 0.47744664549827576
Validation loss = 0.4797789752483368
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48380646109580994
Validation loss = 0.4838849902153015
Validation loss = 0.48084428906440735
Validation loss = 0.4819949269294739
Validation loss = 0.4857211410999298
Validation loss = 0.4877442717552185
Validation loss = 0.48770636320114136
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4789254367351532
Validation loss = 0.47473087906837463
Validation loss = 0.4750818610191345
Validation loss = 0.48236533999443054
Validation loss = 0.48250481486320496
Validation loss = 0.4854932427406311
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4844895303249359
Validation loss = 0.4805696904659271
Validation loss = 0.48402732610702515
Validation loss = 0.4932294189929962
Validation loss = 0.4907898008823395
Validation loss = 0.4928811192512512
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8945615982241953
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8968957871396895
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8959025470653378
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8949115044247787
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8961325966850828
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8962472406181016
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8974641675854466
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9008810572687225
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8998899889988999
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9010989010989011
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9001097694840834
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8991228070175439
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8981380065717415
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8971553610503282
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8961748633879781
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8951965065502183
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8942202835332607
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8932461873638344
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.8955386289445049
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8956521739130435
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8946796959826275
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8937093275488069
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8927410617551462
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8917748917748918
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8908108108108108
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.32    |
| Iteration     | 35       |
| MaximumReturn | -0.094   |
| MinimumReturn | -13      |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49433109164237976
Validation loss = 0.4933560788631439
Validation loss = 0.4893229305744171
Validation loss = 0.49508553743362427
Validation loss = 0.4952954053878784
Validation loss = 0.4968320429325104
Validation loss = 0.49792709946632385
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.47776585817337036
Validation loss = 0.4758182764053345
Validation loss = 0.4767710566520691
Validation loss = 0.4821685254573822
Validation loss = 0.47867926955223083
Validation loss = 0.4787551760673523
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49058085680007935
Validation loss = 0.48129206895828247
Validation loss = 0.48847121000289917
Validation loss = 0.4881322979927063
Validation loss = 0.48946812748908997
Validation loss = 0.48986420035362244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4811379313468933
Validation loss = 0.48346713185310364
Validation loss = 0.4797481894493103
Validation loss = 0.4886205792427063
Validation loss = 0.48483508825302124
Validation loss = 0.49018561840057373
Validation loss = 0.49017682671546936
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.48595303297042847
Validation loss = 0.4836597740650177
Validation loss = 0.48862141370773315
Validation loss = 0.49170175194740295
Validation loss = 0.4924248456954956
Validation loss = 0.4931987524032593
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8909287257019438
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.8910463861920173
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8900862068965517
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8891280947255114
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8881720430107527
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8915145005370569
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.8927038626609443
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8917470525187567
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.8950749464668094
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8941176470588236
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.8931623931623932
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 0.9028815368196371
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9040511727078892
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9073482428115016
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9063829787234042
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9054197662061636
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9065817409766455
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9056203605514316
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9046610169491526
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9037037037037037
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9080338266384778
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9070749736008448
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9061181434599156
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9051633298208641
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9094736842105263
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15      |
| Iteration     | 36       |
| MaximumReturn | -0.0933  |
| MinimumReturn | -75.8    |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4929264187812805
Validation loss = 0.49938544631004333
Validation loss = 0.49793827533721924
Validation loss = 0.4976358115673065
Validation loss = 0.5024460554122925
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.484791100025177
Validation loss = 0.4777774214744568
Validation loss = 0.48029807209968567
Validation loss = 0.488158255815506
Validation loss = 0.48236823081970215
Validation loss = 0.49000951647758484
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49466070532798767
Validation loss = 0.496128648519516
Validation loss = 0.49197858572006226
Validation loss = 0.49870559573173523
Validation loss = 0.4974205791950226
Validation loss = 0.4955689013004303
Validation loss = 0.4964189827442169
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4906814992427826
Validation loss = 0.48799359798431396
Validation loss = 0.48701488971710205
Validation loss = 0.48999443650245667
Validation loss = 0.5005083680152893
Validation loss = 0.4930873215198517
Validation loss = 0.4930259883403778
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4899481534957886
Validation loss = 0.49754706025123596
Validation loss = 0.4936189353466034
Validation loss = 0.4957076609134674
Validation loss = 0.49607861042022705
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9148264984227129
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9138655462184874
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.912906610703043
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9119496855345912
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9109947643979057
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.9184100418410042
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9174503657262278
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9217118997912317
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9207507820646507
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9208333333333333
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9198751300728408
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.918918918918919
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 9
average number of affinization = 0.9273104880581516
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.9263485477178424
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9295336787564766
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9316770186335404
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9327817993795243
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9338842975206612
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.934984520123839
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 0.934020618556701
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 0.9382080329557158
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9403292181069959
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.94655704008222
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9476386036960985
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 11
average number of affinization = 0.9579487179487179
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.5    |
| Iteration     | 37       |
| MaximumReturn | -0.0735  |
| MinimumReturn | -77.9    |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4990445375442505
Validation loss = 0.4983278214931488
Validation loss = 0.5016568899154663
Validation loss = 0.5019112229347229
Validation loss = 0.5026456713676453
Validation loss = 0.5044787526130676
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4819757342338562
Validation loss = 0.4798547029495239
Validation loss = 0.48432204127311707
Validation loss = 0.48737525939941406
Validation loss = 0.49016308784484863
Validation loss = 0.4944378137588501
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4929102063179016
Validation loss = 0.4961143136024475
Validation loss = 0.49844908714294434
Validation loss = 0.49791836738586426
Validation loss = 0.5023485422134399
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4933428466320038
Validation loss = 0.49283891916275024
Validation loss = 0.49719834327697754
Validation loss = 0.4931192696094513
Validation loss = 0.4982829689979553
Validation loss = 0.4939730763435364
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5034394264221191
Validation loss = 0.49372783303260803
Validation loss = 0.49326223134994507
Validation loss = 0.5004411935806274
Validation loss = 0.49385350942611694
Validation loss = 0.4995402693748474
Validation loss = 0.4941226840019226
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9600409836065574
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 0.9611054247697032
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9642126789366053
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 0.9662921348314607
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 0.9734693877551021
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 1
average number of affinization = 0.9734964322120285
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9765784114052953
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 0.9816887080366226
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 4
average number of affinization = 0.9847560975609756
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.9908629441624366
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 0.9969574036511156
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0040485829959513
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0030333670374114
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.006060606060606
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.012108980827447
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0131048387096775
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0181268882175227
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0181086519114688
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.0190954773869347
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.0281124497991967
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.033099297893681
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0360721442885772
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.04004004004004
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.04
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -54.3    |
| Iteration     | 38       |
| MaximumReturn | -0.159   |
| MinimumReturn | -112     |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5025500655174255
Validation loss = 0.5069507956504822
Validation loss = 0.5021118521690369
Validation loss = 0.5020135045051575
Validation loss = 0.5093385577201843
Validation loss = 0.5077589750289917
Validation loss = 0.508444607257843
Validation loss = 0.5015897750854492
Validation loss = 0.5139272809028625
Validation loss = 0.5159254670143127
Validation loss = 0.5165724158287048
Validation loss = 0.5125396847724915
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4871377646923065
Validation loss = 0.4879629611968994
Validation loss = 0.4835952818393707
Validation loss = 0.4880806803703308
Validation loss = 0.49120283126831055
Validation loss = 0.4927031695842743
Validation loss = 0.5034705400466919
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49579140543937683
Validation loss = 0.501964807510376
Validation loss = 0.49660542607307434
Validation loss = 0.5006083250045776
Validation loss = 0.5047110319137573
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4940911531448364
Validation loss = 0.4945065975189209
Validation loss = 0.5011265277862549
Validation loss = 0.4991437792778015
Validation loss = 0.5024325251579285
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4966360032558441
Validation loss = 0.49698078632354736
Validation loss = 0.5066031217575073
Validation loss = 0.4987673759460449
Validation loss = 0.5005578994750977
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0389610389610389
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.0449101796407185
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.044865403788634
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.048804780876494
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0537313432835822
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0526838966202783
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.054617676266137
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.060515873015873
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0594648166501486
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.0673267326732674
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0722057368941642
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.0741106719367588
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0779861796643633
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.0779092702169626
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.0827586206896551
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0816929133858268
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.0845624385447394
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0834970530451866
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0824337585868498
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.088235294117647
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0871694417238003
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.0909980430528377
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.0899315738025415
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.0966796875
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.1034146341463416
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.59    |
| Iteration     | 39       |
| MaximumReturn | -0.0753  |
| MinimumReturn | -45.9    |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5105376839637756
Validation loss = 0.5167100429534912
Validation loss = 0.513212263584137
Validation loss = 0.5150250196456909
Validation loss = 0.5148755311965942
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4957144558429718
Validation loss = 0.4882183074951172
Validation loss = 0.4920583665370941
Validation loss = 0.4942951202392578
Validation loss = 0.49415475130081177
Validation loss = 0.5046223998069763
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5040231943130493
Validation loss = 0.49604713916778564
Validation loss = 0.49678826332092285
Validation loss = 0.5008901953697205
Validation loss = 0.5005266666412354
Validation loss = 0.502673327922821
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49598923325538635
Validation loss = 0.5003170967102051
Validation loss = 0.49903735518455505
Validation loss = 0.5027003884315491
Validation loss = 0.5068352818489075
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.50104159116745
Validation loss = 0.4979441463947296
Validation loss = 0.4994843304157257
Validation loss = 0.5018811225891113
Validation loss = 0.5070194602012634
Validation loss = 0.504002571105957
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.1091617933723197
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1080817916260954
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1089494163424125
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1078717201166182
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1106796116504853
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.1173617846750727
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1162790697674418
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1161665053242982
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1150870406189555
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1188405797101448
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1177606177606179
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1166827386692382
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1184971098265897
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1212704523580366
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1201923076923077
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1239193083573487
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1257197696737045
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.1265580057526365
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.1350574712643677
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.137799043062201
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1414913957934991
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.1470869149952245
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.1469465648854962
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.1525262154432794
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1542857142857144
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -34.1    |
| Iteration     | 40       |
| MaximumReturn | -0.0557  |
| MinimumReturn | -105     |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5117228627204895
Validation loss = 0.5126530528068542
Validation loss = 0.5194527506828308
Validation loss = 0.5191364884376526
Validation loss = 0.5167573690414429
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4981645345687866
Validation loss = 0.49664226174354553
Validation loss = 0.4983193576335907
Validation loss = 0.5002437233924866
Validation loss = 0.4995500147342682
Validation loss = 0.4994719326496124
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5074814558029175
Validation loss = 0.5064047574996948
Validation loss = 0.5038039088249207
Validation loss = 0.5052645802497864
Validation loss = 0.5076799392700195
Validation loss = 0.5077386498451233
Validation loss = 0.5139240026473999
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5038370490074158
Validation loss = 0.5026178956031799
Validation loss = 0.5010542869567871
Validation loss = 0.503998875617981
Validation loss = 0.5061091780662537
Validation loss = 0.5111648440361023
Validation loss = 0.5108680725097656
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5115174651145935
Validation loss = 0.503862738609314
Validation loss = 0.5022658109664917
Validation loss = 0.5093759298324585
Validation loss = 0.5136001110076904
Validation loss = 0.5074449181556702
Validation loss = 0.5086373090744019
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1560418648905804
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.155893536121673
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.1623931623931625
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.1669829222011385
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.1715639810426541
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1704545454545454
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.1778618732261117
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1767485822306238
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.182247403210576
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1858490566037736
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1894439208294063
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.1883239171374764
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1900282220131704
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.1926691729323309
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1943661971830986
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.1960600375234522
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.1996251171508903
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 16
average number of affinization = 1.2134831460674158
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.2132834424695977
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.216822429906542
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.219421101774043
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.2210820895522387
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.222739981360671
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.2262569832402235
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.2251162790697674
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.09    |
| Iteration     | 41       |
| MaximumReturn | -0.147   |
| MinimumReturn | -53.6    |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5155407190322876
Validation loss = 0.5135611891746521
Validation loss = 0.5142080187797546
Validation loss = 0.5179979205131531
Validation loss = 0.5193381309509277
Validation loss = 0.5190785527229309
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.498671293258667
Validation loss = 0.49649855494499207
Validation loss = 0.504067599773407
Validation loss = 0.5038809776306152
Validation loss = 0.506263792514801
Validation loss = 0.5012999176979065
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5073587894439697
Validation loss = 0.5052367448806763
Validation loss = 0.5115430951118469
Validation loss = 0.5104589462280273
Validation loss = 0.512234628200531
Validation loss = 0.5166115164756775
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.504377543926239
Validation loss = 0.502555251121521
Validation loss = 0.5075361728668213
Validation loss = 0.506266176700592
Validation loss = 0.5098790526390076
Validation loss = 0.5130096673965454
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5039255023002625
Validation loss = 0.5124724507331848
Validation loss = 0.5057045817375183
Validation loss = 0.5165422558784485
Validation loss = 0.512592613697052
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.233271375464684
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2358402971216342
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2402597402597402
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.2409638554216869
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.2527777777777778
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2553191489361701
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2597042513863217
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.2585410895660203
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.257380073800738
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.2608294930875577
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.2596685082872927
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2640294388224471
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.2683823529411764
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.2754820936639117
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.2743119266055045
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.2777268560953254
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.2765567765567765
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2790484903934127
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.2824497257769654
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.2812785388127854
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.280109489051095
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2825888787602553
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.2877959927140255
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.2948134667879891
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.2954545454545454
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.28    |
| Iteration     | 42       |
| MaximumReturn | -0.0941  |
| MinimumReturn | -17.1    |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5201337933540344
Validation loss = 0.5170333385467529
Validation loss = 0.5168273448944092
Validation loss = 0.5262738466262817
Validation loss = 0.5200149416923523
Validation loss = 0.5244558453559875
Validation loss = 0.5238521099090576
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5057260990142822
Validation loss = 0.5017443299293518
Validation loss = 0.5046659111976624
Validation loss = 0.49992719292640686
Validation loss = 0.5059247612953186
Validation loss = 0.5061200261116028
Validation loss = 0.5088154673576355
Validation loss = 0.5108315944671631
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5158829092979431
Validation loss = 0.5132497549057007
Validation loss = 0.5123997926712036
Validation loss = 0.5134919881820679
Validation loss = 0.5130310654640198
Validation loss = 0.5160212516784668
Validation loss = 0.5200170278549194
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5146222114562988
Validation loss = 0.5161658525466919
Validation loss = 0.5155050158500671
Validation loss = 0.5144734978675842
Validation loss = 0.515906810760498
Validation loss = 0.5166416168212891
Validation loss = 0.515076220035553
Validation loss = 0.5176419615745544
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5083500146865845
Validation loss = 0.5127346515655518
Validation loss = 0.5147862434387207
Validation loss = 0.5134584307670593
Validation loss = 0.5121449828147888
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.2979109900090826
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.2985480943738656
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.300090661831369
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.3043478260869565
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.3031674208144797
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.3092224231464737
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.3134598012646792
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3149819494584838
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.3219116321009918
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.3207207207207208
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.324932493249325
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3264388489208634
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.325247079964061
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.3258527827648114
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.328251121076233
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.3279569892473118
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.3267681289167412
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3282647584973166
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.3333333333333333
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.3366071428571429
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.3398751115075824
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.338680926916221
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.338379341050757
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.349644128113879
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.3502222222222222
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.41    |
| Iteration     | 43       |
| MaximumReturn | -0.168   |
| MinimumReturn | -86.2    |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5248551964759827
Validation loss = 0.522473931312561
Validation loss = 0.5199140310287476
Validation loss = 0.525511622428894
Validation loss = 0.5308464765548706
Validation loss = 0.5333437919616699
Validation loss = 0.5296941995620728
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5068424344062805
Validation loss = 0.5074895024299622
Validation loss = 0.5048990249633789
Validation loss = 0.5080326199531555
Validation loss = 0.5139784812927246
Validation loss = 0.5122219920158386
Validation loss = 0.5152053833007812
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.524107813835144
Validation loss = 0.5190223455429077
Validation loss = 0.5195438861846924
Validation loss = 0.5175835490226746
Validation loss = 0.518977165222168
Validation loss = 0.5220480561256409
Validation loss = 0.5236234664916992
Validation loss = 0.5338315963745117
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.51752769947052
Validation loss = 0.517650842666626
Validation loss = 0.5172417759895325
Validation loss = 0.5214563608169556
Validation loss = 0.5206350088119507
Validation loss = 0.5206957459449768
Validation loss = 0.523758590221405
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5133436322212219
Validation loss = 0.5132750868797302
Validation loss = 0.5153546333312988
Validation loss = 0.5145735144615173
Validation loss = 0.5160280466079712
Validation loss = 0.5193678140640259
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.3499111900532859
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.3531499556344277
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.3554964539007093
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.3658104517271923
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.3699115044247787
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3713527851458887
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.377208480565371
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.3786407766990292
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.3783068783068784
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.381497797356828
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.3855633802816902
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 18
average number of affinization = 1.4001759014951627
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.4015817223198594
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.4029850746268657
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.4078947368421053
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.4075372480280455
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 18
average number of affinization = 1.4220665499124343
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4243219597550307
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.4291958041958042
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.4366812227074235
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.4354275741710296
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.4341761115954665
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4364111498257839
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.4473455178416015
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.4565217391304348
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.3    |
| Iteration     | 44       |
| MaximumReturn | -0.103   |
| MinimumReturn | -107     |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5228357911109924
Validation loss = 0.5243842005729675
Validation loss = 0.5244479775428772
Validation loss = 0.5332325100898743
Validation loss = 0.5319867134094238
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.515479326248169
Validation loss = 0.5079164505004883
Validation loss = 0.5105626583099365
Validation loss = 0.5188804268836975
Validation loss = 0.5179483890533447
Validation loss = 0.5203239917755127
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5216678977012634
Validation loss = 0.5225837230682373
Validation loss = 0.5225285291671753
Validation loss = 0.5269637107849121
Validation loss = 0.5280347466468811
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5224078297615051
Validation loss = 0.5219084620475769
Validation loss = 0.516367495059967
Validation loss = 0.5200489163398743
Validation loss = 0.5222169160842896
Validation loss = 0.5267691612243652
Validation loss = 0.5283615589141846
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5227761268615723
Validation loss = 0.5152029991149902
Validation loss = 0.5123971104621887
Validation loss = 0.5176262855529785
Validation loss = 0.5218826532363892
Validation loss = 0.5211025476455688
Validation loss = 0.5228238701820374
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.4552562988705473
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.4539930555555556
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.4588031222896791
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.466204506065858
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.4701298701298702
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.4731833910034602
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.4727744165946413
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.4810017271157168
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.4866264020707507
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.4887931034482758
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.4883720930232558
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.4879518072289157
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.4883920894239038
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.4922680412371134
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.5004291845493563
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.4991423670668953
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.5055698371893744
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5102739726027397
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.5192472198460223
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.5196581196581196
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.5183603757472246
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5213310580204777
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.5234441602728048
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.5315161839863713
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.531063829787234
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.1    |
| Iteration     | 45       |
| MaximumReturn | -0.14    |
| MinimumReturn | -106     |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5315026044845581
Validation loss = 0.5273574590682983
Validation loss = 0.5324383974075317
Validation loss = 0.5319423079490662
Validation loss = 0.5324856042861938
Validation loss = 0.5311686396598816
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.520260751247406
Validation loss = 0.5202926397323608
Validation loss = 0.5218839049339294
Validation loss = 0.5152608156204224
Validation loss = 0.5209543704986572
Validation loss = 0.5253128409385681
Validation loss = 0.522628903388977
Validation loss = 0.5233747959136963
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5239802002906799
Validation loss = 0.5260587930679321
Validation loss = 0.5269048810005188
Validation loss = 0.5286723971366882
Validation loss = 0.5352577567100525
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5253430604934692
Validation loss = 0.5283668041229248
Validation loss = 0.5309063792228699
Validation loss = 0.5282620191574097
Validation loss = 0.5293219089508057
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5198506712913513
Validation loss = 0.5188084840774536
Validation loss = 0.5249564051628113
Validation loss = 0.5257357954978943
Validation loss = 0.5264083743095398
Validation loss = 0.521587610244751
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.538265306122449
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.5369583687340698
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.5407470288624787
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.5470737913486006
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.547457627118644
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5520745131244709
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.5609137055837563
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.5595942519019441
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5641891891891893
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.5696202531645569
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.5767284991568298
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.5770850884582983
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.5774410774410774
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.5820016820857863
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.5890756302521007
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5919395465994963
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5947986577181208
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.5976529756915339
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.600502512562814
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.610878661087866
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.6145484949832776
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.617376775271512
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.6176961602671118
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.6221851542952461
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.6225
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.91    |
| Iteration     | 46       |
| MaximumReturn | -0.0811  |
| MinimumReturn | -46.3    |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5296262502670288
Validation loss = 0.5320274233818054
Validation loss = 0.5345152616500854
Validation loss = 0.5330870151519775
Validation loss = 0.5380513072013855
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5295556783676147
Validation loss = 0.5198045969009399
Validation loss = 0.5205333828926086
Validation loss = 0.5255603194236755
Validation loss = 0.5256155729293823
Validation loss = 0.5269268751144409
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.527667224407196
Validation loss = 0.5279788970947266
Validation loss = 0.5322761535644531
Validation loss = 0.5299139618873596
Validation loss = 0.5300238132476807
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5255163908004761
Validation loss = 0.5264283418655396
Validation loss = 0.5289021730422974
Validation loss = 0.5242081880569458
Validation loss = 0.5276150703430176
Validation loss = 0.526046633720398
Validation loss = 0.5300623178482056
Validation loss = 0.5324549674987793
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5212109684944153
Validation loss = 0.5207383036613464
Validation loss = 0.5241419672966003
Validation loss = 0.5260235667228699
Validation loss = 0.5292021036148071
Validation loss = 0.5334243774414062
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.6261448792672772
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.6314475873544094
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.633416458852868
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.6345514950166113
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.6390041493775933
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.6475953565505805
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.6512013256006628
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.6572847682119205
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.6567411083540116
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.6553719008264463
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.657308009909166
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.6608910891089108
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.6702390766694146
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.672158154859967
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.6814814814814816
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.6875
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.6935086277732128
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.6929392446633826
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.697292863002461
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.701639344262295
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.7092547092547092
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.7152209492635024
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.7154538021259198
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7189542483660132
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.7240816326530612
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -42.3    |
| Iteration     | 47       |
| MaximumReturn | -0.139   |
| MinimumReturn | -155     |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5325767397880554
Validation loss = 0.5331735610961914
Validation loss = 0.5270800590515137
Validation loss = 0.5336891412734985
Validation loss = 0.5353091955184937
Validation loss = 0.5361703634262085
Validation loss = 0.5393139123916626
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5234236121177673
Validation loss = 0.5189278721809387
Validation loss = 0.5189977884292603
Validation loss = 0.520114541053772
Validation loss = 0.5249948501586914
Validation loss = 0.5277690887451172
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.523716151714325
Validation loss = 0.5254024267196655
Validation loss = 0.5267744064331055
Validation loss = 0.5341181755065918
Validation loss = 0.5310031771659851
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5305160284042358
Validation loss = 0.5276883840560913
Validation loss = 0.5270408987998962
Validation loss = 0.5384333729743958
Validation loss = 0.5273882150650024
Validation loss = 0.5291513204574585
Validation loss = 0.5349334478378296
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5273113250732422
Validation loss = 0.5224432945251465
Validation loss = 0.5256255865097046
Validation loss = 0.5286802649497986
Validation loss = 0.5298504829406738
Validation loss = 0.5267871618270874
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.7251223491027732
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.7318663406682966
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.732084690553746
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.7363710333604556
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.7447154471544715
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7465475223395612
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.75
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.7502027575020276
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.752836304700162
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.7530364372469636
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.7548543689320388
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.761519805982215
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.765751211631664
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.7707828894269573
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7741935483870968
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.7792103142626914
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.788244766505636
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.7884151246983104
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.7918006430868167
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.8
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.8073836276083468
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8107457898957497
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.8100961538461537
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.812650120096077
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.8168
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -86.5    |
| Iteration     | 48       |
| MaximumReturn | -0.653   |
| MinimumReturn | -141     |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5312210917472839
Validation loss = 0.5334264039993286
Validation loss = 0.5376503467559814
Validation loss = 0.537605345249176
Validation loss = 0.5366917252540588
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5204617381095886
Validation loss = 0.52280193567276
Validation loss = 0.5227859020233154
Validation loss = 0.5323219299316406
Validation loss = 0.524360716342926
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5252137780189514
Validation loss = 0.5261088013648987
Validation loss = 0.5284547805786133
Validation loss = 0.5298689603805542
Validation loss = 0.5288403630256653
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5303170084953308
Validation loss = 0.534317135810852
Validation loss = 0.5289428234100342
Validation loss = 0.5295740365982056
Validation loss = 0.5357198715209961
Validation loss = 0.5343580842018127
Validation loss = 0.5333632230758667
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5258103609085083
Validation loss = 0.5244787335395813
Validation loss = 0.5297836661338806
Validation loss = 0.5276892781257629
Validation loss = 0.525540292263031
Validation loss = 0.533987820148468
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.818545163868905
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.8186900958466454
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.8204309656823623
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.8221690590111643
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.8270916334661356
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.8272292993630572
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.8369132856006365
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 5
average number of affinization = 1.8394276629570747
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.8379666401906274
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 2
average number of affinization = 1.838095238095238
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 1.8413957176843774
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.8454833597464342
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.8463974663499605
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 16
average number of affinization = 1.8575949367088607
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.866403162055336
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.867298578199052
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.8658247829518548
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.8706624605678233
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.8770685579196218
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.8811023622047245
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 17
average number of affinization = 1.892997639653816
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.9001572327044025
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 11
average number of affinization = 1.9073055773762766
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.91287284144427
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.9223529411764706
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -105     |
| Iteration     | 49       |
| MaximumReturn | -45.9    |
| MinimumReturn | -161     |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5321895480155945
Validation loss = 0.5336728692054749
Validation loss = 0.5365877151489258
Validation loss = 0.5350252985954285
Validation loss = 0.5325457453727722
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5229096412658691
Validation loss = 0.5212977528572083
Validation loss = 0.5225870609283447
Validation loss = 0.523539125919342
Validation loss = 0.5240346193313599
Validation loss = 0.5250222682952881
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5291342735290527
Validation loss = 0.5295636653900146
Validation loss = 0.5347275733947754
Validation loss = 0.5311616063117981
Validation loss = 0.5389806628227234
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5326570272445679
Validation loss = 0.5295577645301819
Validation loss = 0.5332445502281189
Validation loss = 0.5354189276695251
Validation loss = 0.5327895283699036
Validation loss = 0.5373007655143738
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5257315039634705
Validation loss = 0.5266902446746826
Validation loss = 0.5287160277366638
Validation loss = 0.5288718342781067
Validation loss = 0.528698742389679
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.926332288401254
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 14
average number of affinization = 1.9357870007830853
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 16
average number of affinization = 1.94679186228482
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.946051602814699
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 1.95078125
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.9523809523809523
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.9563182527301093
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 0
average number of affinization = 1.9547934528448947
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 1.9587227414330217
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 1.959533073929961
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 9
average number of affinization = 1.9650077760497666
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 1.9712509712509712
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 13
average number of affinization = 1.9798136645962734
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 15
average number of affinization = 1.9899146625290922
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 1
average number of affinization = 1.9891472868217055
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 4
average number of affinization = 1.9907048799380325
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 1.998452012383901
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.0015467904098996
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.001545595054096
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.006177606177606
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0084876543209877
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.0115651503469545
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.0177195685670264
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.016166281755196
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.017692307692308
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49      |
| Iteration     | 50       |
| MaximumReturn | -0.302   |
| MinimumReturn | -158     |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5354622006416321
Validation loss = 0.525609016418457
Validation loss = 0.5301733016967773
Validation loss = 0.5396615266799927
Validation loss = 0.5353851318359375
Validation loss = 0.5386019945144653
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5220237374305725
Validation loss = 0.5197410583496094
Validation loss = 0.5219722390174866
Validation loss = 0.5276234745979309
Validation loss = 0.5281166434288025
Validation loss = 0.5249183773994446
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.533255934715271
Validation loss = 0.5284145474433899
Validation loss = 0.5258852243423462
Validation loss = 0.5331103801727295
Validation loss = 0.5294216871261597
Validation loss = 0.5370903611183167
Validation loss = 0.5418521761894226
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5277104377746582
Validation loss = 0.5272870063781738
Validation loss = 0.5326589345932007
Validation loss = 0.5301867127418518
Validation loss = 0.5324751734733582
Validation loss = 0.5327743291854858
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5311945676803589
Validation loss = 0.5271796584129333
Validation loss = 0.5277637839317322
Validation loss = 0.5288360714912415
Validation loss = 0.5321420431137085
Validation loss = 0.5343592166900635
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.0284396617986165
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.035330261136713
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.0360706062931695
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.041411042944785
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.049808429118774
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.048238897396631
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.0489671002295333
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.0489296636085625
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.061879297173415
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.0603053435114504
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.0617848970251718
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.0678353658536586
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.0662604722010665
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.0684931506849313
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.0692015209125474
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.0790273556231003
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.0865603644646926
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.094081942336874
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.1015921152388173
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.1053030303030305
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.112793338380015
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.12178517397882
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.127739984882842
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.134441087613293
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.141132075471698
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -29.5    |
| Iteration     | 51       |
| MaximumReturn | -0.267   |
| MinimumReturn | -97.7    |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5323714017868042
Validation loss = 0.5338699817657471
Validation loss = 0.5317559838294983
Validation loss = 0.5328521728515625
Validation loss = 0.5350098013877869
Validation loss = 0.5415796637535095
Validation loss = 0.5422661304473877
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.525414228439331
Validation loss = 0.5199083685874939
Validation loss = 0.5255191326141357
Validation loss = 0.5249159932136536
Validation loss = 0.5286323428153992
Validation loss = 0.5245548486709595
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5325576066970825
Validation loss = 0.5351269245147705
Validation loss = 0.5324711203575134
Validation loss = 0.5360270142555237
Validation loss = 0.5355082750320435
Validation loss = 0.5427092909812927
Validation loss = 0.535632312297821
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5308809280395508
Validation loss = 0.5312194228172302
Validation loss = 0.536092221736908
Validation loss = 0.5360526442527771
Validation loss = 0.5305820107460022
Validation loss = 0.5326566696166992
Validation loss = 0.5416648387908936
Validation loss = 0.5405826568603516
Validation loss = 0.5382772088050842
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5294269323348999
Validation loss = 0.5288624167442322
Validation loss = 0.5313681960105896
Validation loss = 0.5287808775901794
Validation loss = 0.5301498174667358
Validation loss = 0.5357321500778198
Validation loss = 0.532379686832428
Validation loss = 0.5361955165863037
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.1463046757164403
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.152976639035418
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.1566265060240966
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.1625282167042887
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 2
average number of affinization = 2.162406015037594
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.1682945154019535
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.174174174174174
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.1815453863465866
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.18215892053973
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.182771535580524
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.190868263473054
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.195213163799551
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.195814648729447
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.200149365197909
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.2044776119402987
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.208053691275168
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.2108792846497765
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.2189128816083397
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.2261904761904763
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.2267657992565058
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.2295690936106984
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.2375649591685227
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 21
average number of affinization = 2.2514836795252227
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.2587101556708675
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.262962962962963
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41.4    |
| Iteration     | 52       |
| MaximumReturn | -0.258   |
| MinimumReturn | -102     |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5360739231109619
Validation loss = 0.5301188230514526
Validation loss = 0.5449312925338745
Validation loss = 0.5364946722984314
Validation loss = 0.5340806245803833
Validation loss = 0.5371843576431274
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.528306782245636
Validation loss = 0.5199925303459167
Validation loss = 0.5271245837211609
Validation loss = 0.5219847559928894
Validation loss = 0.5285018086433411
Validation loss = 0.5255918502807617
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5326935648918152
Validation loss = 0.5312134027481079
Validation loss = 0.5375374555587769
Validation loss = 0.5356524586677551
Validation loss = 0.5419833064079285
Validation loss = 0.5363078713417053
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5288918018341064
Validation loss = 0.5324704647064209
Validation loss = 0.5314179062843323
Validation loss = 0.5337814092636108
Validation loss = 0.5358738899230957
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5293470621109009
Validation loss = 0.5260098576545715
Validation loss = 0.5323042869567871
Validation loss = 0.5323461890220642
Validation loss = 0.5314797759056091
Validation loss = 0.5319278836250305
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.2679496669133976
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.274408284023669
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 18
average number of affinization = 2.286031042128603
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.295420974889217
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.299630996309963
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.3045722713864305
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 1
average number of affinization = 2.303610906411201
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.3070692194403533
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.3112582781456954
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.3198529411764706
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.32035268185158
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.327459618208517
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.3294203961848865
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.334310850439883
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.3355311355311357
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.3389458272327963
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.346013167520117
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.3508771929824563
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.3528122717311906
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.3547445255474453
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.358862144420131
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.361516034985423
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.365622723962127
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.3719068413391557
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.3767272727272726
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -72.9    |
| Iteration     | 53       |
| MaximumReturn | -3.26    |
| MinimumReturn | -137     |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5293660759925842
Validation loss = 0.5289093852043152
Validation loss = 0.5395558476448059
Validation loss = 0.5309092402458191
Validation loss = 0.5291529297828674
Validation loss = 0.533135712146759
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5165807604789734
Validation loss = 0.5194509029388428
Validation loss = 0.5201016068458557
Validation loss = 0.5247410535812378
Validation loss = 0.5204522609710693
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5275071263313293
Validation loss = 0.5287625789642334
Validation loss = 0.5281276106834412
Validation loss = 0.531134843826294
Validation loss = 0.5320780277252197
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5248233079910278
Validation loss = 0.5244534611701965
Validation loss = 0.5258944034576416
Validation loss = 0.5287078619003296
Validation loss = 0.528199315071106
Validation loss = 0.5306317210197449
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5236858129501343
Validation loss = 0.5272464156150818
Validation loss = 0.5325062870979309
Validation loss = 0.527370035648346
Validation loss = 0.5270975828170776
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.3829941860465116
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.3892519970951343
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.393323657474601
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.399564902102973
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.4079710144927535
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.4163649529326574
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.4218523878437046
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.42588575560376
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.4328034682080926
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.44043321299639
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.442279942279942
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.4434030281182406
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.4502881844380404
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.455003599712023
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.4575539568345324
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 3
average number of affinization = 2.457943925233645
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.461206896551724
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.4666188083273513
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.472022955523673
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.4738351254480286
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.477077363896848
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.4831782390837507
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.487839771101574
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.496783416726233
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.500714285714286
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -105     |
| Iteration     | 54       |
| MaximumReturn | -19.9    |
| MinimumReturn | -151     |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5214512348175049
Validation loss = 0.517120361328125
Validation loss = 0.5215353965759277
Validation loss = 0.5231786370277405
Validation loss = 0.5233432054519653
Validation loss = 0.5313345193862915
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5082845091819763
Validation loss = 0.5130971670150757
Validation loss = 0.5129936337471008
Validation loss = 0.5104930996894836
Validation loss = 0.512490451335907
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5163282155990601
Validation loss = 0.5193144083023071
Validation loss = 0.5191336870193481
Validation loss = 0.5202468633651733
Validation loss = 0.5231708884239197
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5107147097587585
Validation loss = 0.52093905210495
Validation loss = 0.5165620446205139
Validation loss = 0.5214187502861023
Validation loss = 0.5227938294410706
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5173520445823669
Validation loss = 0.5162986516952515
Validation loss = 0.5255728363990784
Validation loss = 0.5214203596115112
Validation loss = 0.5224804878234863
Validation loss = 0.5191184878349304
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.5032119914346893
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.5042796005706136
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.5060584461867426
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.5085470085470085
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.518149466192171
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.5213371266002844
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.5245202558635396
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.5276988636363638
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.5330021291696236
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.5382978723404257
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.540751240255138
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.5432011331444757
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.5463552724699223
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.5523338048090523
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.5561837455830387
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.562853107344633
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.569513055751588
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.57475317348378
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.5785764622973923
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.5873239436619717
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.593244194229416
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.5956399437412094
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.600843288826423
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.6088483146067416
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.6182456140350876
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -79.8    |
| Iteration     | 55       |
| MaximumReturn | -4.39    |
| MinimumReturn | -135     |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5179591774940491
Validation loss = 0.5189734697341919
Validation loss = 0.5156404376029968
Validation loss = 0.5193224549293518
Validation loss = 0.5213398337364197
Validation loss = 0.5180864334106445
Validation loss = 0.5259091854095459
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5047236680984497
Validation loss = 0.5058013200759888
Validation loss = 0.5051299333572388
Validation loss = 0.5081446170806885
Validation loss = 0.5091862082481384
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5129671096801758
Validation loss = 0.5145470499992371
Validation loss = 0.5178191661834717
Validation loss = 0.5194659233093262
Validation loss = 0.5173521041870117
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5130316615104675
Validation loss = 0.5138902068138123
Validation loss = 0.5129815936088562
Validation loss = 0.518123209476471
Validation loss = 0.520817756652832
Validation loss = 0.5160167217254639
Validation loss = 0.5204477310180664
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5129520893096924
Validation loss = 0.5138970017433167
Validation loss = 0.5164436101913452
Validation loss = 0.5205713510513306
Validation loss = 0.5140992403030396
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.626227208976157
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.6299929922915206
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.6358543417366946
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.639608117564731
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.6475524475524477
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.6512928022361986
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 20
average number of affinization = 2.6634078212290504
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.6706210746685275
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 2.674337517433752
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.6794425087108014
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.6866295264623954
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.687543493389005
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.6933240611961056
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.7046560111188325
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.7069444444444444
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.712005551700208
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.717753120665742
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.722106722106722
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.7264542936288088
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.7294117647058824
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.731673582295989
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.7339322736696614
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.7451657458563536
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.749482401656315
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.7524137931034485
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -69.5    |
| Iteration     | 56       |
| MaximumReturn | -0.432   |
| MinimumReturn | -127     |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5156849026679993
Validation loss = 0.512862503528595
Validation loss = 0.5130615234375
Validation loss = 0.5199940800666809
Validation loss = 0.5218989849090576
Validation loss = 0.5190919041633606
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5022076964378357
Validation loss = 0.5017479062080383
Validation loss = 0.503969132900238
Validation loss = 0.5040079951286316
Validation loss = 0.5082871913909912
Validation loss = 0.5029997229576111
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5163108706474304
Validation loss = 0.5109989643096924
Validation loss = 0.5147371292114258
Validation loss = 0.5155455470085144
Validation loss = 0.5156257152557373
Validation loss = 0.5149992108345032
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5088329315185547
Validation loss = 0.5125954747200012
Validation loss = 0.5137409567832947
Validation loss = 0.512950599193573
Validation loss = 0.5142920613288879
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5100509524345398
Validation loss = 0.5115962028503418
Validation loss = 0.5112506747245789
Validation loss = 0.5135130286216736
Validation loss = 0.5127049088478088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.7532736044107513
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.7582644628099175
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 2.7598072952512043
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.764786795048143
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.7759450171821305
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.778159340659341
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 2.7810569663692517
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 2.779835390946502
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.7888965044551064
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.791095890410959
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.7967145790554415
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.8036935704514363
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.81203007518797
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.8189890710382515
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 14
average number of affinization = 2.826621160409556
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.8362892223738063
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 15
average number of affinization = 2.8445807770961147
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.8501362397820165
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.8611300204220558
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.8700680272108845
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 9
average number of affinization = 2.874235214140041
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.879076086956522
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.8886625933469112
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.8955223880597014
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.906440677966102
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -72.8    |
| Iteration     | 57       |
| MaximumReturn | -0.363   |
| MinimumReturn | -136     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5211782455444336
Validation loss = 0.5165205001831055
Validation loss = 0.5209304690361023
Validation loss = 0.5175184011459351
Validation loss = 0.5202028751373291
Validation loss = 0.5200312733650208
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5056179165840149
Validation loss = 0.5089099407196045
Validation loss = 0.5034940242767334
Validation loss = 0.5043053030967712
Validation loss = 0.5079556107521057
Validation loss = 0.5092729926109314
Validation loss = 0.5067496299743652
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5115072131156921
Validation loss = 0.5123820900917053
Validation loss = 0.5168874859809875
Validation loss = 0.516782820224762
Validation loss = 0.5152649283409119
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5123007297515869
Validation loss = 0.5117437243461609
Validation loss = 0.5120144486427307
Validation loss = 0.5146575570106506
Validation loss = 0.5170797109603882
Validation loss = 0.5204062461853027
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5101761817932129
Validation loss = 0.5116993188858032
Validation loss = 0.5137952566146851
Validation loss = 0.516884446144104
Validation loss = 0.5152435898780823
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.913279132791328
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 16
average number of affinization = 2.922139471902505
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.9242219215155614
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.924949290060852
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.927027027027027
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 12
average number of affinization = 2.9331532748143148
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 22
average number of affinization = 2.9460188933873144
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 13
average number of affinization = 2.9527983816587997
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 18
average number of affinization = 2.9629380053908356
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 19
average number of affinization = 2.973737373737374
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 2.9791386271870794
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 17
average number of affinization = 2.988567585743107
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 10
average number of affinization = 2.993279569892473
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 2.995298858294157
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.0
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 0
average number of affinization = 2.9979879275653922
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 4
average number of affinization = 2.9986595174262733
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.0080375083724045
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 3
average number of affinization = 3.0080321285140563
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.0120401337792644
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.016711229946524
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.0200400801603204
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.027369826435247
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.036024016010674
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.0433333333333334
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -29.7    |
| Iteration     | 58       |
| MaximumReturn | -0.395   |
| MinimumReturn | -122     |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5221128463745117
Validation loss = 0.5186784267425537
Validation loss = 0.5236920714378357
Validation loss = 0.5247502326965332
Validation loss = 0.5257341861724854
Validation loss = 0.52581387758255
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5080846548080444
Validation loss = 0.5046284198760986
Validation loss = 0.5108473300933838
Validation loss = 0.5078794360160828
Validation loss = 0.5095999240875244
Validation loss = 0.5134100914001465
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5207263827323914
Validation loss = 0.5197826623916626
Validation loss = 0.5181649327278137
Validation loss = 0.5155526995658875
Validation loss = 0.5190728902816772
Validation loss = 0.5193127989768982
Validation loss = 0.525886058807373
Validation loss = 0.5195685029029846
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5159212350845337
Validation loss = 0.5114132761955261
Validation loss = 0.5182376503944397
Validation loss = 0.5170590281486511
Validation loss = 0.5166739225387573
Validation loss = 0.5176769495010376
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5123204588890076
Validation loss = 0.5134851336479187
Validation loss = 0.517221212387085
Validation loss = 0.512152373790741
Validation loss = 0.5221751928329468
Validation loss = 0.5170994400978088
Validation loss = 0.5182512402534485
Validation loss = 0.5206488966941833
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 30
average number of affinization = 3.061292471685543
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.0639147802929427
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.0671989354624087
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.077792553191489
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.086378737541528
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.093625498007968
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.0962176509621764
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.0981432360742707
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.10337972166998
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.1066225165562913
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.1078755790866976
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.117063492063492
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 2
average number of affinization = 3.1163251817580964
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 24
average number of affinization = 3.130118890356671
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.1353135313531353
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.138522427440633
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.14502307185234
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.1515151515151514
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.1606319947333774
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.1651315789473684
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.168310322156476
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.178055190538765
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.1897570584372947
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.1942257217847767
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.2
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.6    |
| Iteration     | 59       |
| MaximumReturn | -0.359   |
| MinimumReturn | -112     |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5211127400398254
Validation loss = 0.5252362489700317
Validation loss = 0.525648295879364
Validation loss = 0.5295106768608093
Validation loss = 0.5249316692352295
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5090721845626831
Validation loss = 0.5115458369255066
Validation loss = 0.514487087726593
Validation loss = 0.5098882913589478
Validation loss = 0.5122324228286743
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5198273062705994
Validation loss = 0.5207639336585999
Validation loss = 0.5267004370689392
Validation loss = 0.521502673625946
Validation loss = 0.5222817659378052
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5151893496513367
Validation loss = 0.5174567103385925
Validation loss = 0.516589879989624
Validation loss = 0.5220368504524231
Validation loss = 0.5239093899726868
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5194008946418762
Validation loss = 0.5225605964660645
Validation loss = 0.5179502964019775
Validation loss = 0.5227126479148865
Validation loss = 0.5203927755355835
Validation loss = 0.5214142203330994
Validation loss = 0.5232841372489929
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.2096985583224114
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.2141453831041256
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.220549738219895
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.222367560497057
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.2300653594771243
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.232527759634226
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.2375979112271542
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.2400521852576647
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.246414602346806
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.2547231270358306
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.265625
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.2713077423552375
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.2756827048114436
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.2800519818063676
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.2824675324675323
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.2842310188189487
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.28988326848249
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.2987686325340246
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.2998704663212437
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.3016181229773465
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.3124191461837
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.316742081447964
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.3210594315245476
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.326016785022595
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 3.326451612903226
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.4    |
| Iteration     | 60       |
| MaximumReturn | -1.18    |
| MinimumReturn | -113     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5217077732086182
Validation loss = 0.5225301384925842
Validation loss = 0.5236201882362366
Validation loss = 0.5225493311882019
Validation loss = 0.525846004486084
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5094790458679199
Validation loss = 0.5117113590240479
Validation loss = 0.5103757381439209
Validation loss = 0.5117726922035217
Validation loss = 0.5121157169342041
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5215051174163818
Validation loss = 0.5179611444473267
Validation loss = 0.5216492414474487
Validation loss = 0.5243661403656006
Validation loss = 0.5236420631408691
Validation loss = 0.5246621966362
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5161871910095215
Validation loss = 0.516248881816864
Validation loss = 0.5184551477432251
Validation loss = 0.5198721885681152
Validation loss = 0.5151486992835999
Validation loss = 0.5198370814323425
Validation loss = 0.5195797085762024
Validation loss = 0.5213777422904968
Validation loss = 0.5279407501220703
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.519457995891571
Validation loss = 0.520067036151886
Validation loss = 0.5187436938285828
Validation loss = 0.5263873338699341
Validation loss = 0.5214728713035583
Validation loss = 0.5242295265197754
Validation loss = 0.5227859020233154
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.329464861379755
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.3318298969072164
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.336767546683838
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.341055341055341
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.35048231511254
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.354755784061697
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.357096981374438
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.3626444159178432
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.363694676074407
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.367948717948718
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.3689942344650863
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.378361075544174
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.3832373640435063
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.3887468030690537
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.394888178913738
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.397828863346105
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.4039566049776644
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 4
average number of affinization = 3.4043367346938775
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.4110898661567877
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.4152866242038216
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.426479949077021
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.4287531806615776
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.431023521932613
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.4326556543837357
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.4380952380952383
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -45.2    |
| Iteration     | 61       |
| MaximumReturn | -2.2     |
| MinimumReturn | -139     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5205886960029602
Validation loss = 0.521842896938324
Validation loss = 0.5230084657669067
Validation loss = 0.521921694278717
Validation loss = 0.5263450741767883
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5110650062561035
Validation loss = 0.5090307593345642
Validation loss = 0.5123863816261292
Validation loss = 0.5143876075744629
Validation loss = 0.5183544754981995
Validation loss = 0.5133481025695801
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5196743607521057
Validation loss = 0.5198814868927002
Validation loss = 0.5178365707397461
Validation loss = 0.5231804251670837
Validation loss = 0.521601140499115
Validation loss = 0.5284436941146851
Validation loss = 0.5255997180938721
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.519525408744812
Validation loss = 0.52890944480896
Validation loss = 0.5213889479637146
Validation loss = 0.5182042717933655
Validation loss = 0.52047199010849
Validation loss = 0.5242995023727417
Validation loss = 0.5171036720275879
Validation loss = 0.5222204327583313
Validation loss = 0.5184805989265442
Validation loss = 0.5258181095123291
Validation loss = 0.5241186618804932
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5193482041358948
Validation loss = 0.5188448429107666
Validation loss = 0.5202633142471313
Validation loss = 0.5222854018211365
Validation loss = 0.5266945362091064
Validation loss = 0.5277099013328552
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 4
average number of affinization = 3.438451776649746
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.440710209258085
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.4455006337135616
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.4490183660544647
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.4544303797468356
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 4
average number of affinization = 3.454775458570525
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.4576485461441213
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.461781427668983
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.465909090909091
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.4694006309148264
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.4741488020176545
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.4782608695652173
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.4836272040302267
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.487098804279421
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.4918238993710693
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.4952859836580767
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.5006281407035176
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.506591337099812
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.510037641154329
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.5128526645768026
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.5150375939849625
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.517845961177207
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.523153942428035
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.5259537210756724
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 4
average number of affinization = 3.52625
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -48.5    |
| Iteration     | 62       |
| MaximumReturn | -0.632   |
| MinimumReturn | -122     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5172110199928284
Validation loss = 0.5168462991714478
Validation loss = 0.5219903588294983
Validation loss = 0.527744472026825
Validation loss = 0.5248479843139648
Validation loss = 0.5242031216621399
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5119031071662903
Validation loss = 0.5077162384986877
Validation loss = 0.5150572657585144
Validation loss = 0.5103541612625122
Validation loss = 0.5109930634498596
Validation loss = 0.511798620223999
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5213357210159302
Validation loss = 0.5205146670341492
Validation loss = 0.526451051235199
Validation loss = 0.5229842066764832
Validation loss = 0.5214577317237854
Validation loss = 0.5242562294006348
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5168958902359009
Validation loss = 0.5208916068077087
Validation loss = 0.519193172454834
Validation loss = 0.5211066007614136
Validation loss = 0.5236325263977051
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5202933549880981
Validation loss = 0.5193495750427246
Validation loss = 0.5259959697723389
Validation loss = 0.5232542753219604
Validation loss = 0.5258268713951111
Validation loss = 0.5204037427902222
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.5277951280449718
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 3
average number of affinization = 3.527465667915106
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 21
average number of affinization = 3.538365564566438
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.5423940149625937
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.5482866043613708
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.5547945205479454
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.5569383945239577
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.5615671641791047
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.564325668116843
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.5683229813664594
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.5772811918063314
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.582506203473945
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.5852448853068815
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.590458488228005
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.5944272445820435
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.599009900990099
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.608534322820037
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.6137206427688504
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.6158122297714637
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.621604938271605
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.6280074028377545
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.629469790382244
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.636475662353666
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.644088669950739
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.648
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -71      |
| Iteration     | 63       |
| MaximumReturn | -0.243   |
| MinimumReturn | -158     |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5192005038261414
Validation loss = 0.5209707617759705
Validation loss = 0.5215640664100647
Validation loss = 0.5211009979248047
Validation loss = 0.5233108997344971
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.509867787361145
Validation loss = 0.5135031342506409
Validation loss = 0.5142818689346313
Validation loss = 0.512854814529419
Validation loss = 0.5152125358581543
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5188093781471252
Validation loss = 0.5235632658004761
Validation loss = 0.5200676918029785
Validation loss = 0.5234320759773254
Validation loss = 0.5255003571510315
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5156567096710205
Validation loss = 0.517795205116272
Validation loss = 0.5170825719833374
Validation loss = 0.5179354548454285
Validation loss = 0.5188557505607605
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5227838158607483
Validation loss = 0.5193266868591309
Validation loss = 0.5257880091667175
Validation loss = 0.5182954668998718
Validation loss = 0.5220263004302979
Validation loss = 0.5199820399284363
Validation loss = 0.5253540277481079
Validation loss = 0.525159478187561
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 3.6494464944649447
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.650276582667486
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.6554054054054053
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.657458563535912
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.658282208588957
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.665236051502146
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.672794117647059
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.676056338028169
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.6780905752753976
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.686238532110092
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.6907090464547676
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.698839340256567
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.7051282051282053
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.7107992678462476
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.7152439024390245
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.720901889092017
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.725334957369062
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.7328058429701767
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 4
average number of affinization = 3.732968369829684
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.739209726443769
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.7430133657351154
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.7510625379477838
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.7597087378640777
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.7640994542146755
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.769090909090909
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -45.6    |
| Iteration     | 64       |
| MaximumReturn | -0.477   |
| MinimumReturn | -184     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5153709650039673
Validation loss = 0.5130480527877808
Validation loss = 0.515887439250946
Validation loss = 0.5176231265068054
Validation loss = 0.5219350457191467
Validation loss = 0.519430935382843
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.507458508014679
Validation loss = 0.5063807368278503
Validation loss = 0.5093836188316345
Validation loss = 0.5086431503295898
Validation loss = 0.510370671749115
Validation loss = 0.5118660926818848
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5243129134178162
Validation loss = 0.5172818899154663
Validation loss = 0.5199801325798035
Validation loss = 0.5167545676231384
Validation loss = 0.5249679088592529
Validation loss = 0.520648181438446
Validation loss = 0.5197455883026123
Validation loss = 0.5231145620346069
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5160215497016907
Validation loss = 0.5115563869476318
Validation loss = 0.5167837738990784
Validation loss = 0.5162019729614258
Validation loss = 0.5182652473449707
Validation loss = 0.5175650715827942
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5206291675567627
Validation loss = 0.5164520144462585
Validation loss = 0.5214088559150696
Validation loss = 0.5227876305580139
Validation loss = 0.5184152722358704
Validation loss = 0.5231420397758484
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.7716535433070866
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.7736077481840193
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.7743496672716272
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.7793228536880292
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 18
average number of affinization = 3.787915407854985
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.7958937198067635
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.80084490042245
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.804583835946924
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 15
average number of affinization = 3.811332127787824
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.8150602409638554
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.8205900060204696
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.8267148014440435
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.831629585087192
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.834735576923077
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.840840840840841
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.8469387755102042
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.853029394121176
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.857314148681055
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.859796285200719
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.8622754491017965
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 17
average number of affinization = 3.870137642130461
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.876196172248804
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.8810520023909145
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.8853046594982077
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 2
average number of affinization = 3.884179104477612
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -17.6    |
| Iteration     | 65       |
| MaximumReturn | -0.266   |
| MinimumReturn | -104     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.518740713596344
Validation loss = 0.5161939263343811
Validation loss = 0.5147795081138611
Validation loss = 0.5175328850746155
Validation loss = 0.5216646194458008
Validation loss = 0.5217245221138
Validation loss = 0.5210299491882324
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5118597745895386
Validation loss = 0.5054851770401001
Validation loss = 0.5043209195137024
Validation loss = 0.5076248645782471
Validation loss = 0.512141227722168
Validation loss = 0.5129836201667786
Validation loss = 0.5104416608810425
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5168905258178711
Validation loss = 0.5182531476020813
Validation loss = 0.5179929137229919
Validation loss = 0.5234615802764893
Validation loss = 0.5180904269218445
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5166204571723938
Validation loss = 0.5134899616241455
Validation loss = 0.5151063799858093
Validation loss = 0.514125645160675
Validation loss = 0.5129725337028503
Validation loss = 0.5150738954544067
Validation loss = 0.514513373374939
Validation loss = 0.5163772702217102
Validation loss = 0.5193319320678711
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5232467651367188
Validation loss = 0.5170779824256897
Validation loss = 0.5200069546699524
Validation loss = 0.517833948135376
Validation loss = 0.5230427384376526
Validation loss = 0.5197471380233765
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 9
average number of affinization = 3.8872315035799523
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 5
average number of affinization = 3.8878950506857484
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.8927294398092966
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 16
average number of affinization = 3.899940440738535
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.905357142857143
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 20
average number of affinization = 3.9149315883402735
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 7
average number of affinization = 3.9167657550535075
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.919191919191919
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.923396674584323
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.9270029673590505
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.9311981020166074
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.937166567871962
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.941943127962085
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.9443457667258732
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 8
average number of affinization = 3.9467455621301775
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 10
average number of affinization = 3.9503252513305736
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.9562647754137115
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 25
average number of affinization = 3.9686946249261665
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 19
average number of affinization = 3.977567886658796
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 12
average number of affinization = 3.982300884955752
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 13
average number of affinization = 3.987617924528302
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 14
average number of affinization = 3.993517972893341
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 11
average number of affinization = 3.9976442873969376
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.004708652148323
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 5
average number of affinization = 4.0052941176470584
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18.4    |
| Iteration     | 66       |
| MaximumReturn | -0.224   |
| MinimumReturn | -157     |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5198588371276855
Validation loss = 0.5233367681503296
Validation loss = 0.5226789116859436
Validation loss = 0.5204295516014099
Validation loss = 0.5249992609024048
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5179871916770935
Validation loss = 0.5105002522468567
Validation loss = 0.5119308829307556
Validation loss = 0.5086975693702698
Validation loss = 0.5107698440551758
Validation loss = 0.5132073760032654
Validation loss = 0.5137327313423157
Validation loss = 0.517425537109375
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5187585949897766
Validation loss = 0.5198090672492981
Validation loss = 0.5209953188896179
Validation loss = 0.5217239260673523
Validation loss = 0.5189744234085083
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5207089781761169
Validation loss = 0.5142692923545837
Validation loss = 0.5193842053413391
Validation loss = 0.5179631114006042
Validation loss = 0.5213999152183533
Validation loss = 0.5200939178466797
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.521483302116394
Validation loss = 0.5204352736473083
Validation loss = 0.5212410688400269
Validation loss = 0.5209437012672424
Validation loss = 0.518827497959137
Validation loss = 0.5239210724830627
Validation loss = 0.5241477489471436
Validation loss = 0.5246240496635437
Validation loss = 0.5208480954170227
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.0094062316284536
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 22
average number of affinization = 4.019976498237368
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.022900763358779
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 4.02406103286385
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.026979472140763
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.030480656506448
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.033391915641476
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.03864168618267
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.040959625511995
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.046783625730995
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.053185271770894
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.058995327102804
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 27
average number of affinization = 4.0723876240513714
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.077596266044341
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.082798833819242
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.091491841491841
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.098427489807804
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.1030267753201395
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.108784176847004
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 6
average number of affinization = 4.109883720930233
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.113306217315515
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.119628339140534
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.124201973302379
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.129930394431555
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.137391304347826
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10      |
| Iteration     | 67       |
| MaximumReturn | -0.322   |
| MinimumReturn | -128     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.52069491147995
Validation loss = 0.5223399996757507
Validation loss = 0.5240176916122437
Validation loss = 0.5213416814804077
Validation loss = 0.5265812277793884
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5171046257019043
Validation loss = 0.5139679908752441
Validation loss = 0.5181244611740112
Validation loss = 0.5123481750488281
Validation loss = 0.5134811997413635
Validation loss = 0.5207194685935974
Validation loss = 0.5224607586860657
Validation loss = 0.5158208608627319
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5256507992744446
Validation loss = 0.5221912860870361
Validation loss = 0.5227587223052979
Validation loss = 0.5188514590263367
Validation loss = 0.5213456749916077
Validation loss = 0.5207029581069946
Validation loss = 0.5228416323661804
Validation loss = 0.5241038799285889
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5190727114677429
Validation loss = 0.5172753930091858
Validation loss = 0.5226637721061707
Validation loss = 0.5239576101303101
Validation loss = 0.5174264311790466
Validation loss = 0.5180399417877197
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5236912965774536
Validation loss = 0.5249987244606018
Validation loss = 0.526251494884491
Validation loss = 0.5249842405319214
Validation loss = 0.5252452492713928
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.140787949015063
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.148233931673422
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.1498842592592595
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.153846153846154
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.158959537572255
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.1634893125361065
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.166281755196305
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 1
average number of affinization = 4.1644547028274665
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.167243367935409
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.171757925072046
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.179147465437788
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.185377086931491
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.189873417721519
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.196089706728005
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.200574712643678
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.203905801263642
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.208381171067738
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.215146299483649
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.2213302752293576
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.229226361031519
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.2331042382588775
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.235832856325128
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.237986270022883
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.244139508290452
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.250857142857143
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.8    |
| Iteration     | 68       |
| MaximumReturn | -0.437   |
| MinimumReturn | -86.9    |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5193866491317749
Validation loss = 0.5252472162246704
Validation loss = 0.5251120328903198
Validation loss = 0.5213204622268677
Validation loss = 0.5260945558547974
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5154659748077393
Validation loss = 0.5137590169906616
Validation loss = 0.516783595085144
Validation loss = 0.5143634080886841
Validation loss = 0.5188238620758057
Validation loss = 0.5180214643478394
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5216867923736572
Validation loss = 0.5211246609687805
Validation loss = 0.5210627317428589
Validation loss = 0.5260230302810669
Validation loss = 0.5266879796981812
Validation loss = 0.5258327722549438
Validation loss = 0.5260835886001587
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5198194980621338
Validation loss = 0.5228342413902283
Validation loss = 0.520819902420044
Validation loss = 0.5175895094871521
Validation loss = 0.5199500322341919
Validation loss = 0.5224189162254333
Validation loss = 0.520923376083374
Validation loss = 0.524557888507843
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5206777453422546
Validation loss = 0.5190767049789429
Validation loss = 0.5224336981773376
Validation loss = 0.5243062376976013
Validation loss = 0.526160717010498
Validation loss = 0.5226795673370361
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.259280411193604
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.264269406392694
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.270393610952652
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 3
average number of affinization = 4.269669327251996
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 5
average number of affinization = 4.27008547008547
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.276195899772209
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 6
average number of affinization = 4.277177006260672
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.27872582480091
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.281409891984082
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.2846590909090905
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.287904599659284
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.292849035187287
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.2960862166761205
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.304988662131519
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 5
average number of affinization = 4.305382436260623
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.310872027180068
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.314091680814941
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.3184389140271495
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.323911814584511
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.328813559322034
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.336533032185206
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.34255079006772
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.347433728144388
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 5
average number of affinization = 4.3478015783540025
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.354366197183099
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.51    |
| Iteration     | 69       |
| MaximumReturn | -0.268   |
| MinimumReturn | -13.2    |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5250806212425232
Validation loss = 0.5199784636497498
Validation loss = 0.5229442119598389
Validation loss = 0.5242472887039185
Validation loss = 0.5234605669975281
Validation loss = 0.5266297459602356
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5184105038642883
Validation loss = 0.5170409679412842
Validation loss = 0.5195807814598083
Validation loss = 0.5163198709487915
Validation loss = 0.5233193635940552
Validation loss = 0.5200099349021912
Validation loss = 0.519862711429596
Validation loss = 0.5208219289779663
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.526100218296051
Validation loss = 0.5258808732032776
Validation loss = 0.5276699662208557
Validation loss = 0.5257629752159119
Validation loss = 0.5269654393196106
Validation loss = 0.5267839431762695
Validation loss = 0.5276263356208801
Validation loss = 0.5281327962875366
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5221068859100342
Validation loss = 0.5215569138526917
Validation loss = 0.5255190134048462
Validation loss = 0.5250452756881714
Validation loss = 0.5248709917068481
Validation loss = 0.5261369943618774
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.526483952999115
Validation loss = 0.5210385322570801
Validation loss = 0.523748517036438
Validation loss = 0.526848554611206
Validation loss = 0.525697648525238
Validation loss = 0.5257205963134766
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
average number of affinization = 4.351914414414415
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.359032076533484
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.362204724409449
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.368184373243396
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.37247191011236
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.376754632229085
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.3793490460157125
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.383623107122827
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.392937219730942
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.397759103641457
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.400895856662934
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.402350307778399
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.404362416107382
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.410285075461151
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 22
average number of affinization = 4.420111731843575
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.423785594639866
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.430803571428571
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.437255995538204
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.442586399108138
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.446796657381616
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.448775055679287
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 9
average number of affinization = 4.451307735114079
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.456618464961068
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.463590883824347
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.471666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.35    |
| Iteration     | 70       |
| MaximumReturn | -0.305   |
| MinimumReturn | -10.2    |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.525734543800354
Validation loss = 0.5243931412696838
Validation loss = 0.5295955538749695
Validation loss = 0.5279810428619385
Validation loss = 0.5232163667678833
Validation loss = 0.5267727971076965
Validation loss = 0.5310286283493042
Validation loss = 0.5291584134101868
Validation loss = 0.5287695527076721
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5241669416427612
Validation loss = 0.5232996344566345
Validation loss = 0.519416868686676
Validation loss = 0.5235958099365234
Validation loss = 0.5179744958877563
Validation loss = 0.5217949151992798
Validation loss = 0.5238268971443176
Validation loss = 0.5229974389076233
Validation loss = 0.5267433524131775
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.526669979095459
Validation loss = 0.5273533463478088
Validation loss = 0.526789128780365
Validation loss = 0.5292655825614929
Validation loss = 0.5312644243240356
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5252863764762878
Validation loss = 0.528049647808075
Validation loss = 0.5236387848854065
Validation loss = 0.5287441611289978
Validation loss = 0.5305777192115784
Validation loss = 0.5322874784469604
Validation loss = 0.5263053178787231
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5283775925636292
Validation loss = 0.533740758895874
Validation loss = 0.527421236038208
Validation loss = 0.5270344614982605
Validation loss = 0.5236341953277588
Validation loss = 0.5257502794265747
Validation loss = 0.5291774272918701
Validation loss = 0.5292627811431885
Validation loss = 0.5282177925109863
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.478067740144364
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.485571587125416
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.490848585690515
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.498891352549889
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.501939058171745
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.505537098560354
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.511898173768677
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.514933628318584
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.516307352128248
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.522651933701657
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.524019878520154
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 5
average number of affinization = 4.524282560706402
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.533370104798676
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.539691289966924
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.5438016528925615
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.549008810572687
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 6
average number of affinization = 4.549807374793616
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 6
average number of affinization = 4.550605060506051
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.556349642660803
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.562637362637362
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.56397583745195
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.568057080131723
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.574328030718596
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.578399122807017
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.586301369863014
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.81    |
| Iteration     | 71       |
| MaximumReturn | -0.309   |
| MinimumReturn | -23.5    |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5278204083442688
Validation loss = 0.5255038738250732
Validation loss = 0.5276719331741333
Validation loss = 0.5288845300674438
Validation loss = 0.5317574143409729
Validation loss = 0.5310783982276917
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5272711515426636
Validation loss = 0.5240962505340576
Validation loss = 0.5221391916275024
Validation loss = 0.519826352596283
Validation loss = 0.5221572518348694
Validation loss = 0.52524334192276
Validation loss = 0.5234609842300415
Validation loss = 0.526715874671936
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.528529942035675
Validation loss = 0.5282627940177917
Validation loss = 0.5315477848052979
Validation loss = 0.5352031588554382
Validation loss = 0.5313975214958191
Validation loss = 0.5311516523361206
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5248441100120544
Validation loss = 0.5242030620574951
Validation loss = 0.5254085659980774
Validation loss = 0.5299291014671326
Validation loss = 0.5249731540679932
Validation loss = 0.5320087671279907
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.529829204082489
Validation loss = 0.528495192527771
Validation loss = 0.5258374810218811
Validation loss = 0.5267599821090698
Validation loss = 0.527937650680542
Validation loss = 0.5277260541915894
Validation loss = 0.5319668054580688
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.590909090909091
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.597153804050356
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.602297592997812
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.605795516675779
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.614207650273224
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.615510649918077
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.61735807860262
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 4
average number of affinization = 4.617021276595745
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.621046892039258
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.6277929155313355
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.636165577342048
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.6418072945019055
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.643090315560392
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 24
average number of affinization = 4.653616095704187
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.659239130434782
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.666485605649104
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 2
average number of affinization = 4.665038002171553
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 23
average number of affinization = 4.674986435160065
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 21
average number of affinization = 4.683839479392625
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 22
average number of affinization = 4.693224932249323
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.697183098591549
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.700054141851651
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.707251082251083
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.713899405083829
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.718378378378379
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.7    |
| Iteration     | 72       |
| MaximumReturn | -0.3     |
| MinimumReturn | -172     |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5290411114692688
Validation loss = 0.5293187499046326
Validation loss = 0.5295308828353882
Validation loss = 0.5274582505226135
Validation loss = 0.5305619835853577
Validation loss = 0.5323814749717712
Validation loss = 0.532724142074585
Validation loss = 0.5345454812049866
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5245986580848694
Validation loss = 0.5195583701133728
Validation loss = 0.5233387351036072
Validation loss = 0.5186814069747925
Validation loss = 0.5245115160942078
Validation loss = 0.5212075114250183
Validation loss = 0.526421844959259
Validation loss = 0.5238270163536072
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5287829041481018
Validation loss = 0.5284698605537415
Validation loss = 0.5276167988777161
Validation loss = 0.5301169157028198
Validation loss = 0.5321142077445984
Validation loss = 0.5310244560241699
Validation loss = 0.529499888420105
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5262019038200378
Validation loss = 0.5244410634040833
Validation loss = 0.5286104679107666
Validation loss = 0.5284903645515442
Validation loss = 0.5289756059646606
Validation loss = 0.5247218608856201
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5255067944526672
Validation loss = 0.5299469232559204
Validation loss = 0.5300114750862122
Validation loss = 0.5337318778038025
Validation loss = 0.529528021812439
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.721772015126958
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.724622030237581
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 7
average number of affinization = 4.72584997301673
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.727615965480044
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.732614555256065
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.737068965517241
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.740980075390414
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.747578040904198
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.7514792899408285
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 24
average number of affinization = 4.761827956989247
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.764642665233746
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.771213748657358
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.776704240472356
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 6
average number of affinization = 4.777360515021459
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 2
average number of affinization = 4.775871313672922
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 18
average number of affinization = 4.782958199356913
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.791108730583824
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.796573875802998
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.801498127340824
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.804812834224599
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.809192944949225
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.81517094017094
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.818473037907101
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 22
average number of affinization = 4.827641408751334
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.832533333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.65    |
| Iteration     | 73       |
| MaximumReturn | -0.338   |
| MinimumReturn | -139     |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5317102670669556
Validation loss = 0.5301856994628906
Validation loss = 0.5317866802215576
Validation loss = 0.5336167216300964
Validation loss = 0.5332421064376831
Validation loss = 0.531667172908783
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5254351496696472
Validation loss = 0.5275798439979553
Validation loss = 0.5306907296180725
Validation loss = 0.5245774984359741
Validation loss = 0.5241886377334595
Validation loss = 0.5242955684661865
Validation loss = 0.526532769203186
Validation loss = 0.527561604976654
Validation loss = 0.5283540487289429
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5318453907966614
Validation loss = 0.530358612537384
Validation loss = 0.532577633857727
Validation loss = 0.53052818775177
Validation loss = 0.5334832668304443
Validation loss = 0.5390226244926453
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5277805924415588
Validation loss = 0.5266683101654053
Validation loss = 0.5317922830581665
Validation loss = 0.5265262722969055
Validation loss = 0.5304692983627319
Validation loss = 0.5297040343284607
Validation loss = 0.5315300226211548
Validation loss = 0.5319939851760864
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5280545949935913
Validation loss = 0.5293238759040833
Validation loss = 0.5288957953453064
Validation loss = 0.528891384601593
Validation loss = 0.5280002951622009
Validation loss = 0.5304930210113525
Validation loss = 0.5302611589431763
Validation loss = 0.5317344665527344
Validation loss = 0.5286887288093567
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.834221748400853
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.840170484816196
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.84664536741214
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 6
average number of affinization = 4.847259180415114
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.851595744680851
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.8559276980329615
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.860255047821466
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.863515666489644
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.868365180467091
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 19
average number of affinization = 4.875862068965517
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.8806998939554616
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.886062533121357
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.891419491525424
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 20
average number of affinization = 4.899417681312864
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.904232804232804
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.909042834479112
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.912790697674419
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.916006339144215
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.921330517423442
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.927176781002639
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 8
average number of affinization = 4.92879746835443
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.9330521876647335
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.938883034773446
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.942601369141654
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.947894736842105
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.9    |
| Iteration     | 74       |
| MaximumReturn | -0.274   |
| MinimumReturn | -130     |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5336795449256897
Validation loss = 0.5315460562705994
Validation loss = 0.5364511013031006
Validation loss = 0.5347672700881958
Validation loss = 0.5352643728256226
Validation loss = 0.5353711247444153
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5275470614433289
Validation loss = 0.523864209651947
Validation loss = 0.5285583138465881
Validation loss = 0.5277851223945618
Validation loss = 0.5250471830368042
Validation loss = 0.5254314541816711
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5308143496513367
Validation loss = 0.5289299488067627
Validation loss = 0.5258314609527588
Validation loss = 0.529201865196228
Validation loss = 0.5361418128013611
Validation loss = 0.5297932028770447
Validation loss = 0.5348251461982727
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5285043120384216
Validation loss = 0.5250858664512634
Validation loss = 0.5305824875831604
Validation loss = 0.5283816456794739
Validation loss = 0.5308302640914917
Validation loss = 0.5339120030403137
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5328211188316345
Validation loss = 0.5292112827301025
Validation loss = 0.5307350158691406
Validation loss = 0.5287143588066101
Validation loss = 0.5325033664703369
Validation loss = 0.532943069934845
Validation loss = 0.5316711664199829
Validation loss = 0.5322338342666626
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.951604418726986
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 15
average number of affinization = 4.956887486855941
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 12
average number of affinization = 4.960588544403573
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 11
average number of affinization = 4.963760504201681
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.966404199475066
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 13
average number of affinization = 4.970619097586569
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 17
average number of affinization = 4.976927110644992
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 14
average number of affinization = 4.981656184486373
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.9874279727606075
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 16
average number of affinization = 4.993193717277487
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 10
average number of affinization = 4.995813710099425
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.0052301255230125
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.006795608991114
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.011494252873563
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.016710182767624
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 9
average number of affinization = 5.018789144050104
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.023995826812728
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.029718456725756
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.036998436685773
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.038541666666666
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 6
average number of affinization = 5.039042165538782
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.04474505723205
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.049401976079043
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.050935550935551
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.06025974025974
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.2    |
| Iteration     | 75       |
| MaximumReturn | -0.352   |
| MinimumReturn | -139     |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5398374199867249
Validation loss = 0.5347790718078613
Validation loss = 0.5409257411956787
Validation loss = 0.5315303206443787
Validation loss = 0.5375780463218689
Validation loss = 0.5340105295181274
Validation loss = 0.5351652503013611
Validation loss = 0.5351961851119995
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5253134369850159
Validation loss = 0.5249561667442322
Validation loss = 0.5256096124649048
Validation loss = 0.5259686708450317
Validation loss = 0.5304172039031982
Validation loss = 0.5292162895202637
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5323934555053711
Validation loss = 0.534201979637146
Validation loss = 0.5292373895645142
Validation loss = 0.5346851944923401
Validation loss = 0.5346018671989441
Validation loss = 0.5407277345657349
Validation loss = 0.5346450805664062
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5357583165168762
Validation loss = 0.5284965634346008
Validation loss = 0.532301664352417
Validation loss = 0.5314401984214783
Validation loss = 0.5333067178726196
Validation loss = 0.5354613065719604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5355509519577026
Validation loss = 0.5310794115066528
Validation loss = 0.5315477252006531
Validation loss = 0.531933069229126
Validation loss = 0.5320706367492676
Validation loss = 0.5326786041259766
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.069055036344756
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.077841203943954
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 25
average number of affinization = 5.088174273858921
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 26
average number of affinization = 5.099015033696215
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.104663212435233
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.108751941998964
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.115942028985507
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.117434040351784
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.127197518097208
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.133333333333334
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.1384297520661155
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.144037170882808
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.148606811145511
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.157297576070139
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.165979381443299
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 25
average number of affinization = 5.176197836166924
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.179711637487126
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.1868244981986615
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.191872427983539
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.200514138817481
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.209146968139774
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.215202876219825
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.224332648870637
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.231400718317086
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.236923076923077
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.734   |
| Iteration     | 76       |
| MaximumReturn | -0.124   |
| MinimumReturn | -4.03    |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5365732908248901
Validation loss = 0.5380946397781372
Validation loss = 0.5338298678398132
Validation loss = 0.5391961932182312
Validation loss = 0.5376819372177124
Validation loss = 0.5369670391082764
Validation loss = 0.5371137261390686
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5306174755096436
Validation loss = 0.5293300151824951
Validation loss = 0.5260202288627625
Validation loss = 0.5312301516532898
Validation loss = 0.5292212963104248
Validation loss = 0.5283139944076538
Validation loss = 0.5272653698921204
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.535678505897522
Validation loss = 0.5361620783805847
Validation loss = 0.534615695476532
Validation loss = 0.5346990823745728
Validation loss = 0.5378966331481934
Validation loss = 0.536515474319458
Validation loss = 0.5367133021354675
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.535159707069397
Validation loss = 0.5288007259368896
Validation loss = 0.5366196632385254
Validation loss = 0.5303228497505188
Validation loss = 0.5333946943283081
Validation loss = 0.5353915095329285
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5339887738227844
Validation loss = 0.5322385430335999
Validation loss = 0.533294141292572
Validation loss = 0.5332176685333252
Validation loss = 0.5350350737571716
Validation loss = 0.5331243276596069
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 7
average number of affinization = 5.237826755509995
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.24077868852459
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.247311827956989
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.254350051177073
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.263938618925831
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.270449897750511
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 18
average number of affinization = 5.2769545222279
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.279877425944842
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 8
average number of affinization = 5.2812659520163345
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.286224489795918
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.290158082610913
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 21
average number of affinization = 5.298165137614679
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.3015792154865
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.307535641547862
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.312468193384224
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 25
average number of affinization = 5.322482197355035
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 25
average number of affinization = 5.332486019318759
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.337398373983739
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.346368715083799
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 10
average number of affinization = 5.348730964467005
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.354134956874683
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 21
average number of affinization = 5.362068965517241
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.367460719716168
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.37436676798379
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.37873417721519
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -61      |
| Iteration     | 77       |
| MaximumReturn | -0.205   |
| MinimumReturn | -120     |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5367468595504761
Validation loss = 0.5335330367088318
Validation loss = 0.5356578230857849
Validation loss = 0.5367400050163269
Validation loss = 0.5338900685310364
Validation loss = 0.535676121711731
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5268347859382629
Validation loss = 0.5237462520599365
Validation loss = 0.526228129863739
Validation loss = 0.5286781787872314
Validation loss = 0.5245458483695984
Validation loss = 0.5240320563316345
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5355415940284729
Validation loss = 0.5384678244590759
Validation loss = 0.5348259210586548
Validation loss = 0.5340905785560608
Validation loss = 0.5334791541099548
Validation loss = 0.5369768142700195
Validation loss = 0.543285608291626
Validation loss = 0.5357400178909302
Validation loss = 0.536509096622467
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5328022241592407
Validation loss = 0.5293264389038086
Validation loss = 0.5308758616447449
Validation loss = 0.5341058969497681
Validation loss = 0.5372354388237
Validation loss = 0.5320493578910828
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5274880528450012
Validation loss = 0.5289517045021057
Validation loss = 0.5305352210998535
Validation loss = 0.5293943881988525
Validation loss = 0.5319481492042542
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.383603238866397
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.386949924127466
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.391304347826087
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 21
average number of affinization = 5.399191510864073
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 30
average number of affinization = 5.411616161616162
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.41696113074205
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.4238143289606455
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.43267776096823
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.435987903225806
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.44433249370277
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.449144008056395
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.453950679416205
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 12
average number of affinization = 5.457243460764587
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.460030165912519
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 35
average number of affinization = 5.474874371859296
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.480662983425415
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.485441767068273
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.490215755143001
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.495987963891675
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 17
average number of affinization = 5.5017543859649125
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.511022044088176
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.514772158237356
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.519019019019019
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 22
average number of affinization = 5.527263631815908
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.534
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40.5    |
| Iteration     | 78       |
| MaximumReturn | -0.293   |
| MinimumReturn | -104     |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5370514988899231
Validation loss = 0.5306562185287476
Validation loss = 0.5351319909095764
Validation loss = 0.5339784622192383
Validation loss = 0.5317856669425964
Validation loss = 0.5342913866043091
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5247400999069214
Validation loss = 0.5249090194702148
Validation loss = 0.5256361961364746
Validation loss = 0.5263199210166931
Validation loss = 0.5282220840454102
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5324627161026001
Validation loss = 0.5301324725151062
Validation loss = 0.5342060327529907
Validation loss = 0.5370000600814819
Validation loss = 0.5362856984138489
Validation loss = 0.5364447236061096
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5292873978614807
Validation loss = 0.5270230770111084
Validation loss = 0.526665985584259
Validation loss = 0.5318450927734375
Validation loss = 0.5261936783790588
Validation loss = 0.5335220098495483
Validation loss = 0.5321632027626038
Validation loss = 0.53330397605896
Validation loss = 0.5343645811080933
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5287343859672546
Validation loss = 0.5310038924217224
Validation loss = 0.5275886654853821
Validation loss = 0.5291318893432617
Validation loss = 0.5303938388824463
Validation loss = 0.530179500579834
Validation loss = 0.529869556427002
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 6
average number of affinization = 5.534232883558221
Path 1 | total_timesteps 100.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.540959040959041
Path 2 | total_timesteps 200.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.5496754867698455
Path 3 | total_timesteps 300.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.55439121756487
Path 4 | total_timesteps 400.
number of affinization with epsilon = 3 is 21
average number of affinization = 5.5620947630922695
Path 5 | total_timesteps 500.
number of affinization with epsilon = 3 is 24
average number of affinization = 5.571286141575274
Path 6 | total_timesteps 600.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.576482311908321
Path 7 | total_timesteps 700.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.581175298804781
Path 8 | total_timesteps 800.
number of affinization with epsilon = 3 is 13
average number of affinization = 5.584868093578895
Path 9 | total_timesteps 900.
number of affinization with epsilon = 3 is 11
average number of affinization = 5.587562189054727
Path 10 | total_timesteps 1000.
number of affinization with epsilon = 3 is 20
average number of affinization = 5.594728990551964
Path 11 | total_timesteps 1100.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.599403578528827
Path 12 | total_timesteps 1200.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.604570293094883
Path 13 | total_timesteps 1300.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.6092353525322745
Path 14 | total_timesteps 1400.
number of affinization with epsilon = 3 is 19
average number of affinization = 5.615880893300248
Path 15 | total_timesteps 1500.
number of affinization with epsilon = 3 is 26
average number of affinization = 5.625992063492063
Path 16 | total_timesteps 1600.
number of affinization with epsilon = 3 is 27
average number of affinization = 5.636588993554784
Path 17 | total_timesteps 1700.
number of affinization with epsilon = 3 is 7
average number of affinization = 5.637264618434093
Path 18 | total_timesteps 1800.
number of affinization with epsilon = 3 is 21
average number of affinization = 5.644873699851412
Path 19 | total_timesteps 1900.
number of affinization with epsilon = 3 is 15
average number of affinization = 5.649504950495049
Path 20 | total_timesteps 2000.
number of affinization with epsilon = 3 is 21
average number of affinization = 5.657100445324097
Path 21 | total_timesteps 2100.
number of affinization with epsilon = 3 is 27
average number of affinization = 5.667655786350148
Path 22 | total_timesteps 2200.
number of affinization with epsilon = 3 is 23
average number of affinization = 5.67622343054869
Path 23 | total_timesteps 2300.
number of affinization with epsilon = 3 is 14
average number of affinization = 5.680335968379446
Path 24 | total_timesteps 2400.
number of affinization with epsilon = 3 is 16
average number of affinization = 5.685432098765432
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -67      |
| Iteration     | 79       |
| MaximumReturn | -0.428   |
| MinimumReturn | -146     |
| TotalSamples  | 134946   |
----------------------------
