Logging to experiments/gym_fswimmer/nov4/Sw350e1_seed5543
Print configuration .....
{'env_name': 'gym_fswimmer', 'random_seeds': [2312, 1231, 2631, 5543], 'save_variables': False, 'model_save_dir': '/tmp/gym_fswimmer_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 200, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 0
average number of affinization = 0.0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.377564013004303
Validation loss = 0.1942063570022583
Validation loss = 0.1400386542081833
Validation loss = 0.11728493124246597
Validation loss = 0.08995875716209412
Validation loss = 0.0854884684085846
Validation loss = 0.08090965449810028
Validation loss = 0.09008098393678665
Validation loss = 0.08660754561424255
Validation loss = 0.07195478677749634
Validation loss = 0.06859487295150757
Validation loss = 0.06838314235210419
Validation loss = 0.07162677496671677
Validation loss = 0.07414309680461884
Validation loss = 0.0736241489648819
Validation loss = 0.07082094252109528
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5002065300941467
Validation loss = 0.19035077095031738
Validation loss = 0.13845618069171906
Validation loss = 0.10671116411685944
Validation loss = 0.09412640333175659
Validation loss = 0.08406044542789459
Validation loss = 0.08145885169506073
Validation loss = 0.09551049768924713
Validation loss = 0.08820924162864685
Validation loss = 0.08090279996395111
Validation loss = 0.08490465581417084
Validation loss = 0.07424730062484741
Validation loss = 0.07645656913518906
Validation loss = 0.07895157486200333
Validation loss = 0.07172603160142899
Validation loss = 0.07991056889295578
Validation loss = 0.07226239889860153
Validation loss = 0.06949391961097717
Validation loss = 0.07040297985076904
Validation loss = 0.07232925295829773
Validation loss = 0.07647766172885895
Validation loss = 0.08800733089447021
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4423366189002991
Validation loss = 0.20048227906227112
Validation loss = 0.13842636346817017
Validation loss = 0.10622354596853256
Validation loss = 0.09464219957590103
Validation loss = 0.08491236716508865
Validation loss = 0.08238715678453445
Validation loss = 0.07541941851377487
Validation loss = 0.07269468903541565
Validation loss = 0.07733716070652008
Validation loss = 0.0733782947063446
Validation loss = 0.08215528726577759
Validation loss = 0.0727008655667305
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3666090965270996
Validation loss = 0.19372764229774475
Validation loss = 0.12856389582157135
Validation loss = 0.09916049242019653
Validation loss = 0.09287820756435394
Validation loss = 0.08250659704208374
Validation loss = 0.11005419492721558
Validation loss = 0.07922983169555664
Validation loss = 0.0841686874628067
Validation loss = 0.07013463228940964
Validation loss = 0.07698660343885422
Validation loss = 0.0742851197719574
Validation loss = 0.07777539640665054
Validation loss = 0.0700594037771225
Validation loss = 0.07641901820898056
Validation loss = 0.07778169214725494
Validation loss = 0.07746990025043488
Validation loss = 0.06864286214113235
Validation loss = 0.06482043862342834
Validation loss = 0.07723306864500046
Validation loss = 0.0707823634147644
Validation loss = 0.0667063519358635
Validation loss = 0.06426190584897995
Validation loss = 0.070081427693367
Validation loss = 0.07062594592571259
Validation loss = 0.06288796663284302
Validation loss = 0.06594429910182953
Validation loss = 0.06448561698198318
Validation loss = 0.07124900072813034
Validation loss = 0.07485498487949371
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4073832333087921
Validation loss = 0.1959945559501648
Validation loss = 0.13442310690879822
Validation loss = 0.10511758923530579
Validation loss = 0.09215664863586426
Validation loss = 0.08151200413703918
Validation loss = 0.09299127757549286
Validation loss = 0.07932360470294952
Validation loss = 0.075013168156147
Validation loss = 0.07275181263685226
Validation loss = 0.08635513484477997
Validation loss = 0.07017314434051514
Validation loss = 0.069400854408741
Validation loss = 0.07070214301347733
Validation loss = 0.06785678118467331
Validation loss = 0.06620213389396667
Validation loss = 0.07084929198026657
Validation loss = 0.0647309273481369
Validation loss = 0.0661151111125946
Validation loss = 0.07018570601940155
Validation loss = 0.07191572338342667
Validation loss = 0.07182016968727112
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 17
average number of affinization = 2.4285714285714284
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 13
average number of affinization = 3.75
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 6
average number of affinization = 4.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 11
average number of affinization = 4.7
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 9
average number of affinization = 5.090909090909091
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 29
average number of affinization = 7.083333333333333
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 39.9     |
| Iteration     | 0        |
| MaximumReturn | 46.8     |
| MinimumReturn | 34.7     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05873475968837738
Validation loss = 0.030218176543712616
Validation loss = 0.02909168414771557
Validation loss = 0.027312172576785088
Validation loss = 0.02920570969581604
Validation loss = 0.028460389003157616
Validation loss = 0.02784092351794243
Validation loss = 0.024769097566604614
Validation loss = 0.027375752106308937
Validation loss = 0.02946547046303749
Validation loss = 0.02790617197751999
Validation loss = 0.027095293626189232
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.054944299161434174
Validation loss = 0.03205271065235138
Validation loss = 0.028323020786046982
Validation loss = 0.027925560250878334
Validation loss = 0.025569474324584007
Validation loss = 0.02588822692632675
Validation loss = 0.024320973083376884
Validation loss = 0.02508779615163803
Validation loss = 0.02622031420469284
Validation loss = 0.027316927909851074
Validation loss = 0.02551257610321045
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05830535292625427
Validation loss = 0.03011193871498108
Validation loss = 0.028791988268494606
Validation loss = 0.027619868516921997
Validation loss = 0.0262745451182127
Validation loss = 0.025473175570368767
Validation loss = 0.025852929800748825
Validation loss = 0.026668382808566093
Validation loss = 0.02775139920413494
Validation loss = 0.02526303566992283
Validation loss = 0.024742553010582924
Validation loss = 0.02595304511487484
Validation loss = 0.025291357189416885
Validation loss = 0.025244075804948807
Validation loss = 0.02517387457191944
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06135295704007149
Validation loss = 0.030441761016845703
Validation loss = 0.02703302726149559
Validation loss = 0.025705497711896896
Validation loss = 0.02502545528113842
Validation loss = 0.023849988356232643
Validation loss = 0.024806341156363487
Validation loss = 0.025542672723531723
Validation loss = 0.027364032343029976
Validation loss = 0.026036029681563377
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06112537533044815
Validation loss = 0.02806166000664234
Validation loss = 0.02729266695678234
Validation loss = 0.026890629902482033
Validation loss = 0.027750864624977112
Validation loss = 0.02420942485332489
Validation loss = 0.026944272220134735
Validation loss = 0.024313082918524742
Validation loss = 0.024910487234592438
Validation loss = 0.026855962350964546
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 71
average number of affinization = 12.0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 82
average number of affinization = 17.0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 68
average number of affinization = 20.4
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 63
average number of affinization = 23.0625
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 70
average number of affinization = 25.823529411764707
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 9
average number of affinization = 24.88888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 64.9     |
| Iteration     | 1        |
| MaximumReturn | 70.2     |
| MinimumReturn | 59.6     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023827483877539635
Validation loss = 0.014133420772850513
Validation loss = 0.012856935150921345
Validation loss = 0.013422363437712193
Validation loss = 0.015202430076897144
Validation loss = 0.013331974856555462
Validation loss = 0.013994093053042889
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.024748608469963074
Validation loss = 0.013737271539866924
Validation loss = 0.01274810079485178
Validation loss = 0.014724843204021454
Validation loss = 0.012789924629032612
Validation loss = 0.01271409448236227
Validation loss = 0.012815284542739391
Validation loss = 0.01165864709764719
Validation loss = 0.014501302503049374
Validation loss = 0.017926745116710663
Validation loss = 0.012473203241825104
Validation loss = 0.013781115412712097
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021618271246552467
Validation loss = 0.012611462734639645
Validation loss = 0.013135007582604885
Validation loss = 0.01249378826469183
Validation loss = 0.011934579350054264
Validation loss = 0.01333941426128149
Validation loss = 0.014177686534821987
Validation loss = 0.011818618513643742
Validation loss = 0.012830515392124653
Validation loss = 0.013310045003890991
Validation loss = 0.018611740320920944
Validation loss = 0.012010634876787663
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021965818479657173
Validation loss = 0.013379420153796673
Validation loss = 0.014228166081011295
Validation loss = 0.01256632898002863
Validation loss = 0.014953863807022572
Validation loss = 0.012882235459983349
Validation loss = 0.011881742626428604
Validation loss = 0.017751073464751244
Validation loss = 0.012243061326444149
Validation loss = 0.012818536721169949
Validation loss = 0.012103304266929626
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018675491213798523
Validation loss = 0.013613365590572357
Validation loss = 0.012900379486382008
Validation loss = 0.011988066136837006
Validation loss = 0.013013891875743866
Validation loss = 0.01423819363117218
Validation loss = 0.012553664855659008
Validation loss = 0.013024074025452137
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 144
average number of affinization = 31.157894736842106
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 82
average number of affinization = 33.7
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 94
average number of affinization = 36.57142857142857
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 145
average number of affinization = 41.5
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 134
average number of affinization = 45.52173913043478
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 84
average number of affinization = 47.125
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 42       |
| Iteration     | 2        |
| MaximumReturn | 52.3     |
| MinimumReturn | 35.4     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01257188618183136
Validation loss = 0.007925632409751415
Validation loss = 0.008133737370371819
Validation loss = 0.009678429923951626
Validation loss = 0.008452188223600388
Validation loss = 0.00799474399536848
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011187713593244553
Validation loss = 0.00714659271761775
Validation loss = 0.009359457530081272
Validation loss = 0.006811460480093956
Validation loss = 0.008329030126333237
Validation loss = 0.0075178430415689945
Validation loss = 0.007565437350422144
Validation loss = 0.007151242811232805
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012114493176341057
Validation loss = 0.00744217075407505
Validation loss = 0.007813291624188423
Validation loss = 0.008102353662252426
Validation loss = 0.0074823880568146706
Validation loss = 0.007464727386832237
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01129770465195179
Validation loss = 0.009077372029423714
Validation loss = 0.007277206983417273
Validation loss = 0.007617642171680927
Validation loss = 0.009185614995658398
Validation loss = 0.008298113942146301
Validation loss = 0.008097324520349503
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01170585211366415
Validation loss = 0.007503419183194637
Validation loss = 0.007300030440092087
Validation loss = 0.007747483905404806
Validation loss = 0.007476797327399254
Validation loss = 0.0071281809359788895
Validation loss = 0.007695404812693596
Validation loss = 0.007104701828211546
Validation loss = 0.008844191208481789
Validation loss = 0.00775781087577343
Validation loss = 0.008617870509624481
Validation loss = 0.00852818414568901
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 39
average number of affinization = 46.8
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 59
average number of affinization = 47.26923076923077
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 30
average number of affinization = 46.629629629629626
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 6
average number of affinization = 45.17857142857143
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 21
average number of affinization = 44.3448275862069
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 34
average number of affinization = 44.0
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 129      |
| Iteration     | 3        |
| MaximumReturn | 138      |
| MinimumReturn | 118      |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007238701917231083
Validation loss = 0.006463329307734966
Validation loss = 0.008221003226935863
Validation loss = 0.00686230231076479
Validation loss = 0.006315880920737982
Validation loss = 0.006842020899057388
Validation loss = 0.007008803077042103
Validation loss = 0.006926834583282471
Validation loss = 0.00890623964369297
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006813636980950832
Validation loss = 0.0059987641870975494
Validation loss = 0.006696752272546291
Validation loss = 0.0065552303567528725
Validation loss = 0.0069706798531115055
Validation loss = 0.006620921194553375
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007657982409000397
Validation loss = 0.006867566611617804
Validation loss = 0.006393306888639927
Validation loss = 0.007903236895799637
Validation loss = 0.007052500732243061
Validation loss = 0.006623728666454554
Validation loss = 0.006811055354773998
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008215099573135376
Validation loss = 0.009329469874501228
Validation loss = 0.007435206323862076
Validation loss = 0.008506164886057377
Validation loss = 0.00623449869453907
Validation loss = 0.007669476326555014
Validation loss = 0.007539092097431421
Validation loss = 0.007149816490709782
Validation loss = 0.006368277128785849
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007271836046129465
Validation loss = 0.0067819408141076565
Validation loss = 0.006554184947162867
Validation loss = 0.007380614522844553
Validation loss = 0.006105207838118076
Validation loss = 0.006490910891443491
Validation loss = 0.0068193646147847176
Validation loss = 0.0069543286226689816
Validation loss = 0.007299799472093582
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 2
average number of affinization = 42.645161290322584
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 2
average number of affinization = 41.375
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 2
average number of affinization = 40.18181818181818
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 2
average number of affinization = 39.05882352941177
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 4
average number of affinization = 38.05714285714286
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 2
average number of affinization = 37.05555555555556
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 191      |
| Iteration     | 4        |
| MaximumReturn | 198      |
| MinimumReturn | 184      |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007207311224192381
Validation loss = 0.00667085126042366
Validation loss = 0.005970032885670662
Validation loss = 0.0061559006571769714
Validation loss = 0.00557328388094902
Validation loss = 0.005509355571120977
Validation loss = 0.005835386458784342
Validation loss = 0.00658697122707963
Validation loss = 0.006876341532915831
Validation loss = 0.006182041484862566
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006176780443638563
Validation loss = 0.0058164577931165695
Validation loss = 0.005674581974744797
Validation loss = 0.0057930718176066875
Validation loss = 0.006075028330087662
Validation loss = 0.0057202414609491825
Validation loss = 0.0056596496142446995
Validation loss = 0.006838982924818993
Validation loss = 0.005844596773386002
Validation loss = 0.008176627568900585
Validation loss = 0.005868466105312109
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006699443329125643
Validation loss = 0.0060072713531553745
Validation loss = 0.00622836546972394
Validation loss = 0.007170341443270445
Validation loss = 0.0066627622582018375
Validation loss = 0.005961939692497253
Validation loss = 0.005602640565484762
Validation loss = 0.005622785072773695
Validation loss = 0.006316326558589935
Validation loss = 0.005703052040189505
Validation loss = 0.005548663437366486
Validation loss = 0.007502405438572168
Validation loss = 0.006173580884933472
Validation loss = 0.006171517539769411
Validation loss = 0.007032059598714113
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006638089660555124
Validation loss = 0.005766257178038359
Validation loss = 0.005864910315722227
Validation loss = 0.007173777092248201
Validation loss = 0.006366778165102005
Validation loss = 0.006686225533485413
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006739295553416014
Validation loss = 0.007591011002659798
Validation loss = 0.006080355495214462
Validation loss = 0.005647099111229181
Validation loss = 0.006014385726302862
Validation loss = 0.005846580024808645
Validation loss = 0.005508833099156618
Validation loss = 0.006159513723105192
Validation loss = 0.006283678580075502
Validation loss = 0.005925917997956276
Validation loss = 0.006922071799635887
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 6
average number of affinization = 36.21621621621622
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 2
average number of affinization = 35.31578947368421
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 13
average number of affinization = 34.743589743589745
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 4
average number of affinization = 33.975
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 6
average number of affinization = 33.292682926829265
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 4
average number of affinization = 32.595238095238095
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 216      |
| Iteration     | 5        |
| MaximumReturn | 218      |
| MinimumReturn | 211      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006473652087152004
Validation loss = 0.006491189356893301
Validation loss = 0.005750530865043402
Validation loss = 0.0053240107372403145
Validation loss = 0.005960988812148571
Validation loss = 0.006358935032039881
Validation loss = 0.005572786089032888
Validation loss = 0.0053980378434062
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006115048658102751
Validation loss = 0.005538469646126032
Validation loss = 0.005978957284241915
Validation loss = 0.0060059367679059505
Validation loss = 0.0059027899987995625
Validation loss = 0.005620536860078573
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006098498124629259
Validation loss = 0.0065444218926131725
Validation loss = 0.006091757211834192
Validation loss = 0.006075619719922543
Validation loss = 0.005672994069755077
Validation loss = 0.005848466418683529
Validation loss = 0.005314819049090147
Validation loss = 0.0055254241451621056
Validation loss = 0.006022869609296322
Validation loss = 0.007097114343196154
Validation loss = 0.005940756760537624
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00629551662132144
Validation loss = 0.006739758420735598
Validation loss = 0.006356643047183752
Validation loss = 0.006863697431981564
Validation loss = 0.005409587640315294
Validation loss = 0.007027808576822281
Validation loss = 0.006001561414450407
Validation loss = 0.006358936429023743
Validation loss = 0.007194604258984327
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006794358603656292
Validation loss = 0.005534396041184664
Validation loss = 0.0061007956974208355
Validation loss = 0.005894248373806477
Validation loss = 0.0065527730621397495
Validation loss = 0.00582681642845273
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 4
average number of affinization = 31.930232558139537
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 7
average number of affinization = 31.363636363636363
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 4
average number of affinization = 30.755555555555556
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 4
average number of affinization = 30.17391304347826
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 7
average number of affinization = 29.680851063829788
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 15
average number of affinization = 29.375
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 250      |
| Iteration     | 6        |
| MaximumReturn | 253      |
| MinimumReturn | 248      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0060309600085020065
Validation loss = 0.006089171394705772
Validation loss = 0.005127439741045237
Validation loss = 0.006333470344543457
Validation loss = 0.005381183698773384
Validation loss = 0.006906756199896336
Validation loss = 0.0052529750391840935
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005862263962626457
Validation loss = 0.005727364681661129
Validation loss = 0.005405896343290806
Validation loss = 0.0057134684175252914
Validation loss = 0.005674557760357857
Validation loss = 0.0060456059873104095
Validation loss = 0.006098734214901924
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005985970608890057
Validation loss = 0.005889278836548328
Validation loss = 0.005431334488093853
Validation loss = 0.006040796637535095
Validation loss = 0.005421189591288567
Validation loss = 0.006263847928494215
Validation loss = 0.005442881025373936
Validation loss = 0.005456111393868923
Validation loss = 0.0064561921171844006
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006436957977712154
Validation loss = 0.005999450571835041
Validation loss = 0.005945784971117973
Validation loss = 0.006147061474621296
Validation loss = 0.005697627551853657
Validation loss = 0.006307452917098999
Validation loss = 0.005652700550854206
Validation loss = 0.005565755534917116
Validation loss = 0.006147340405732393
Validation loss = 0.005834192968904972
Validation loss = 0.005894667003303766
Validation loss = 0.005759161897003651
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006242350209504366
Validation loss = 0.005678829737007618
Validation loss = 0.005924541037529707
Validation loss = 0.0056342389434576035
Validation loss = 0.005515411961823702
Validation loss = 0.006233591120690107
Validation loss = 0.00552819948643446
Validation loss = 0.005805513821542263
Validation loss = 0.00639352248981595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 2
average number of affinization = 28.816326530612244
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 0
average number of affinization = 28.24
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 6
average number of affinization = 27.80392156862745
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 0
average number of affinization = 27.26923076923077
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 4
average number of affinization = 26.830188679245282
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 0
average number of affinization = 26.333333333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 268      |
| Iteration     | 7        |
| MaximumReturn | 301      |
| MinimumReturn | 256      |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006000025197863579
Validation loss = 0.005591182503849268
Validation loss = 0.0054858289659023285
Validation loss = 0.006019240710884333
Validation loss = 0.005996944382786751
Validation loss = 0.005577164236456156
Validation loss = 0.005723911803215742
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006385406944900751
Validation loss = 0.005435052793473005
Validation loss = 0.005979863926768303
Validation loss = 0.005179188214242458
Validation loss = 0.0053128330036997795
Validation loss = 0.005150541663169861
Validation loss = 0.005538913421332836
Validation loss = 0.0060227601788938046
Validation loss = 0.005322560202330351
Validation loss = 0.005687274970114231
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005320151802152395
Validation loss = 0.005382799077779055
Validation loss = 0.005405171308666468
Validation loss = 0.005881628021597862
Validation loss = 0.005298230797052383
Validation loss = 0.00506219919770956
Validation loss = 0.005599703639745712
Validation loss = 0.005544215906411409
Validation loss = 0.0050948429852724075
Validation loss = 0.005625648889690638
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005962388589978218
Validation loss = 0.005436370149254799
Validation loss = 0.005857300944626331
Validation loss = 0.006976616103202105
Validation loss = 0.006219870410859585
Validation loss = 0.007310913875699043
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005516375415027142
Validation loss = 0.005472716875374317
Validation loss = 0.005248363129794598
Validation loss = 0.006233383435755968
Validation loss = 0.006521597970277071
Validation loss = 0.005560468416661024
Validation loss = 0.006614300888031721
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 2
average number of affinization = 25.89090909090909
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 7
average number of affinization = 25.553571428571427
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 2
average number of affinization = 25.140350877192983
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 6
average number of affinization = 24.810344827586206
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 4
average number of affinization = 24.45762711864407
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 2
average number of affinization = 24.083333333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 251      |
| Iteration     | 8        |
| MaximumReturn | 270      |
| MinimumReturn | 236      |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005787502508610487
Validation loss = 0.005064570344984531
Validation loss = 0.00593449454754591
Validation loss = 0.005093020386993885
Validation loss = 0.005266767926514149
Validation loss = 0.0057815671898424625
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005347331520169973
Validation loss = 0.005112662445753813
Validation loss = 0.005630190949887037
Validation loss = 0.004916133359074593
Validation loss = 0.006001768168061972
Validation loss = 0.005511091556400061
Validation loss = 0.005522069521248341
Validation loss = 0.0055071464739739895
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0056992219761013985
Validation loss = 0.005910343024879694
Validation loss = 0.0057054562494158745
Validation loss = 0.00612149853259325
Validation loss = 0.006398069206625223
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005827803164720535
Validation loss = 0.005934081040322781
Validation loss = 0.0058855400420725346
Validation loss = 0.005716618150472641
Validation loss = 0.005835688207298517
Validation loss = 0.005862468853592873
Validation loss = 0.005695232655853033
Validation loss = 0.005654097069054842
Validation loss = 0.005263935308903456
Validation loss = 0.0051983557641506195
Validation loss = 0.005634278990328312
Validation loss = 0.005274678580462933
Validation loss = 0.005062096752226353
Validation loss = 0.005004557315260172
Validation loss = 0.005277643445879221
Validation loss = 0.005141779780387878
Validation loss = 0.005572029389441013
Validation loss = 0.005571566056460142
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006435292307287455
Validation loss = 0.005341853015124798
Validation loss = 0.006442563142627478
Validation loss = 0.005289270542562008
Validation loss = 0.00580966379493475
Validation loss = 0.005137595348060131
Validation loss = 0.0053228819742798805
Validation loss = 0.005658533424139023
Validation loss = 0.005802006460726261
Validation loss = 0.0050726826302707195
Validation loss = 0.005762885790318251
Validation loss = 0.005401412956416607
Validation loss = 0.00514369597658515
Validation loss = 0.005032590590417385
Validation loss = 0.006053369026631117
Validation loss = 0.005547244101762772
Validation loss = 0.004968486726284027
Validation loss = 0.004707920830696821
Validation loss = 0.005979718174785376
Validation loss = 0.004998350981622934
Validation loss = 0.005215292796492577
Validation loss = 0.005156489554792643
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 36
average number of affinization = 24.278688524590162
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 45
average number of affinization = 24.612903225806452
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 4
average number of affinization = 24.285714285714285
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 17
average number of affinization = 24.171875
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 13
average number of affinization = 24.0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 26
average number of affinization = 24.03030303030303
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 264      |
| Iteration     | 9        |
| MaximumReturn | 309      |
| MinimumReturn | 241      |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005863259546458721
Validation loss = 0.005390508566051722
Validation loss = 0.005774082615971565
Validation loss = 0.00643783388659358
Validation loss = 0.006125676911324263
Validation loss = 0.00597408227622509
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005983604583889246
Validation loss = 0.005232544615864754
Validation loss = 0.005656933877617121
Validation loss = 0.005751302931457758
Validation loss = 0.006234491243958473
Validation loss = 0.005682795774191618
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005179789382964373
Validation loss = 0.005299100652337074
Validation loss = 0.006167102139443159
Validation loss = 0.0052449386566877365
Validation loss = 0.0051719737239181995
Validation loss = 0.0055994754657149315
Validation loss = 0.00525370379909873
Validation loss = 0.005464137531816959
Validation loss = 0.005304611753672361
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006562200374901295
Validation loss = 0.005566914565861225
Validation loss = 0.005724714603275061
Validation loss = 0.005674529355019331
Validation loss = 0.006201459094882011
Validation loss = 0.0073361145332455635
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005674699321389198
Validation loss = 0.005974535830318928
Validation loss = 0.006113907787948847
Validation loss = 0.0053418297320604324
Validation loss = 0.005445975810289383
Validation loss = 0.006572485435754061
Validation loss = 0.005714094266295433
Validation loss = 0.005374491680413485
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 15
average number of affinization = 23.895522388059703
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 27
average number of affinization = 23.941176470588236
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 9
average number of affinization = 23.72463768115942
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 23
average number of affinization = 23.714285714285715
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 20
average number of affinization = 23.661971830985916
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 9
average number of affinization = 23.458333333333332
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 304      |
| Iteration     | 10       |
| MaximumReturn | 309      |
| MinimumReturn | 297      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00635123485699296
Validation loss = 0.005534596275538206
Validation loss = 0.005914435256272554
Validation loss = 0.0061119250021874905
Validation loss = 0.0056425766088068485
Validation loss = 0.005631841719150543
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00546331936493516
Validation loss = 0.005681626033037901
Validation loss = 0.005147196352481842
Validation loss = 0.00528300553560257
Validation loss = 0.0052513680420815945
Validation loss = 0.005240034312009811
Validation loss = 0.005032464396208525
Validation loss = 0.006733328104019165
Validation loss = 0.005539244506508112
Validation loss = 0.005564736668020487
Validation loss = 0.0054557533003389835
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005777014885097742
Validation loss = 0.005309296306222677
Validation loss = 0.0055640642531216145
Validation loss = 0.005716277752071619
Validation loss = 0.0054674167186021805
Validation loss = 0.006547942291945219
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00548052042722702
Validation loss = 0.005166636314243078
Validation loss = 0.005663278046995401
Validation loss = 0.00538288289681077
Validation loss = 0.005536106880754232
Validation loss = 0.005216964054852724
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005296144634485245
Validation loss = 0.005952503066509962
Validation loss = 0.005088141653686762
Validation loss = 0.005362687166780233
Validation loss = 0.005054198205471039
Validation loss = 0.0053055062890052795
Validation loss = 0.00551913445815444
Validation loss = 0.005487961228936911
Validation loss = 0.006465747952461243
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 12
average number of affinization = 23.301369863013697
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 9
average number of affinization = 23.10810810810811
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 5
average number of affinization = 22.866666666666667
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 0
average number of affinization = 22.56578947368421
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 13
average number of affinization = 22.441558441558442
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 8
average number of affinization = 22.256410256410255
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 308      |
| Iteration     | 11       |
| MaximumReturn | 325      |
| MinimumReturn | 256      |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006042443681508303
Validation loss = 0.005388407036662102
Validation loss = 0.005797279998660088
Validation loss = 0.005016093607991934
Validation loss = 0.00538788503035903
Validation loss = 0.005858815275132656
Validation loss = 0.005448850337415934
Validation loss = 0.0052436525002121925
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005805470049381256
Validation loss = 0.005074022337794304
Validation loss = 0.005360652692615986
Validation loss = 0.0050791478715837
Validation loss = 0.005208766087889671
Validation loss = 0.005194715689867735
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005559784360229969
Validation loss = 0.0051878634840250015
Validation loss = 0.005109784658998251
Validation loss = 0.005242809187620878
Validation loss = 0.005769623443484306
Validation loss = 0.00528964726254344
Validation loss = 0.0054294029250741005
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005391525104641914
Validation loss = 0.0060082366690039635
Validation loss = 0.005231526214629412
Validation loss = 0.005359253846108913
Validation loss = 0.005499395076185465
Validation loss = 0.005330395419150591
Validation loss = 0.005655156914144754
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00505917239934206
Validation loss = 0.006272301077842712
Validation loss = 0.005322747398167849
Validation loss = 0.005706223659217358
Validation loss = 0.005413101986050606
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 16
average number of affinization = 22.17721518987342
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 23
average number of affinization = 22.1875
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 23
average number of affinization = 22.19753086419753
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 16
average number of affinization = 22.121951219512194
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 6
average number of affinization = 21.927710843373493
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 19
average number of affinization = 21.892857142857142
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 322      |
| Iteration     | 12       |
| MaximumReturn | 325      |
| MinimumReturn | 317      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005375881213694811
Validation loss = 0.005739868618547916
Validation loss = 0.0055139572359621525
Validation loss = 0.005006112158298492
Validation loss = 0.005279318895190954
Validation loss = 0.0062294225208461285
Validation loss = 0.005363880656659603
Validation loss = 0.006040886975824833
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005724085029214621
Validation loss = 0.005796030629426241
Validation loss = 0.005408184137195349
Validation loss = 0.004983129445463419
Validation loss = 0.005041619297116995
Validation loss = 0.005653034895658493
Validation loss = 0.005006805062294006
Validation loss = 0.005698874592781067
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00535553228110075
Validation loss = 0.00510767474770546
Validation loss = 0.00529170548543334
Validation loss = 0.005260232836008072
Validation loss = 0.0051340810023248196
Validation loss = 0.005008080508559942
Validation loss = 0.005259387195110321
Validation loss = 0.005011130124330521
Validation loss = 0.0061345030553638935
Validation loss = 0.005191097501665354
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005890069995075464
Validation loss = 0.0054360064677894115
Validation loss = 0.005610184278339148
Validation loss = 0.006112223025411367
Validation loss = 0.0055830166675150394
Validation loss = 0.00576882716268301
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005329782608896494
Validation loss = 0.005385898519307375
Validation loss = 0.004876784048974514
Validation loss = 0.005255354102700949
Validation loss = 0.0049805715680122375
Validation loss = 0.005914263892918825
Validation loss = 0.005408341996371746
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 28
average number of affinization = 21.96470588235294
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 22
average number of affinization = 21.96511627906977
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 25
average number of affinization = 22.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 29
average number of affinization = 22.079545454545453
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 40
average number of affinization = 22.280898876404493
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 34
average number of affinization = 22.41111111111111
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 323      |
| Iteration     | 13       |
| MaximumReturn | 331      |
| MinimumReturn | 319      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005475124344229698
Validation loss = 0.005339235533028841
Validation loss = 0.005082014016807079
Validation loss = 0.0053913788869977
Validation loss = 0.0050355358980596066
Validation loss = 0.005376326851546764
Validation loss = 0.005371248349547386
Validation loss = 0.005534348543733358
Validation loss = 0.005209609400480986
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005436985287815332
Validation loss = 0.005217746831476688
Validation loss = 0.005142186302691698
Validation loss = 0.0052168541587889194
Validation loss = 0.0050568473525345325
Validation loss = 0.004765027668327093
Validation loss = 0.005378135479986668
Validation loss = 0.005330991465598345
Validation loss = 0.005017165560275316
Validation loss = 0.005235389340668917
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005178470630198717
Validation loss = 0.005650383420288563
Validation loss = 0.005329994484782219
Validation loss = 0.00509107019752264
Validation loss = 0.005054422654211521
Validation loss = 0.0050201755948364735
Validation loss = 0.005132129415869713
Validation loss = 0.004759639967232943
Validation loss = 0.0048507279716432095
Validation loss = 0.005559662822633982
Validation loss = 0.005116256419569254
Validation loss = 0.004961440339684486
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005622577853500843
Validation loss = 0.0051045590080320835
Validation loss = 0.005122366826981306
Validation loss = 0.005552918184548616
Validation loss = 0.005123031325638294
Validation loss = 0.005884462036192417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005947428289800882
Validation loss = 0.005378935020416975
Validation loss = 0.005399510730057955
Validation loss = 0.005108249373733997
Validation loss = 0.005549780558794737
Validation loss = 0.005482877139002085
Validation loss = 0.005700786132365465
Validation loss = 0.005147750023752451
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 25
average number of affinization = 22.439560439560438
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 23
average number of affinization = 22.445652173913043
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 9
average number of affinization = 22.301075268817204
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 11
average number of affinization = 22.180851063829788
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 5
average number of affinization = 22.0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 5
average number of affinization = 21.822916666666668
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 325      |
| Iteration     | 14       |
| MaximumReturn | 329      |
| MinimumReturn | 319      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004987976513803005
Validation loss = 0.005371157079935074
Validation loss = 0.005353692453354597
Validation loss = 0.005189106799662113
Validation loss = 0.005040720105171204
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004980040714144707
Validation loss = 0.0051428694278001785
Validation loss = 0.005225792061537504
Validation loss = 0.004904511850327253
Validation loss = 0.005298903211951256
Validation loss = 0.005684358067810535
Validation loss = 0.004750961437821388
Validation loss = 0.005743926391005516
Validation loss = 0.005103984847664833
Validation loss = 0.0049474239349365234
Validation loss = 0.00485352985560894
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005466863978654146
Validation loss = 0.0051533011719584465
Validation loss = 0.005205487832427025
Validation loss = 0.00472372816875577
Validation loss = 0.005033228080719709
Validation loss = 0.005029693245887756
Validation loss = 0.0057554398663342
Validation loss = 0.005037072114646435
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005661347880959511
Validation loss = 0.005400466732680798
Validation loss = 0.0056084576062858105
Validation loss = 0.006482716184109449
Validation loss = 0.005164628848433495
Validation loss = 0.004931415431201458
Validation loss = 0.0045501249842345715
Validation loss = 0.005250731483101845
Validation loss = 0.005266969092190266
Validation loss = 0.005109646823257208
Validation loss = 0.005322488024830818
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005326269194483757
Validation loss = 0.00566775631159544
Validation loss = 0.00500941788777709
Validation loss = 0.0053787194192409515
Validation loss = 0.006252908613532782
Validation loss = 0.005282523576170206
Validation loss = 0.005466590169817209
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 23
average number of affinization = 21.835051546391753
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 30
average number of affinization = 21.918367346938776
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 16
average number of affinization = 21.858585858585858
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 22
average number of affinization = 21.86
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 37
average number of affinization = 22.00990099009901
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 34
average number of affinization = 22.127450980392158
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 323      |
| Iteration     | 15       |
| MaximumReturn | 327      |
| MinimumReturn | 319      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00508979894220829
Validation loss = 0.005074378103017807
Validation loss = 0.004914034157991409
Validation loss = 0.005430157762020826
Validation loss = 0.005646879784762859
Validation loss = 0.0052496204152703285
Validation loss = 0.0053977216593921185
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005913350731134415
Validation loss = 0.004727312363684177
Validation loss = 0.004855202976614237
Validation loss = 0.00533786928281188
Validation loss = 0.005440735258162022
Validation loss = 0.005444695707410574
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005095211323350668
Validation loss = 0.005044192541390657
Validation loss = 0.004903355613350868
Validation loss = 0.005085606127977371
Validation loss = 0.004959621001034975
Validation loss = 0.0049217171035707
Validation loss = 0.005487528163939714
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005418960936367512
Validation loss = 0.0051966942846775055
Validation loss = 0.004858187399804592
Validation loss = 0.005030699074268341
Validation loss = 0.005418070591986179
Validation loss = 0.005370972212404013
Validation loss = 0.004913141950964928
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005767866037786007
Validation loss = 0.005549314897507429
Validation loss = 0.005221263505518436
Validation loss = 0.005790739320218563
Validation loss = 0.0049058529548347
Validation loss = 0.0053521571680903435
Validation loss = 0.005282241851091385
Validation loss = 0.004719612654298544
Validation loss = 0.004939632024616003
Validation loss = 0.005289021413773298
Validation loss = 0.005605739541351795
Validation loss = 0.005309175234287977
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 16
average number of affinization = 22.067961165048544
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 14
average number of affinization = 21.990384615384617
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 16
average number of affinization = 21.933333333333334
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 15
average number of affinization = 21.867924528301888
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 14
average number of affinization = 21.794392523364486
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 4
average number of affinization = 21.62962962962963
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 309      |
| Iteration     | 16       |
| MaximumReturn | 320      |
| MinimumReturn | 301      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00531346071511507
Validation loss = 0.005183607339859009
Validation loss = 0.005357157438993454
Validation loss = 0.005191940814256668
Validation loss = 0.004839213564991951
Validation loss = 0.0049684904515743256
Validation loss = 0.005210476461797953
Validation loss = 0.005468424409627914
Validation loss = 0.005067664198577404
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005362831521779299
Validation loss = 0.00491715595126152
Validation loss = 0.004580644890666008
Validation loss = 0.005242709536105394
Validation loss = 0.0047544012777507305
Validation loss = 0.0047813355922698975
Validation loss = 0.004739999771118164
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005230410024523735
Validation loss = 0.005265956744551659
Validation loss = 0.00523859029635787
Validation loss = 0.005434916354715824
Validation loss = 0.004682700149714947
Validation loss = 0.004618943203240633
Validation loss = 0.004939340054988861
Validation loss = 0.0048513999208807945
Validation loss = 0.005003892816603184
Validation loss = 0.005038642790168524
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004938398953527212
Validation loss = 0.005028007552027702
Validation loss = 0.005354018881917
Validation loss = 0.004740454256534576
Validation loss = 0.004544416908174753
Validation loss = 0.004985125735402107
Validation loss = 0.004833266604691744
Validation loss = 0.004876367747783661
Validation loss = 0.0049904645420610905
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005550438072532415
Validation loss = 0.005497211590409279
Validation loss = 0.005670654121786356
Validation loss = 0.005064619239419699
Validation loss = 0.00542112672701478
Validation loss = 0.004845247603952885
Validation loss = 0.004925168119370937
Validation loss = 0.005351922940462828
Validation loss = 0.005465150345116854
Validation loss = 0.004912338685244322
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 13
average number of affinization = 21.55045871559633
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 25
average number of affinization = 21.581818181818182
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 16
average number of affinization = 21.53153153153153
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 19
average number of affinization = 21.508928571428573
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 11
average number of affinization = 21.41592920353982
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 15
average number of affinization = 21.359649122807017
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 306      |
| Iteration     | 17       |
| MaximumReturn | 314      |
| MinimumReturn | 298      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004706043750047684
Validation loss = 0.006321162451058626
Validation loss = 0.0047266194596886635
Validation loss = 0.004983637481927872
Validation loss = 0.004693891387432814
Validation loss = 0.005429158918559551
Validation loss = 0.004359025973826647
Validation loss = 0.004940440878272057
Validation loss = 0.005365981720387936
Validation loss = 0.004731422755867243
Validation loss = 0.005073809530586004
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004762781783938408
Validation loss = 0.004806584678590298
Validation loss = 0.005141241475939751
Validation loss = 0.005035667680203915
Validation loss = 0.004701733123511076
Validation loss = 0.00477556511759758
Validation loss = 0.004556810017675161
Validation loss = 0.004685108549892902
Validation loss = 0.004477163311094046
Validation loss = 0.00501190684735775
Validation loss = 0.004529740661382675
Validation loss = 0.004552469588816166
Validation loss = 0.004479080438613892
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00491292541846633
Validation loss = 0.004726761020720005
Validation loss = 0.004942556377500296
Validation loss = 0.005216022487729788
Validation loss = 0.004641692154109478
Validation loss = 0.005035771522670984
Validation loss = 0.004697803407907486
Validation loss = 0.004415946546941996
Validation loss = 0.004577876999974251
Validation loss = 0.004758387804031372
Validation loss = 0.005192412994801998
Validation loss = 0.004354502074420452
Validation loss = 0.004575427155941725
Validation loss = 0.004589044023305178
Validation loss = 0.00458204559981823
Validation loss = 0.0050443969666957855
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00477280467748642
Validation loss = 0.004906711168587208
Validation loss = 0.004827233497053385
Validation loss = 0.004859288688749075
Validation loss = 0.004755637142807245
Validation loss = 0.0049573988653719425
Validation loss = 0.005106853321194649
Validation loss = 0.005043151322752237
Validation loss = 0.005013032350689173
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0049143824726343155
Validation loss = 0.004997890442609787
Validation loss = 0.005715835839509964
Validation loss = 0.0050326865166425705
Validation loss = 0.004773409105837345
Validation loss = 0.006048102863132954
Validation loss = 0.005088125821202993
Validation loss = 0.004643463995307684
Validation loss = 0.004746155813336372
Validation loss = 0.006088689435273409
Validation loss = 0.00509672099724412
Validation loss = 0.004955099429935217
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 22
average number of affinization = 21.36521739130435
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 10
average number of affinization = 21.267241379310345
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 8
average number of affinization = 21.153846153846153
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 16
average number of affinization = 21.110169491525422
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 12
average number of affinization = 21.03361344537815
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 6
average number of affinization = 20.908333333333335
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 298      |
| Iteration     | 18       |
| MaximumReturn | 301      |
| MinimumReturn | 295      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004840938840061426
Validation loss = 0.004613739438354969
Validation loss = 0.004742371384054422
Validation loss = 0.00463143503293395
Validation loss = 0.004717247094959021
Validation loss = 0.004616891033947468
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004718556068837643
Validation loss = 0.005158155225217342
Validation loss = 0.005095008760690689
Validation loss = 0.005406023468822241
Validation loss = 0.004833109211176634
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004853337071835995
Validation loss = 0.004672014620155096
Validation loss = 0.005068729631602764
Validation loss = 0.004504285752773285
Validation loss = 0.00446268729865551
Validation loss = 0.005146305542439222
Validation loss = 0.005004719831049442
Validation loss = 0.004934600554406643
Validation loss = 0.004665316082537174
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005074989050626755
Validation loss = 0.004684468265622854
Validation loss = 0.005316789261996746
Validation loss = 0.004299867898225784
Validation loss = 0.004532544407993555
Validation loss = 0.004710576497018337
Validation loss = 0.0046125659719109535
Validation loss = 0.004702620208263397
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004671357572078705
Validation loss = 0.004576845560222864
Validation loss = 0.005223920103162527
Validation loss = 0.00490594794973731
Validation loss = 0.005009355489164591
Validation loss = 0.004955905489623547
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 22
average number of affinization = 20.917355371900825
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 34
average number of affinization = 21.024590163934427
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 15
average number of affinization = 20.975609756097562
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 13
average number of affinization = 20.911290322580644
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 23
average number of affinization = 20.928
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 16
average number of affinization = 20.88888888888889
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 302      |
| Iteration     | 19       |
| MaximumReturn | 308      |
| MinimumReturn | 297      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004771643318235874
Validation loss = 0.004569190088659525
Validation loss = 0.004689012188464403
Validation loss = 0.004921404644846916
Validation loss = 0.005352149251848459
Validation loss = 0.0045535569079220295
Validation loss = 0.004522139206528664
Validation loss = 0.004718812648206949
Validation loss = 0.0050821672193706036
Validation loss = 0.004647981375455856
Validation loss = 0.004712817724794149
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004566690884530544
Validation loss = 0.004600779619067907
Validation loss = 0.004850506316870451
Validation loss = 0.005269024055451155
Validation loss = 0.0049323453567922115
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0050052679143846035
Validation loss = 0.004773590713739395
Validation loss = 0.0045080119743943214
Validation loss = 0.004644479602575302
Validation loss = 0.0046499306336045265
Validation loss = 0.004586535040289164
Validation loss = 0.004787238780409098
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004667749162763357
Validation loss = 0.004727550316601992
Validation loss = 0.004952579736709595
Validation loss = 0.005316412542015314
Validation loss = 0.004906047135591507
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006521555129438639
Validation loss = 0.005387215409427881
Validation loss = 0.0051095373928546906
Validation loss = 0.005533026996999979
Validation loss = 0.004692608956247568
Validation loss = 0.005026362836360931
Validation loss = 0.0052464366890490055
Validation loss = 0.006195220164954662
Validation loss = 0.004903895314782858
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 43
average number of affinization = 21.062992125984252
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 58
average number of affinization = 21.3515625
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 50
average number of affinization = 21.573643410852714
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 42
average number of affinization = 21.73076923076923
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 42
average number of affinization = 21.885496183206108
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 47
average number of affinization = 22.075757575757574
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 299      |
| Iteration     | 20       |
| MaximumReturn | 302      |
| MinimumReturn | 297      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005029041785746813
Validation loss = 0.0047542802058160305
Validation loss = 0.0049099549651145935
Validation loss = 0.005624285899102688
Validation loss = 0.0048834094777703285
Validation loss = 0.004692434333264828
Validation loss = 0.005041343159973621
Validation loss = 0.004473995883017778
Validation loss = 0.004938278812915087
Validation loss = 0.0050451550632715225
Validation loss = 0.004369526635855436
Validation loss = 0.005072626750916243
Validation loss = 0.005225012544542551
Validation loss = 0.00493537075817585
Validation loss = 0.0045733037404716015
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005272531416267157
Validation loss = 0.004844685085117817
Validation loss = 0.004781198222190142
Validation loss = 0.004879829008132219
Validation loss = 0.004645323380827904
Validation loss = 0.0045059677213430405
Validation loss = 0.004667075350880623
Validation loss = 0.004934003110975027
Validation loss = 0.005133400205522776
Validation loss = 0.004776460584253073
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005795704200863838
Validation loss = 0.0046152579598128796
Validation loss = 0.004585247486829758
Validation loss = 0.004757252521812916
Validation loss = 0.004821550101041794
Validation loss = 0.004635652992874384
Validation loss = 0.0049035088159143925
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004823611583560705
Validation loss = 0.004808533471077681
Validation loss = 0.0049573746509850025
Validation loss = 0.004844134673476219
Validation loss = 0.004487897269427776
Validation loss = 0.005030987784266472
Validation loss = 0.0048333765007555485
Validation loss = 0.004721628502011299
Validation loss = 0.004491236060857773
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005001555662602186
Validation loss = 0.005312761291861534
Validation loss = 0.005000101402401924
Validation loss = 0.005093499086797237
Validation loss = 0.004835398867726326
Validation loss = 0.004867818672209978
Validation loss = 0.0051893871277570724
Validation loss = 0.005382435861974955
Validation loss = 0.005277274642139673
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 81
average number of affinization = 22.518796992481203
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 94
average number of affinization = 23.05223880597015
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 74
average number of affinization = 23.42962962962963
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 68
average number of affinization = 23.75735294117647
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 74
average number of affinization = 24.124087591240876
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 62
average number of affinization = 24.39855072463768
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 307      |
| Iteration     | 21       |
| MaximumReturn | 312      |
| MinimumReturn | 303      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004472676664590836
Validation loss = 0.0048163775354623795
Validation loss = 0.004642450716346502
Validation loss = 0.004916698206216097
Validation loss = 0.0046487306244671345
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004964882042258978
Validation loss = 0.004810039885342121
Validation loss = 0.004619572311639786
Validation loss = 0.004811746533960104
Validation loss = 0.0053852032870054245
Validation loss = 0.004982023499906063
Validation loss = 0.00491133239120245
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00556434178724885
Validation loss = 0.005229635629802942
Validation loss = 0.004820030182600021
Validation loss = 0.004473482258617878
Validation loss = 0.004973653703927994
Validation loss = 0.005217319820076227
Validation loss = 0.004868219140917063
Validation loss = 0.004698507022112608
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005507747642695904
Validation loss = 0.004917480051517487
Validation loss = 0.004734162241220474
Validation loss = 0.004874163307249546
Validation loss = 0.005109335295855999
Validation loss = 0.005096883978694677
Validation loss = 0.004921239335089922
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004756060428917408
Validation loss = 0.0055801887065172195
Validation loss = 0.004950685426592827
Validation loss = 0.005857768934220076
Validation loss = 0.0048985774628818035
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 69
average number of affinization = 24.719424460431654
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 58
average number of affinization = 24.957142857142856
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 87
average number of affinization = 25.397163120567377
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 55
average number of affinization = 25.6056338028169
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 58
average number of affinization = 25.832167832167833
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 68
average number of affinization = 26.125
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 318      |
| Iteration     | 22       |
| MaximumReturn | 322      |
| MinimumReturn | 316      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0046395193785429
Validation loss = 0.005129348021000624
Validation loss = 0.004831101279705763
Validation loss = 0.0047093541361391544
Validation loss = 0.004708158317953348
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004652328789234161
Validation loss = 0.004944849759340286
Validation loss = 0.005053072702139616
Validation loss = 0.004827577620744705
Validation loss = 0.005200306884944439
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004594071768224239
Validation loss = 0.004861020017415285
Validation loss = 0.004594224970787764
Validation loss = 0.004979046061635017
Validation loss = 0.0048809596337378025
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005477487575262785
Validation loss = 0.0049576107412576675
Validation loss = 0.004928893875330687
Validation loss = 0.004774364177137613
Validation loss = 0.004489421844482422
Validation loss = 0.00462043983861804
Validation loss = 0.005054192151874304
Validation loss = 0.004869227297604084
Validation loss = 0.005792940501123667
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005329210311174393
Validation loss = 0.005198662634938955
Validation loss = 0.0052956934086978436
Validation loss = 0.005112667102366686
Validation loss = 0.005290547851473093
Validation loss = 0.004870385397225618
Validation loss = 0.005097159184515476
Validation loss = 0.00470480602234602
Validation loss = 0.004793464671820402
Validation loss = 0.0048057143576443195
Validation loss = 0.005006975959986448
Validation loss = 0.005258596036583185
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 126
average number of affinization = 26.813793103448276
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 109
average number of affinization = 27.376712328767123
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 98
average number of affinization = 27.857142857142858
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 97
average number of affinization = 28.324324324324323
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 113
average number of affinization = 28.892617449664428
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 28
average number of affinization = 28.886666666666667
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 310      |
| Iteration     | 23       |
| MaximumReturn | 318      |
| MinimumReturn | 304      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004887763410806656
Validation loss = 0.0047670453786849976
Validation loss = 0.004952253308147192
Validation loss = 0.004955813754349947
Validation loss = 0.00497790239751339
Validation loss = 0.004478035494685173
Validation loss = 0.004784699529409409
Validation loss = 0.0048311734572052956
Validation loss = 0.004567192867398262
Validation loss = 0.004524190444499254
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0053155142813920975
Validation loss = 0.004764944780617952
Validation loss = 0.004863982554525137
Validation loss = 0.005238169338554144
Validation loss = 0.004866606555879116
Validation loss = 0.004845759365707636
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005011634435504675
Validation loss = 0.004625305067747831
Validation loss = 0.00453998101875186
Validation loss = 0.004666876047849655
Validation loss = 0.004894413519650698
Validation loss = 0.004659406375139952
Validation loss = 0.005164114758372307
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004789637867361307
Validation loss = 0.004581104032695293
Validation loss = 0.004986098036170006
Validation loss = 0.005134012084454298
Validation loss = 0.004881695844233036
Validation loss = 0.004973818548023701
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005261799320578575
Validation loss = 0.004996999632567167
Validation loss = 0.004926073364913464
Validation loss = 0.005018764175474644
Validation loss = 0.005124046001583338
Validation loss = 0.00544784078374505
Validation loss = 0.00540765468031168
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 54
average number of affinization = 29.05298013245033
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 81
average number of affinization = 29.394736842105264
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 82
average number of affinization = 29.73856209150327
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 50
average number of affinization = 29.87012987012987
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 67
average number of affinization = 30.10967741935484
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 59
average number of affinization = 30.294871794871796
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 318      |
| Iteration     | 24       |
| MaximumReturn | 323      |
| MinimumReturn | 313      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004853057209402323
Validation loss = 0.004663864150643349
Validation loss = 0.004588223062455654
Validation loss = 0.004902959335595369
Validation loss = 0.005027868784964085
Validation loss = 0.004755791742354631
Validation loss = 0.004773709457367659
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004774325992912054
Validation loss = 0.0048906560987234116
Validation loss = 0.005079840309917927
Validation loss = 0.0046495096758008
Validation loss = 0.00477671530097723
Validation loss = 0.004639126360416412
Validation loss = 0.004597365390509367
Validation loss = 0.004681876394897699
Validation loss = 0.004941762425005436
Validation loss = 0.00474298233166337
Validation loss = 0.00488445907831192
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004600380547344685
Validation loss = 0.004631745163351297
Validation loss = 0.005135522224009037
Validation loss = 0.004690965637564659
Validation loss = 0.005044430494308472
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004644364584237337
Validation loss = 0.004630067851394415
Validation loss = 0.004713775590062141
Validation loss = 0.004959913436323404
Validation loss = 0.004993907641619444
Validation loss = 0.004857427440583706
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0049393074586987495
Validation loss = 0.005236111581325531
Validation loss = 0.00508603872731328
Validation loss = 0.0047703892923891544
Validation loss = 0.004598063416779041
Validation loss = 0.004738528747111559
Validation loss = 0.004888955969363451
Validation loss = 0.004976163152605295
Validation loss = 0.004736312199383974
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 94
average number of affinization = 30.70063694267516
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 75
average number of affinization = 30.981012658227847
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 76
average number of affinization = 31.264150943396228
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 82
average number of affinization = 31.58125
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 78
average number of affinization = 31.869565217391305
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 86
average number of affinization = 32.2037037037037
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 326      |
| Iteration     | 25       |
| MaximumReturn | 328      |
| MinimumReturn | 325      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0047441511414945126
Validation loss = 0.004813991952687502
Validation loss = 0.005169594660401344
Validation loss = 0.005034760106354952
Validation loss = 0.004798099398612976
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0051349769346416
Validation loss = 0.00484194653108716
Validation loss = 0.004778054542839527
Validation loss = 0.004876044113188982
Validation loss = 0.004999522119760513
Validation loss = 0.004969486966729164
Validation loss = 0.004639747086912394
Validation loss = 0.0046568503603339195
Validation loss = 0.005187145434319973
Validation loss = 0.004843045491725206
Validation loss = 0.004552902653813362
Validation loss = 0.0044337487779557705
Validation loss = 0.005107777658849955
Validation loss = 0.004973705857992172
Validation loss = 0.004986973945051432
Validation loss = 0.005235809832811356
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00474267965182662
Validation loss = 0.00462325057014823
Validation loss = 0.005015740171074867
Validation loss = 0.004836555104702711
Validation loss = 0.004761768504977226
Validation loss = 0.0046701920218765736
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004852405283600092
Validation loss = 0.00503805186599493
Validation loss = 0.004752159584313631
Validation loss = 0.005087852943688631
Validation loss = 0.0048250239342451096
Validation loss = 0.0052329483442008495
Validation loss = 0.0052299839444458485
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00495338998734951
Validation loss = 0.005037452559918165
Validation loss = 0.00581152131780982
Validation loss = 0.004846577066928148
Validation loss = 0.004849427845329046
Validation loss = 0.005211586132645607
Validation loss = 0.005429660901427269
Validation loss = 0.004711807239800692
Validation loss = 0.005076298490166664
Validation loss = 0.005342553835362196
Validation loss = 0.005043175537139177
Validation loss = 0.004640454892069101
Validation loss = 0.004954785108566284
Validation loss = 0.004718556068837643
Validation loss = 0.0050753140822052956
Validation loss = 0.005032201763242483
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 92
average number of affinization = 32.57055214723926
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 84
average number of affinization = 32.88414634146341
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 52
average number of affinization = 33.0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 79
average number of affinization = 33.27710843373494
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 62
average number of affinization = 33.449101796407184
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 64
average number of affinization = 33.63095238095238
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 319      |
| Iteration     | 26       |
| MaximumReturn | 323      |
| MinimumReturn | 316      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004790654871612787
Validation loss = 0.004814418964087963
Validation loss = 0.005534876137971878
Validation loss = 0.0046316455118358135
Validation loss = 0.004963308572769165
Validation loss = 0.004551643971353769
Validation loss = 0.005605295766144991
Validation loss = 0.004503901582211256
Validation loss = 0.0045302691869437695
Validation loss = 0.004707439802587032
Validation loss = 0.0051497770473361015
Validation loss = 0.00490820175036788
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0046304999850690365
Validation loss = 0.004834162537008524
Validation loss = 0.004868991207331419
Validation loss = 0.004849615041166544
Validation loss = 0.004726198967546225
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0050640166737139225
Validation loss = 0.0047998069785535336
Validation loss = 0.00501951202750206
Validation loss = 0.004842270631343126
Validation loss = 0.0052559711039066315
Validation loss = 0.004759242292493582
Validation loss = 0.0046055191196501255
Validation loss = 0.00469461502507329
Validation loss = 0.004636339843273163
Validation loss = 0.00466667115688324
Validation loss = 0.005095707718282938
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004858252126723528
Validation loss = 0.005088741425424814
Validation loss = 0.004955551587045193
Validation loss = 0.0047828746028244495
Validation loss = 0.004953794181346893
Validation loss = 0.005649134051054716
Validation loss = 0.004869536496698856
Validation loss = 0.004862266592681408
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005080065689980984
Validation loss = 0.00493510439991951
Validation loss = 0.005229915026575327
Validation loss = 0.00503132538869977
Validation loss = 0.0049261292442679405
Validation loss = 0.004864110611379147
Validation loss = 0.005144193768501282
Validation loss = 0.005082219373434782
Validation loss = 0.004630043637007475
Validation loss = 0.004813208244740963
Validation loss = 0.005078201647847891
Validation loss = 0.004784891847521067
Validation loss = 0.004917151760309935
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 87
average number of affinization = 33.946745562130175
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 91
average number of affinization = 34.28235294117647
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 83
average number of affinization = 34.567251461988306
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 51
average number of affinization = 34.66279069767442
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 80
average number of affinization = 34.92485549132948
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 99
average number of affinization = 35.293103448275865
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 320      |
| Iteration     | 27       |
| MaximumReturn | 321      |
| MinimumReturn | 317      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005209628958255053
Validation loss = 0.004686570260673761
Validation loss = 0.004897630773484707
Validation loss = 0.004803535062819719
Validation loss = 0.005013638641685247
Validation loss = 0.004733613226562738
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005224131513386965
Validation loss = 0.004934488795697689
Validation loss = 0.004783987067639828
Validation loss = 0.0048629106022417545
Validation loss = 0.005062143784016371
Validation loss = 0.00478324294090271
Validation loss = 0.004916872829198837
Validation loss = 0.004719340242445469
Validation loss = 0.004727969877421856
Validation loss = 0.0048504420556128025
Validation loss = 0.004961061757057905
Validation loss = 0.004737900570034981
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0051524825394153595
Validation loss = 0.004901477135717869
Validation loss = 0.004855127073824406
Validation loss = 0.004698494914919138
Validation loss = 0.0046220324002206326
Validation loss = 0.005004932172596455
Validation loss = 0.004650960210710764
Validation loss = 0.004761165473610163
Validation loss = 0.005018680822104216
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004642766900360584
Validation loss = 0.004882617853581905
Validation loss = 0.0045308866538107395
Validation loss = 0.004703814629465342
Validation loss = 0.005120991729199886
Validation loss = 0.0047611515037715435
Validation loss = 0.0047682118602097034
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004791591316461563
Validation loss = 0.00480756675824523
Validation loss = 0.004822160117328167
Validation loss = 0.005221025552600622
Validation loss = 0.004963995888829231
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 69
average number of affinization = 35.48571428571429
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 37
average number of affinization = 35.49431818181818
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 87
average number of affinization = 35.78531073446328
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 104
average number of affinization = 36.168539325842694
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 71
average number of affinization = 36.36312849162011
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 51
average number of affinization = 36.44444444444444
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 315      |
| Iteration     | 28       |
| MaximumReturn | 321      |
| MinimumReturn | 311      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00500929169356823
Validation loss = 0.0048254504799842834
Validation loss = 0.004728352651000023
Validation loss = 0.004615489859133959
Validation loss = 0.004963315092027187
Validation loss = 0.004612650256603956
Validation loss = 0.005026315338909626
Validation loss = 0.004860760644078255
Validation loss = 0.004801331087946892
Validation loss = 0.004570576827973127
Validation loss = 0.005192013923078775
Validation loss = 0.004920108709484339
Validation loss = 0.004784748423844576
Validation loss = 0.004772507585585117
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005173787474632263
Validation loss = 0.004926967900246382
Validation loss = 0.004871333483606577
Validation loss = 0.004636049270629883
Validation loss = 0.004824050236493349
Validation loss = 0.004632740747183561
Validation loss = 0.004782556556165218
Validation loss = 0.004789045546203852
Validation loss = 0.004979362711310387
Validation loss = 0.005074984859675169
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0046576946042478085
Validation loss = 0.004835593979805708
Validation loss = 0.005003465339541435
Validation loss = 0.004627717658877373
Validation loss = 0.0046139005571603775
Validation loss = 0.004875584505498409
Validation loss = 0.005105358548462391
Validation loss = 0.004949916619807482
Validation loss = 0.004692650865763426
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005388258025050163
Validation loss = 0.004686931613832712
Validation loss = 0.004931366071105003
Validation loss = 0.004702301230281591
Validation loss = 0.005011485889554024
Validation loss = 0.004973339848220348
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00499674491584301
Validation loss = 0.0049171908758580685
Validation loss = 0.004987838678061962
Validation loss = 0.004849210847169161
Validation loss = 0.0051092286594212055
Validation loss = 0.004990376532077789
Validation loss = 0.004785798955708742
Validation loss = 0.0047536566853523254
Validation loss = 0.0050292727537453175
Validation loss = 0.004895600024610758
Validation loss = 0.004888777155429125
Validation loss = 0.004733682610094547
Validation loss = 0.004922240506857634
Validation loss = 0.005088756792247295
Validation loss = 0.004926069173961878
Validation loss = 0.0047527397982776165
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 66
average number of affinization = 36.607734806629836
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 57
average number of affinization = 36.71978021978022
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 86
average number of affinization = 36.98907103825137
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 31
average number of affinization = 36.95652173913044
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 60
average number of affinization = 37.08108108108108
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 56
average number of affinization = 37.18279569892473
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 317      |
| Iteration     | 29       |
| MaximumReturn | 319      |
| MinimumReturn | 314      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005030064843595028
Validation loss = 0.00499355373904109
Validation loss = 0.00500053446739912
Validation loss = 0.004686995875090361
Validation loss = 0.004574339836835861
Validation loss = 0.005014033988118172
Validation loss = 0.005413231905549765
Validation loss = 0.004975358489900827
Validation loss = 0.004470366518944502
Validation loss = 0.004712247755378485
Validation loss = 0.0046551586128771305
Validation loss = 0.00501625519245863
Validation loss = 0.004723985679447651
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0047506773844361305
Validation loss = 0.005054365377873182
Validation loss = 0.004741340409964323
Validation loss = 0.004860146436840296
Validation loss = 0.004824439529329538
Validation loss = 0.004982651676982641
Validation loss = 0.004827182739973068
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004867141135036945
Validation loss = 0.004749131388962269
Validation loss = 0.00443484028801322
Validation loss = 0.004637381061911583
Validation loss = 0.004595663398504257
Validation loss = 0.00501250009983778
Validation loss = 0.004714916925877333
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004946379456669092
Validation loss = 0.005051541142165661
Validation loss = 0.004754771012812853
Validation loss = 0.005175486672669649
Validation loss = 0.0048530614003539085
Validation loss = 0.004987851716578007
Validation loss = 0.004943112842738628
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005035501439124346
Validation loss = 0.005101736169308424
Validation loss = 0.00479382136836648
Validation loss = 0.004962333478033543
Validation loss = 0.004537955857813358
Validation loss = 0.004756335634738207
Validation loss = 0.00482894666492939
Validation loss = 0.005087141413241625
Validation loss = 0.004751811735332012
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 92
average number of affinization = 37.475935828877006
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 74
average number of affinization = 37.670212765957444
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 79
average number of affinization = 37.888888888888886
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 63
average number of affinization = 38.02105263157895
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 90
average number of affinization = 38.29319371727749
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 121
average number of affinization = 38.723958333333336
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 317      |
| Iteration     | 30       |
| MaximumReturn | 321      |
| MinimumReturn | 316      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005073025822639465
Validation loss = 0.00502974446862936
Validation loss = 0.0047303191386163235
Validation loss = 0.005065376870334148
Validation loss = 0.004915996920317411
Validation loss = 0.005033594556152821
Validation loss = 0.005134988110512495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005230296403169632
Validation loss = 0.004896593280136585
Validation loss = 0.005553962662816048
Validation loss = 0.005473807454109192
Validation loss = 0.00514749251306057
Validation loss = 0.004885729402303696
Validation loss = 0.005159190855920315
Validation loss = 0.0047823963686823845
Validation loss = 0.004880144726485014
Validation loss = 0.004883232526481152
Validation loss = 0.0051115588285028934
Validation loss = 0.005237756762653589
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0047865984961390495
Validation loss = 0.004915435798466206
Validation loss = 0.00482823746278882
Validation loss = 0.004999998491257429
Validation loss = 0.004948037210851908
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005258413031697273
Validation loss = 0.005188430659472942
Validation loss = 0.005120237357914448
Validation loss = 0.005211023613810539
Validation loss = 0.005581601522862911
Validation loss = 0.005005329847335815
Validation loss = 0.004946792032569647
Validation loss = 0.0049888272769749165
Validation loss = 0.005209052935242653
Validation loss = 0.005105488933622837
Validation loss = 0.004856586456298828
Validation loss = 0.005127254873514175
Validation loss = 0.004911369644105434
Validation loss = 0.0048913499340415
Validation loss = 0.005242472048848867
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004975547082722187
Validation loss = 0.005265277810394764
Validation loss = 0.005247491877526045
Validation loss = 0.004920918494462967
Validation loss = 0.0052488697692751884
Validation loss = 0.005144739523530006
Validation loss = 0.004809923470020294
Validation loss = 0.005464181303977966
Validation loss = 0.0054810321889817715
Validation loss = 0.004839371424168348
Validation loss = 0.00519477529451251
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 106
average number of affinization = 39.07253886010363
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 110
average number of affinization = 39.43814432989691
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 107
average number of affinization = 39.784615384615385
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 102
average number of affinization = 40.10204081632653
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 104
average number of affinization = 40.4263959390863
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 113
average number of affinization = 40.792929292929294
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 329      |
| Iteration     | 31       |
| MaximumReturn | 329      |
| MinimumReturn | 327      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005051589570939541
Validation loss = 0.0047593205235898495
Validation loss = 0.00478409556671977
Validation loss = 0.004777160473167896
Validation loss = 0.00498678395524621
Validation loss = 0.004975659307092428
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005169687792658806
Validation loss = 0.005262103863060474
Validation loss = 0.00498032895848155
Validation loss = 0.005284136161208153
Validation loss = 0.004895400255918503
Validation loss = 0.004890582524240017
Validation loss = 0.005076380912214518
Validation loss = 0.005004942882806063
Validation loss = 0.004952864721417427
Validation loss = 0.005239604040980339
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005010746419429779
Validation loss = 0.005173876881599426
Validation loss = 0.005232926458120346
Validation loss = 0.005101058632135391
Validation loss = 0.005116243846714497
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004961224738508463
Validation loss = 0.005687287077307701
Validation loss = 0.004928936250507832
Validation loss = 0.0049948422238230705
Validation loss = 0.005816510878503323
Validation loss = 0.0052379947155714035
Validation loss = 0.005264302249997854
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005117242690175772
Validation loss = 0.005637131631374359
Validation loss = 0.005067181773483753
Validation loss = 0.00516702001914382
Validation loss = 0.005140461027622223
Validation loss = 0.005146408919245005
Validation loss = 0.0051299515180289745
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 1 is 74
average number of affinization = 40.959798994974875
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 1 is 84
average number of affinization = 41.175
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 1 is 97
average number of affinization = 41.45273631840796
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 1 is 106
average number of affinization = 41.772277227722775
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 1 is 103
average number of affinization = 42.073891625615765
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 1 is 70
average number of affinization = 42.21078431372549
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 320      |
| Iteration     | 32       |
| MaximumReturn | 325      |
| MinimumReturn | 314      |
| TotalSamples  | 136000   |
----------------------------
