Logging to experiments/half_cheetah/oct29/w350e3_seed2341
Print configuration .....
{'env_name': 'half_cheetah', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 0
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 0
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 0
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 0
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 0
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 0
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8052101135253906
Validation loss = 0.6462458372116089
Validation loss = 0.6504155397415161
Validation loss = 0.6780204176902771
Validation loss = 0.47397923469543457
Validation loss = 0.44543707370758057
Validation loss = 0.4933144450187683
Validation loss = 0.5440550446510315
Validation loss = 0.6684860587120056
Validation loss = 0.6519032120704651
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 1.628699779510498
Validation loss = 0.5004162192344666
Validation loss = 0.6419742107391357
Validation loss = 0.6558712124824524
Validation loss = 0.6890639066696167
Validation loss = 0.5774861574172974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 1.279271125793457
Validation loss = 0.5349101424217224
Validation loss = 0.65415358543396
Validation loss = 0.662717342376709
Validation loss = 0.5099684000015259
Validation loss = 0.45625293254852295
Validation loss = 0.5153698325157166
Validation loss = 0.4545561969280243
Validation loss = 0.5197704434394836
Validation loss = 0.46287471055984497
Validation loss = 0.504002571105957
Validation loss = 0.47774335741996765
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 1.1215511560440063
Validation loss = 0.4935111403465271
Validation loss = 0.6871711015701294
Validation loss = 0.573444128036499
Validation loss = 0.5373643636703491
Validation loss = 0.4574386775493622
Validation loss = 0.5793215036392212
Validation loss = 0.5947904586791992
Validation loss = 0.5419231057167053
Validation loss = 0.6195697784423828
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.7440489530563354
Validation loss = 0.5030611753463745
Validation loss = 0.727885365486145
Validation loss = 0.6766098141670227
Validation loss = 0.5691550374031067
Validation loss = 0.46304863691329956
Validation loss = 0.5437696576118469
Validation loss = 0.5637370944023132
Validation loss = 0.5914562940597534
Validation loss = 0.6495893001556396
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 95
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 91
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 129
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 92
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 92
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 195
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -400     |
| Iteration     | 0        |
| MaximumReturn | -342     |
| MinimumReturn | -446     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16089218854904175
Validation loss = 0.13029520213603973
Validation loss = 0.12424559891223907
Validation loss = 0.12513718008995056
Validation loss = 0.13223451375961304
Validation loss = 0.12066847085952759
Validation loss = 0.12848228216171265
Validation loss = 0.1320427507162094
Validation loss = 0.12345580011606216
Validation loss = 0.1292635202407837
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17128430306911469
Validation loss = 0.12947618961334229
Validation loss = 0.13027414679527283
Validation loss = 0.12998923659324646
Validation loss = 0.1264599710702896
Validation loss = 0.1638999879360199
Validation loss = 0.13541024923324585
Validation loss = 0.14506074786186218
Validation loss = 0.13698141276836395
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16276666522026062
Validation loss = 0.12711921334266663
Validation loss = 0.1340147703886032
Validation loss = 0.12103818356990814
Validation loss = 0.15159544348716736
Validation loss = 0.12583577632904053
Validation loss = 0.11720682680606842
Validation loss = 0.11892688274383545
Validation loss = 0.13181674480438232
Validation loss = 0.12901915609836578
Validation loss = 0.12322697788476944
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1639576554298401
Validation loss = 0.1303090751171112
Validation loss = 0.1276232898235321
Validation loss = 0.12267585843801498
Validation loss = 0.1278151273727417
Validation loss = 0.1309092938899994
Validation loss = 0.12906518578529358
Validation loss = 0.13522829115390778
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.16582342982292175
Validation loss = 0.13231906294822693
Validation loss = 0.14482153952121735
Validation loss = 0.13672947883605957
Validation loss = 0.13257059454917908
Validation loss = 0.1380200982093811
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 404
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 530
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 399
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 425
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 406
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 508
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -311     |
| Iteration     | 1        |
| MaximumReturn | -224     |
| MinimumReturn | -421     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09071429818868637
Validation loss = 0.08404126763343811
Validation loss = 0.09051832556724548
Validation loss = 0.08211231976747513
Validation loss = 0.08145763725042343
Validation loss = 0.07596466690301895
Validation loss = 0.08043364435434341
Validation loss = 0.08876544237136841
Validation loss = 0.08257870376110077
Validation loss = 0.0852983370423317
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09628365188837051
Validation loss = 0.09157606959342957
Validation loss = 0.09310990571975708
Validation loss = 0.0940394401550293
Validation loss = 0.08739224076271057
Validation loss = 0.08935072273015976
Validation loss = 0.08469017595052719
Validation loss = 0.0832657739520073
Validation loss = 0.08074819296598434
Validation loss = 0.0868898332118988
Validation loss = 0.08444756269454956
Validation loss = 0.10586488991975784
Validation loss = 0.0892966017127037
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08840811252593994
Validation loss = 0.08747079968452454
Validation loss = 0.08438853174448013
Validation loss = 0.0874413251876831
Validation loss = 0.08093741536140442
Validation loss = 0.08275619894266129
Validation loss = 0.07995930314064026
Validation loss = 0.08724502474069595
Validation loss = 0.08092988282442093
Validation loss = 0.07942856103181839
Validation loss = 0.0810246542096138
Validation loss = 0.08722182363271713
Validation loss = 0.08357924222946167
Validation loss = 0.08677417784929276
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0884772315621376
Validation loss = 0.08511161804199219
Validation loss = 0.08236254006624222
Validation loss = 0.08423122763633728
Validation loss = 0.08171374350786209
Validation loss = 0.09124928712844849
Validation loss = 0.08594480901956558
Validation loss = 0.08209270238876343
Validation loss = 0.08345886319875717
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0905214324593544
Validation loss = 0.09111440926790237
Validation loss = 0.08361759036779404
Validation loss = 0.08870279788970947
Validation loss = 0.0804939791560173
Validation loss = 0.08765629678964615
Validation loss = 0.08221571892499924
Validation loss = 0.0827879086136818
Validation loss = 0.08033143728971481
Validation loss = 0.07970794290304184
Validation loss = 0.08852509409189224
Validation loss = 0.0874762311577797
Validation loss = 0.07732997089624405
Validation loss = 0.08783850818872452
Validation loss = 0.08485543727874756
Validation loss = 0.07991170138120651
Validation loss = 0.08043702691793442
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 556
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 467
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 486
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 603
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 579
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 578
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -222     |
| Iteration     | 2        |
| MaximumReturn | -112     |
| MinimumReturn | -384     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08208562433719635
Validation loss = 0.07576273381710052
Validation loss = 0.07170986384153366
Validation loss = 0.07258093357086182
Validation loss = 0.0788850486278534
Validation loss = 0.07091514766216278
Validation loss = 0.07306617498397827
Validation loss = 0.07151003181934357
Validation loss = 0.07132934033870697
Validation loss = 0.0812719315290451
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08144833147525787
Validation loss = 0.07788293808698654
Validation loss = 0.07593315839767456
Validation loss = 0.07421344518661499
Validation loss = 0.07383249700069427
Validation loss = 0.07725508511066437
Validation loss = 0.07473082840442657
Validation loss = 0.07817330956459045
Validation loss = 0.07893066853284836
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08117228746414185
Validation loss = 0.07706352323293686
Validation loss = 0.0764605849981308
Validation loss = 0.07924436032772064
Validation loss = 0.07540195435285568
Validation loss = 0.07693666219711304
Validation loss = 0.07045359909534454
Validation loss = 0.07743451744318008
Validation loss = 0.08953239023685455
Validation loss = 0.08131766319274902
Validation loss = 0.07407514750957489
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08153367042541504
Validation loss = 0.07735569775104523
Validation loss = 0.07580989599227905
Validation loss = 0.07479748874902725
Validation loss = 0.07172007113695145
Validation loss = 0.07790550589561462
Validation loss = 0.07455641031265259
Validation loss = 0.07647507637739182
Validation loss = 0.07712621241807938
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08155209571123123
Validation loss = 0.07397635281085968
Validation loss = 0.07998929917812347
Validation loss = 0.07308168709278107
Validation loss = 0.0731172040104866
Validation loss = 0.07460486143827438
Validation loss = 0.075349822640419
Validation loss = 0.0727013424038887
Validation loss = 0.0733572244644165
Validation loss = 0.071024090051651
Validation loss = 0.07195097208023071
Validation loss = 0.07522449642419815
Validation loss = 0.07190326601266861
Validation loss = 0.07596668601036072
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 526
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 527
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 568
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 539
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 523
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 548
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -351     |
| Iteration     | 3        |
| MaximumReturn | -188     |
| MinimumReturn | -442     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07199623435735703
Validation loss = 0.06645916402339935
Validation loss = 0.0689428299665451
Validation loss = 0.07076642662286758
Validation loss = 0.0670720785856247
Validation loss = 0.06476138532161713
Validation loss = 0.06555880606174469
Validation loss = 0.0658102035522461
Validation loss = 0.06568188965320587
Validation loss = 0.06798034906387329
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0770321637392044
Validation loss = 0.07253582775592804
Validation loss = 0.0726407915353775
Validation loss = 0.06824516505002975
Validation loss = 0.06972804665565491
Validation loss = 0.07129975408315659
Validation loss = 0.06870219111442566
Validation loss = 0.06835205852985382
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0761471539735794
Validation loss = 0.07026837766170502
Validation loss = 0.07057353109121323
Validation loss = 0.07107780128717422
Validation loss = 0.06593342125415802
Validation loss = 0.06786062568426132
Validation loss = 0.06879891455173492
Validation loss = 0.06739301979541779
Validation loss = 0.06956265866756439
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07952731847763062
Validation loss = 0.07399506866931915
Validation loss = 0.06829395145177841
Validation loss = 0.0684482753276825
Validation loss = 0.07386176288127899
Validation loss = 0.06659239530563354
Validation loss = 0.0673844963312149
Validation loss = 0.07125313580036163
Validation loss = 0.06908774375915527
Validation loss = 0.07285526394844055
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07663051784038544
Validation loss = 0.06731394678354263
Validation loss = 0.06858140975236893
Validation loss = 0.06948237121105194
Validation loss = 0.06927759945392609
Validation loss = 0.06628429889678955
Validation loss = 0.06688319146633148
Validation loss = 0.06661292165517807
Validation loss = 0.06465746462345123
Validation loss = 0.06764520704746246
Validation loss = 0.06893487274646759
Validation loss = 0.06537415832281113
Validation loss = 0.06733827292919159
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 549
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 593
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 615
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 585
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 501
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 512
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -70.9    |
| Iteration     | 4        |
| MaximumReturn | 422      |
| MinimumReturn | -438     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06632030755281448
Validation loss = 0.061775028705596924
Validation loss = 0.06549408286809921
Validation loss = 0.060829952359199524
Validation loss = 0.059331852942705154
Validation loss = 0.05947944521903992
Validation loss = 0.06062275171279907
Validation loss = 0.05863340571522713
Validation loss = 0.05967484042048454
Validation loss = 0.0596373975276947
Validation loss = 0.05962405726313591
Validation loss = 0.06234918534755707
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.067159004509449
Validation loss = 0.06439615041017532
Validation loss = 0.06521184742450714
Validation loss = 0.06391558796167374
Validation loss = 0.06172601878643036
Validation loss = 0.06405217200517654
Validation loss = 0.06240594759583473
Validation loss = 0.061662256717681885
Validation loss = 0.06259740144014359
Validation loss = 0.06243574991822243
Validation loss = 0.062272440642118454
Validation loss = 0.06295324116945267
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06604493409395218
Validation loss = 0.06425262242555618
Validation loss = 0.06438210606575012
Validation loss = 0.06451351195573807
Validation loss = 0.06524442881345749
Validation loss = 0.06593780964612961
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07086104899644852
Validation loss = 0.06327890604734421
Validation loss = 0.062351759523153305
Validation loss = 0.06340428441762924
Validation loss = 0.06153123453259468
Validation loss = 0.06371844559907913
Validation loss = 0.06333190947771072
Validation loss = 0.06150368973612785
Validation loss = 0.060598909854888916
Validation loss = 0.05997874215245247
Validation loss = 0.059957344084978104
Validation loss = 0.061388518661260605
Validation loss = 0.0619940310716629
Validation loss = 0.06136687099933624
Validation loss = 0.05964215099811554
Validation loss = 0.06179885193705559
Validation loss = 0.06227514520287514
Validation loss = 0.06556703895330429
Validation loss = 0.05843830108642578
Validation loss = 0.06431888788938522
Validation loss = 0.059220265597105026
Validation loss = 0.05946037545800209
Validation loss = 0.0592111200094223
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06771525740623474
Validation loss = 0.06470433622598648
Validation loss = 0.0609905980527401
Validation loss = 0.05852082744240761
Validation loss = 0.059466417878866196
Validation loss = 0.06321612745523453
Validation loss = 0.058744918555021286
Validation loss = 0.061494480818510056
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 523
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 576
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 491
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 544
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 540
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 554
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 474      |
| Iteration     | 5        |
| MaximumReturn | 1.05e+03 |
| MinimumReturn | 224      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06186860799789429
Validation loss = 0.0543522872030735
Validation loss = 0.05322575941681862
Validation loss = 0.05208151414990425
Validation loss = 0.05235631391406059
Validation loss = 0.055209238082170486
Validation loss = 0.052596401423215866
Validation loss = 0.05503026768565178
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06502383947372437
Validation loss = 0.05934884771704674
Validation loss = 0.05937382951378822
Validation loss = 0.05608918145298958
Validation loss = 0.056029848754405975
Validation loss = 0.056896936148405075
Validation loss = 0.05365794897079468
Validation loss = 0.05490575358271599
Validation loss = 0.054770328104496
Validation loss = 0.0524815209209919
Validation loss = 0.0533137246966362
Validation loss = 0.059010036289691925
Validation loss = 0.05567515641450882
Validation loss = 0.0534827783703804
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06321616470813751
Validation loss = 0.05571746453642845
Validation loss = 0.05512932687997818
Validation loss = 0.05416301265358925
Validation loss = 0.0564221516251564
Validation loss = 0.05522846058011055
Validation loss = 0.05402110144495964
Validation loss = 0.05445375666022301
Validation loss = 0.055774789303541183
Validation loss = 0.05357203260064125
Validation loss = 0.05354413390159607
Validation loss = 0.05641092732548714
Validation loss = 0.05580613389611244
Validation loss = 0.05354604497551918
Validation loss = 0.05796905606985092
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06168099120259285
Validation loss = 0.0535314716398716
Validation loss = 0.05192118138074875
Validation loss = 0.053468137979507446
Validation loss = 0.05400991439819336
Validation loss = 0.05185580253601074
Validation loss = 0.05199509486556053
Validation loss = 0.05571538209915161
Validation loss = 0.05341096594929695
Validation loss = 0.05280626192688942
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06260168552398682
Validation loss = 0.05356907844543457
Validation loss = 0.05562818795442581
Validation loss = 0.053477007895708084
Validation loss = 0.05566469579935074
Validation loss = 0.052443068474531174
Validation loss = 0.05233892425894737
Validation loss = 0.053771886974573135
Validation loss = 0.05414171144366264
Validation loss = 0.051801446825265884
Validation loss = 0.059368640184402466
Validation loss = 0.05164768546819687
Validation loss = 0.053604789078235626
Validation loss = 0.05512704700231552
Validation loss = 0.05222072824835777
Validation loss = 0.05261332169175148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 542
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 540
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 558
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 543
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 572
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 500
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 634      |
| Iteration     | 6        |
| MaximumReturn | 791      |
| MinimumReturn | 277      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.053956035524606705
Validation loss = 0.049471285194158554
Validation loss = 0.050355881452560425
Validation loss = 0.049227967858314514
Validation loss = 0.048294972628355026
Validation loss = 0.048013195395469666
Validation loss = 0.04718084633350372
Validation loss = 0.048631854355335236
Validation loss = 0.0466679185628891
Validation loss = 0.051831282675266266
Validation loss = 0.046907901763916016
Validation loss = 0.04547930508852005
Validation loss = 0.047633253037929535
Validation loss = 0.04562549293041229
Validation loss = 0.04668429121375084
Validation loss = 0.04720549285411835
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05744044482707977
Validation loss = 0.04918705299496651
Validation loss = 0.04862050339579582
Validation loss = 0.04859968274831772
Validation loss = 0.04949522390961647
Validation loss = 0.048250190913677216
Validation loss = 0.0516054704785347
Validation loss = 0.049551501870155334
Validation loss = 0.048484839498996735
Validation loss = 0.047485679388046265
Validation loss = 0.048405520617961884
Validation loss = 0.047730814665555954
Validation loss = 0.05023840442299843
Validation loss = 0.049505215138196945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05311170965433121
Validation loss = 0.050148893147706985
Validation loss = 0.049893077462911606
Validation loss = 0.04794467240571976
Validation loss = 0.05299530178308487
Validation loss = 0.04872240871191025
Validation loss = 0.04797641932964325
Validation loss = 0.050319328904151917
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.051504358649253845
Validation loss = 0.052175894379615784
Validation loss = 0.047884151339530945
Validation loss = 0.04906012490391731
Validation loss = 0.04893390089273453
Validation loss = 0.0479574091732502
Validation loss = 0.04853825271129608
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05507923662662506
Validation loss = 0.04775369539856911
Validation loss = 0.05105852708220482
Validation loss = 0.046890757977962494
Validation loss = 0.05028042942285538
Validation loss = 0.047834791243076324
Validation loss = 0.046655572950839996
Validation loss = 0.04965002089738846
Validation loss = 0.0473235547542572
Validation loss = 0.04606851562857628
Validation loss = 0.04794620722532272
Validation loss = 0.04770253598690033
Validation loss = 0.04899962246417999
Validation loss = 0.04781274124979973
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 611
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 578
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 597
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 572
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 565
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 572
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.54e+03 |
| Iteration     | 7        |
| MaximumReturn | 1.92e+03 |
| MinimumReturn | 1.02e+03 |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04755581542849541
Validation loss = 0.04203387722373009
Validation loss = 0.04333885759115219
Validation loss = 0.04215800389647484
Validation loss = 0.044031497091054916
Validation loss = 0.04160122945904732
Validation loss = 0.04256565496325493
Validation loss = 0.04268546402454376
Validation loss = 0.04247288405895233
Validation loss = 0.043305132538080215
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.048139236867427826
Validation loss = 0.04495648667216301
Validation loss = 0.045532722026109695
Validation loss = 0.044161926954984665
Validation loss = 0.04538815841078758
Validation loss = 0.042176470160484314
Validation loss = 0.04464146867394447
Validation loss = 0.04329698905348778
Validation loss = 0.04478682205080986
Validation loss = 0.04414645582437515
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04927154630422592
Validation loss = 0.044914666563272476
Validation loss = 0.045351527631282806
Validation loss = 0.04470888152718544
Validation loss = 0.04329792037606239
Validation loss = 0.044520460069179535
Validation loss = 0.044399455189704895
Validation loss = 0.04475100338459015
Validation loss = 0.04386312887072563
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.051041144877672195
Validation loss = 0.042965374886989594
Validation loss = 0.04409138858318329
Validation loss = 0.04710880666971207
Validation loss = 0.04523906856775284
Validation loss = 0.041713450103998184
Validation loss = 0.043191585689783096
Validation loss = 0.04258503019809723
Validation loss = 0.04443188011646271
Validation loss = 0.04297797754406929
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04637938365340233
Validation loss = 0.04276391863822937
Validation loss = 0.04393671452999115
Validation loss = 0.04293963313102722
Validation loss = 0.042850229889154434
Validation loss = 0.044114626944065094
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 645
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 622
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 633
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 640
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 681
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 663
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.7e+03  |
| Iteration     | 8        |
| MaximumReturn | 2.11e+03 |
| MinimumReturn | 1.03e+03 |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04216575250029564
Validation loss = 0.040194071829319
Validation loss = 0.04044567421078682
Validation loss = 0.03926857188344002
Validation loss = 0.04085340350866318
Validation loss = 0.03651485592126846
Validation loss = 0.03939533233642578
Validation loss = 0.03820200636982918
Validation loss = 0.0380585752427578
Validation loss = 0.037279270589351654
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04560226574540138
Validation loss = 0.04010692238807678
Validation loss = 0.03972141817212105
Validation loss = 0.03891771286725998
Validation loss = 0.03916715458035469
Validation loss = 0.0401323065161705
Validation loss = 0.04333944991230965
Validation loss = 0.03851683437824249
Validation loss = 0.039422407746315
Validation loss = 0.03852010890841484
Validation loss = 0.038380108773708344
Validation loss = 0.03885437175631523
Validation loss = 0.03808402642607689
Validation loss = 0.03838067501783371
Validation loss = 0.039643414318561554
Validation loss = 0.03878074511885643
Validation loss = 0.04089600592851639
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04650639742612839
Validation loss = 0.040612876415252686
Validation loss = 0.040747977793216705
Validation loss = 0.04021886736154556
Validation loss = 0.04038123041391373
Validation loss = 0.03994379937648773
Validation loss = 0.039302993565797806
Validation loss = 0.04068310558795929
Validation loss = 0.04140612855553627
Validation loss = 0.03928043320775032
Validation loss = 0.03886353224515915
Validation loss = 0.03999172896146774
Validation loss = 0.0385357066988945
Validation loss = 0.03959174454212189
Validation loss = 0.03901571035385132
Validation loss = 0.037831611931324005
Validation loss = 0.03972660005092621
Validation loss = 0.0394686684012413
Validation loss = 0.04007766395807266
Validation loss = 0.03972252458333969
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04456306993961334
Validation loss = 0.040192410349845886
Validation loss = 0.04111173003911972
Validation loss = 0.04102706164121628
Validation loss = 0.03980110213160515
Validation loss = 0.03975094109773636
Validation loss = 0.04096630960702896
Validation loss = 0.04038212448358536
Validation loss = 0.04120655357837677
Validation loss = 0.04231451079249382
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.042701490223407745
Validation loss = 0.039562810212373734
Validation loss = 0.0398608073592186
Validation loss = 0.04008346050977707
Validation loss = 0.038376931101083755
Validation loss = 0.03838197886943817
Validation loss = 0.03878104314208031
Validation loss = 0.0420636385679245
Validation loss = 0.03776448965072632
Validation loss = 0.037198834121227264
Validation loss = 0.039871908724308014
Validation loss = 0.03776752948760986
Validation loss = 0.04084068909287453
Validation loss = 0.04033486172556877
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 709
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 781
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 784
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 656
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 766
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 735
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 908      |
| Iteration     | 9        |
| MaximumReturn | 2.59e+03 |
| MinimumReturn | -142     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03975946083664894
Validation loss = 0.03487378731369972
Validation loss = 0.0360996387898922
Validation loss = 0.03603304922580719
Validation loss = 0.041806671768426895
Validation loss = 0.035933081060647964
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04274323210120201
Validation loss = 0.0364227257668972
Validation loss = 0.036616161465644836
Validation loss = 0.03886480629444122
Validation loss = 0.03678039088845253
Validation loss = 0.03634628280997276
Validation loss = 0.03742510452866554
Validation loss = 0.03502765670418739
Validation loss = 0.03611811622977257
Validation loss = 0.03619728982448578
Validation loss = 0.03467193990945816
Validation loss = 0.036714013665914536
Validation loss = 0.03576177731156349
Validation loss = 0.035822417587041855
Validation loss = 0.03547029197216034
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0415043979883194
Validation loss = 0.03592614829540253
Validation loss = 0.04094319045543671
Validation loss = 0.036645952612161636
Validation loss = 0.0367463193833828
Validation loss = 0.036604296416044235
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0437934473156929
Validation loss = 0.038847845047712326
Validation loss = 0.03694263473153114
Validation loss = 0.036419887095689774
Validation loss = 0.036695320159196854
Validation loss = 0.036795392632484436
Validation loss = 0.03741208463907242
Validation loss = 0.037002645432949066
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04188353195786476
Validation loss = 0.036204081028699875
Validation loss = 0.03616539016366005
Validation loss = 0.03678638115525246
Validation loss = 0.03718120977282524
Validation loss = 0.037395864725112915
Validation loss = 0.037448540329933167
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 665
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 701
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 718
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 681
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 686
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 734
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.1e+03  |
| Iteration     | 10       |
| MaximumReturn | 3.09e+03 |
| MinimumReturn | 984      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03627367317676544
Validation loss = 0.0338311530649662
Validation loss = 0.034494902938604355
Validation loss = 0.03336925804615021
Validation loss = 0.032321516424417496
Validation loss = 0.032753556966781616
Validation loss = 0.03453151881694794
Validation loss = 0.03351965919137001
Validation loss = 0.033040162175893784
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0385938435792923
Validation loss = 0.03365067020058632
Validation loss = 0.03369849920272827
Validation loss = 0.03451281413435936
Validation loss = 0.03380420431494713
Validation loss = 0.03307672590017319
Validation loss = 0.033713094890117645
Validation loss = 0.03364856541156769
Validation loss = 0.03377601504325867
Validation loss = 0.03220536932349205
Validation loss = 0.03335988521575928
Validation loss = 0.03416789695620537
Validation loss = 0.03328036144375801
Validation loss = 0.03453001379966736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03699898719787598
Validation loss = 0.03484268859028816
Validation loss = 0.03395504876971245
Validation loss = 0.03436727449297905
Validation loss = 0.03207390382885933
Validation loss = 0.034158725291490555
Validation loss = 0.03351745381951332
Validation loss = 0.03385383263230324
Validation loss = 0.03237506002187729
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.037902891635894775
Validation loss = 0.034298643469810486
Validation loss = 0.03442109748721123
Validation loss = 0.03500789776444435
Validation loss = 0.035139769315719604
Validation loss = 0.03399666026234627
Validation loss = 0.03400631621479988
Validation loss = 0.03353424742817879
Validation loss = 0.03394869342446327
Validation loss = 0.0321803092956543
Validation loss = 0.03348695859313011
Validation loss = 0.03321399167180061
Validation loss = 0.03344845771789551
Validation loss = 0.03245975077152252
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.037880413234233856
Validation loss = 0.0349271260201931
Validation loss = 0.03401289880275726
Validation loss = 0.03417303413152695
Validation loss = 0.03528851643204689
Validation loss = 0.03311341628432274
Validation loss = 0.03347443416714668
Validation loss = 0.03277863189578056
Validation loss = 0.032212287187576294
Validation loss = 0.032416295260190964
Validation loss = 0.033270347863435745
Validation loss = 0.03419937938451767
Validation loss = 0.033267706632614136
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 698
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 698
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 699
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 700
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 717
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 699
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.7e+03  |
| Iteration     | 11       |
| MaximumReturn | 3.45e+03 |
| MinimumReturn | 1.45e+03 |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03620672598481178
Validation loss = 0.031110988929867744
Validation loss = 0.029868055135011673
Validation loss = 0.030527208000421524
Validation loss = 0.034016113728284836
Validation loss = 0.03274331986904144
Validation loss = 0.029880346730351448
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03557862713932991
Validation loss = 0.0313844308257103
Validation loss = 0.032297078520059586
Validation loss = 0.030788686126470566
Validation loss = 0.03049403615295887
Validation loss = 0.03233717754483223
Validation loss = 0.029437674209475517
Validation loss = 0.032113734632730484
Validation loss = 0.03089599870145321
Validation loss = 0.02901839092373848
Validation loss = 0.028766708448529243
Validation loss = 0.030333667993545532
Validation loss = 0.030300311744213104
Validation loss = 0.030329782515764236
Validation loss = 0.029914943501353264
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03971155732870102
Validation loss = 0.031084470450878143
Validation loss = 0.03181655332446098
Validation loss = 0.03120807185769081
Validation loss = 0.03061625175178051
Validation loss = 0.03182227164506912
Validation loss = 0.03174273669719696
Validation loss = 0.029691698029637337
Validation loss = 0.030276913195848465
Validation loss = 0.031790442764759064
Validation loss = 0.03049144335091114
Validation loss = 0.03248319774866104
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03466283902525902
Validation loss = 0.03029770962893963
Validation loss = 0.030243245884776115
Validation loss = 0.03211910277605057
Validation loss = 0.030278772115707397
Validation loss = 0.03060886077582836
Validation loss = 0.03103123977780342
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032951053231954575
Validation loss = 0.030677752569317818
Validation loss = 0.03230907768011093
Validation loss = 0.0298843365162611
Validation loss = 0.0301055870950222
Validation loss = 0.02979256957769394
Validation loss = 0.032076675444841385
Validation loss = 0.0307240579277277
Validation loss = 0.031014425680041313
Validation loss = 0.032410647720098495
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 734
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 737
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 740
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 707
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 719
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 739
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.18e+03 |
| Iteration     | 12       |
| MaximumReturn | 3e+03    |
| MinimumReturn | 882      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.030526267364621162
Validation loss = 0.029244577512145042
Validation loss = 0.02820221520960331
Validation loss = 0.029495077207684517
Validation loss = 0.02970438078045845
Validation loss = 0.029029859229922295
Validation loss = 0.027855055406689644
Validation loss = 0.029647452756762505
Validation loss = 0.028033757582306862
Validation loss = 0.028668981045484543
Validation loss = 0.027635587379336357
Validation loss = 0.027995005249977112
Validation loss = 0.02762911096215248
Validation loss = 0.028831470757722855
Validation loss = 0.027504760771989822
Validation loss = 0.028809892013669014
Validation loss = 0.028542164713144302
Validation loss = 0.027513325214385986
Validation loss = 0.029287496581673622
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03734152764081955
Validation loss = 0.027844903990626335
Validation loss = 0.02959113009274006
Validation loss = 0.027605965733528137
Validation loss = 0.027598384767770767
Validation loss = 0.028560781851410866
Validation loss = 0.027335774153470993
Validation loss = 0.030209463089704514
Validation loss = 0.02752552554011345
Validation loss = 0.0293560940772295
Validation loss = 0.027055585756897926
Validation loss = 0.027952397242188454
Validation loss = 0.028014538809657097
Validation loss = 0.027786048129200935
Validation loss = 0.026916829869151115
Validation loss = 0.027112901210784912
Validation loss = 0.028358126059174538
Validation loss = 0.02692934311926365
Validation loss = 0.027027811855077744
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03193512186408043
Validation loss = 0.02917841635644436
Validation loss = 0.03179143741726875
Validation loss = 0.028240231797099113
Validation loss = 0.027980929240584373
Validation loss = 0.02883211150765419
Validation loss = 0.030588988214731216
Validation loss = 0.0279401745647192
Validation loss = 0.02968459017574787
Validation loss = 0.028584081679582596
Validation loss = 0.02844061329960823
Validation loss = 0.028419580310583115
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03166770562529564
Validation loss = 0.028840774670243263
Validation loss = 0.028427833691239357
Validation loss = 0.029262298718094826
Validation loss = 0.02936769463121891
Validation loss = 0.027196625247597694
Validation loss = 0.028807254508137703
Validation loss = 0.028256235644221306
Validation loss = 0.030044738203287125
Validation loss = 0.02764178439974785
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03309787064790726
Validation loss = 0.029150884598493576
Validation loss = 0.02824072167277336
Validation loss = 0.029249925166368484
Validation loss = 0.028106940910220146
Validation loss = 0.02984575368463993
Validation loss = 0.02751217596232891
Validation loss = 0.02816823311150074
Validation loss = 0.027145838364958763
Validation loss = 0.028430145233869553
Validation loss = 0.027803361415863037
Validation loss = 0.027049796655774117
Validation loss = 0.028862101957201958
Validation loss = 0.027161285281181335
Validation loss = 0.028557702898979187
Validation loss = 0.02770817279815674
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 724
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 707
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 719
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 715
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 672
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 708
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.95e+03 |
| Iteration     | 13       |
| MaximumReturn | 3.31e+03 |
| MinimumReturn | -291     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03873882070183754
Validation loss = 0.03696514666080475
Validation loss = 0.037048302590847015
Validation loss = 0.03843836858868599
Validation loss = 0.037647854536771774
Validation loss = 0.03661274164915085
Validation loss = 0.03682048246264458
Validation loss = 0.037115130573511124
Validation loss = 0.03651728108525276
Validation loss = 0.03741813451051712
Validation loss = 0.03778191655874252
Validation loss = 0.038480643182992935
Validation loss = 0.03746958449482918
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04105240851640701
Validation loss = 0.037948764860630035
Validation loss = 0.03896426409482956
Validation loss = 0.038202881813049316
Validation loss = 0.0368979349732399
Validation loss = 0.03834594786167145
Validation loss = 0.04183269292116165
Validation loss = 0.03928806632757187
Validation loss = 0.04059677571058273
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.042505379766225815
Validation loss = 0.03900061175227165
Validation loss = 0.03980257362127304
Validation loss = 0.03949017822742462
Validation loss = 0.04061470553278923
Validation loss = 0.038585733622312546
Validation loss = 0.04128085821866989
Validation loss = 0.03904436156153679
Validation loss = 0.040526486933231354
Validation loss = 0.037855856120586395
Validation loss = 0.041839562356472015
Validation loss = 0.03964842110872269
Validation loss = 0.039196696132421494
Validation loss = 0.04082590714097023
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04178181663155556
Validation loss = 0.03978103771805763
Validation loss = 0.04002775251865387
Validation loss = 0.041005924344062805
Validation loss = 0.04197854548692703
Validation loss = 0.039835914969444275
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.043395720422267914
Validation loss = 0.03527534008026123
Validation loss = 0.03578455373644829
Validation loss = 0.036066897213459015
Validation loss = 0.03713168948888779
Validation loss = 0.03689932823181152
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 701
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 714
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 709
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 732
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 717
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 721
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.73e+03 |
| Iteration     | 14       |
| MaximumReturn | 3.67e+03 |
| MinimumReturn | -115     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03568431735038757
Validation loss = 0.03564392402768135
Validation loss = 0.03567524254322052
Validation loss = 0.03873476758599281
Validation loss = 0.03655476123094559
Validation loss = 0.035623304545879364
Validation loss = 0.03549949824810028
Validation loss = 0.034874849021434784
Validation loss = 0.03655024617910385
Validation loss = 0.0376286581158638
Validation loss = 0.036252185702323914
Validation loss = 0.03653683513402939
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.040484897792339325
Validation loss = 0.03837022930383682
Validation loss = 0.03786621615290642
Validation loss = 0.03876693919301033
Validation loss = 0.03890838474035263
Validation loss = 0.03841331601142883
Validation loss = 0.04163191094994545
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.041563745588064194
Validation loss = 0.038358137011528015
Validation loss = 0.039631832391023636
Validation loss = 0.04049668461084366
Validation loss = 0.03997349366545677
Validation loss = 0.040734268724918365
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0370873287320137
Validation loss = 0.03772968798875809
Validation loss = 0.039267122745513916
Validation loss = 0.038771096616983414
Validation loss = 0.03740852326154709
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03707630932331085
Validation loss = 0.03639841079711914
Validation loss = 0.033964067697525024
Validation loss = 0.034957028925418854
Validation loss = 0.03571408614516258
Validation loss = 0.03553520143032074
Validation loss = 0.03551698848605156
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 770
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 737
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 764
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 748
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 743
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 720
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.81e+03 |
| Iteration     | 15       |
| MaximumReturn | 3.71e+03 |
| MinimumReturn | 144      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03730184957385063
Validation loss = 0.03470491245388985
Validation loss = 0.03388995677232742
Validation loss = 0.036380138248205185
Validation loss = 0.035713981837034225
Validation loss = 0.03588009998202324
Validation loss = 0.03539549559354782
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.036501359194517136
Validation loss = 0.03449224308133125
Validation loss = 0.038642629981040955
Validation loss = 0.03983290120959282
Validation loss = 0.03935303911566734
Validation loss = 0.03904829919338226
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04112067073583603
Validation loss = 0.0411754846572876
Validation loss = 0.04061606153845787
Validation loss = 0.039465516805648804
Validation loss = 0.04089007526636124
Validation loss = 0.04205426201224327
Validation loss = 0.04106751084327698
Validation loss = 0.04131580516695976
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04006454348564148
Validation loss = 0.03744079917669296
Validation loss = 0.038003213703632355
Validation loss = 0.03963824361562729
Validation loss = 0.03719427436590195
Validation loss = 0.039743825793266296
Validation loss = 0.03972756490111351
Validation loss = 0.04051832854747772
Validation loss = 0.03900758549571037
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03555940464138985
Validation loss = 0.033907465636730194
Validation loss = 0.03402012586593628
Validation loss = 0.03266970068216324
Validation loss = 0.03499345853924751
Validation loss = 0.0350111648440361
Validation loss = 0.03396369889378548
Validation loss = 0.03361661732196808
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 565
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 732
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 738
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 750
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 743
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 732
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.41e+03 |
| Iteration     | 16       |
| MaximumReturn | 3.55e+03 |
| MinimumReturn | -61.3    |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029388122260570526
Validation loss = 0.02381621114909649
Validation loss = 0.02586294896900654
Validation loss = 0.024666499346494675
Validation loss = 0.023833582177758217
Validation loss = 0.024475103244185448
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.030358798801898956
Validation loss = 0.02388712950050831
Validation loss = 0.025899117812514305
Validation loss = 0.023914584890007973
Validation loss = 0.024757884442806244
Validation loss = 0.024637557566165924
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03156549856066704
Validation loss = 0.024694016203284264
Validation loss = 0.025117624551057816
Validation loss = 0.025142105296254158
Validation loss = 0.02478303387761116
Validation loss = 0.024119723588228226
Validation loss = 0.024597205221652985
Validation loss = 0.024812942370772362
Validation loss = 0.024116434156894684
Validation loss = 0.024164723232388496
Validation loss = 0.025302037596702576
Validation loss = 0.022856706753373146
Validation loss = 0.02509426698088646
Validation loss = 0.0246743094176054
Validation loss = 0.023366237059235573
Validation loss = 0.023334627971053123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030541691929101944
Validation loss = 0.025805847719311714
Validation loss = 0.026376983150839806
Validation loss = 0.02634540945291519
Validation loss = 0.025541117414832115
Validation loss = 0.025943918153643608
Validation loss = 0.024543074890971184
Validation loss = 0.0249466672539711
Validation loss = 0.025404397398233414
Validation loss = 0.02431296370923519
Validation loss = 0.024100128561258316
Validation loss = 0.024893082678318024
Validation loss = 0.023609211668372154
Validation loss = 0.024007480591535568
Validation loss = 0.024881042540073395
Validation loss = 0.02404545061290264
Validation loss = 0.024240313097834587
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029380930587649345
Validation loss = 0.025809524580836296
Validation loss = 0.025247398763895035
Validation loss = 0.02500215359032154
Validation loss = 0.025090783834457397
Validation loss = 0.024634679779410362
Validation loss = 0.025492213666439056
Validation loss = 0.024064427241683006
Validation loss = 0.024212146177887917
Validation loss = 0.024238919839262962
Validation loss = 0.02401544898748398
Validation loss = 0.025284292176365852
Validation loss = 0.02479816973209381
Validation loss = 0.024114781990647316
Validation loss = 0.023598860949277878
Validation loss = 0.024918586015701294
Validation loss = 0.023609746247529984
Validation loss = 0.026091957464814186
Validation loss = 0.02374090626835823
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 722
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 737
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 746
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 735
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 743
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 750
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.82e+03 |
| Iteration     | 17       |
| MaximumReturn | 3.62e+03 |
| MinimumReturn | 1.4e+03  |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02459052763879299
Validation loss = 0.02316497266292572
Validation loss = 0.02279495634138584
Validation loss = 0.024456797167658806
Validation loss = 0.023031774908304214
Validation loss = 0.022612439468503
Validation loss = 0.02279174141585827
Validation loss = 0.023960817605257034
Validation loss = 0.02168148197233677
Validation loss = 0.02334708906710148
Validation loss = 0.02250000834465027
Validation loss = 0.021476125344634056
Validation loss = 0.023108694702386856
Validation loss = 0.022768685594201088
Validation loss = 0.021546537056565285
Validation loss = 0.022536704316735268
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02704041264951229
Validation loss = 0.02301081269979477
Validation loss = 0.022795503959059715
Validation loss = 0.023648377507925034
Validation loss = 0.02333688922226429
Validation loss = 0.023864025250077248
Validation loss = 0.025575801730155945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025865647941827774
Validation loss = 0.022921299561858177
Validation loss = 0.023154618218541145
Validation loss = 0.022117773070931435
Validation loss = 0.023166175931692123
Validation loss = 0.02211735025048256
Validation loss = 0.02228766866028309
Validation loss = 0.022105293348431587
Validation loss = 0.0218797717243433
Validation loss = 0.022482959553599358
Validation loss = 0.023134885355830193
Validation loss = 0.02210822142660618
Validation loss = 0.02275978773832321
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02474476769566536
Validation loss = 0.022885536774992943
Validation loss = 0.022556114941835403
Validation loss = 0.02255883254110813
Validation loss = 0.023389888927340508
Validation loss = 0.02264515496790409
Validation loss = 0.022349132224917412
Validation loss = 0.02236173115670681
Validation loss = 0.02476622350513935
Validation loss = 0.022778483107686043
Validation loss = 0.022658096626400948
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025214582681655884
Validation loss = 0.02312697470188141
Validation loss = 0.023025864735245705
Validation loss = 0.02238013595342636
Validation loss = 0.02397906221449375
Validation loss = 0.023073600605130196
Validation loss = 0.0224983561784029
Validation loss = 0.02344137616455555
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 740
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 756
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 750
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 751
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 724
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 735
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.3e+03  |
| Iteration     | 18       |
| MaximumReturn | 3.8e+03  |
| MinimumReturn | 2.6e+03  |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023920882493257523
Validation loss = 0.021729949861764908
Validation loss = 0.022092897444963455
Validation loss = 0.0215560682117939
Validation loss = 0.021322134882211685
Validation loss = 0.02165280096232891
Validation loss = 0.02107519842684269
Validation loss = 0.02189037576317787
Validation loss = 0.020865164697170258
Validation loss = 0.022094089537858963
Validation loss = 0.021607771515846252
Validation loss = 0.020861156284809113
Validation loss = 0.021952752023935318
Validation loss = 0.021434953436255455
Validation loss = 0.02106301113963127
Validation loss = 0.021214868873357773
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023654747754335403
Validation loss = 0.02258344367146492
Validation loss = 0.0220487117767334
Validation loss = 0.022681305184960365
Validation loss = 0.023463726043701172
Validation loss = 0.021927516907453537
Validation loss = 0.024884266778826714
Validation loss = 0.02226778119802475
Validation loss = 0.022828679531812668
Validation loss = 0.023238584399223328
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022256990894675255
Validation loss = 0.022037308663129807
Validation loss = 0.022434908896684647
Validation loss = 0.02078012004494667
Validation loss = 0.02124912664294243
Validation loss = 0.021337274461984634
Validation loss = 0.022040359675884247
Validation loss = 0.021552737802267075
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023596858605742455
Validation loss = 0.021709993481636047
Validation loss = 0.023116756230592728
Validation loss = 0.022051801905035973
Validation loss = 0.02200401946902275
Validation loss = 0.02204279974102974
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025402123108506203
Validation loss = 0.02124234102666378
Validation loss = 0.022483449429273605
Validation loss = 0.022289644926786423
Validation loss = 0.022116854786872864
Validation loss = 0.021370816975831985
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 784
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 794
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 764
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 781
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 758
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 798
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 792      |
| Iteration     | 19       |
| MaximumReturn | 2.91e+03 |
| MinimumReturn | 16.9     |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022375240921974182
Validation loss = 0.020504280924797058
Validation loss = 0.02244221605360508
Validation loss = 0.021175719797611237
Validation loss = 0.020711805671453476
Validation loss = 0.019876820966601372
Validation loss = 0.02071491628885269
Validation loss = 0.02054375410079956
Validation loss = 0.02043677493929863
Validation loss = 0.01909615658223629
Validation loss = 0.021116431802511215
Validation loss = 0.01949962042272091
Validation loss = 0.020341891795396805
Validation loss = 0.019720975309610367
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02412397600710392
Validation loss = 0.022000178694725037
Validation loss = 0.021391138434410095
Validation loss = 0.02092842012643814
Validation loss = 0.02158656157553196
Validation loss = 0.02137039043009281
Validation loss = 0.020739080384373665
Validation loss = 0.020236562937498093
Validation loss = 0.021001653745770454
Validation loss = 0.021344050765037537
Validation loss = 0.02128485217690468
Validation loss = 0.02060265839099884
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022402074187994003
Validation loss = 0.020255276933312416
Validation loss = 0.021796490997076035
Validation loss = 0.02038430981338024
Validation loss = 0.020275171846151352
Validation loss = 0.021949611604213715
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02470427379012108
Validation loss = 0.020900527015328407
Validation loss = 0.023038366809487343
Validation loss = 0.021427199244499207
Validation loss = 0.021352067589759827
Validation loss = 0.02159803919494152
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02291102148592472
Validation loss = 0.02124948985874653
Validation loss = 0.021915283054113388
Validation loss = 0.020899876952171326
Validation loss = 0.020321214571595192
Validation loss = 0.02164267934858799
Validation loss = 0.021179715171456337
Validation loss = 0.021090108901262283
Validation loss = 0.02114170230925083
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 794
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 800
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 795
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 774
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 795
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 800
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.95e+03 |
| Iteration     | 20       |
| MaximumReturn | 3.94e+03 |
| MinimumReturn | 17.1     |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02021963521838188
Validation loss = 0.020045612007379532
Validation loss = 0.019306674599647522
Validation loss = 0.019320255145430565
Validation loss = 0.019511831924319267
Validation loss = 0.01884733885526657
Validation loss = 0.019660435616970062
Validation loss = 0.020103849470615387
Validation loss = 0.019350310787558556
Validation loss = 0.018705127760767937
Validation loss = 0.020103754475712776
Validation loss = 0.019861726090312004
Validation loss = 0.019043998792767525
Validation loss = 0.019074685871601105
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020861627534031868
Validation loss = 0.020007861778140068
Validation loss = 0.020147213712334633
Validation loss = 0.020024918019771576
Validation loss = 0.020357368513941765
Validation loss = 0.019997118040919304
Validation loss = 0.019575025886297226
Validation loss = 0.020292004570364952
Validation loss = 0.01927543431520462
Validation loss = 0.02001972123980522
Validation loss = 0.01975349150598049
Validation loss = 0.019532866775989532
Validation loss = 0.020220639184117317
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020873425528407097
Validation loss = 0.019840087741613388
Validation loss = 0.020144278183579445
Validation loss = 0.020726747810840607
Validation loss = 0.018949173390865326
Validation loss = 0.020320119336247444
Validation loss = 0.019510729238390923
Validation loss = 0.019677691161632538
Validation loss = 0.019933568313717842
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022265810519456863
Validation loss = 0.020321127027273178
Validation loss = 0.020242460072040558
Validation loss = 0.020014401525259018
Validation loss = 0.02025557868182659
Validation loss = 0.01988864690065384
Validation loss = 0.02150842174887657
Validation loss = 0.020298266783356667
Validation loss = 0.02043255977332592
Validation loss = 0.020365720614790916
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022655615583062172
Validation loss = 0.02083350531756878
Validation loss = 0.021766556426882744
Validation loss = 0.020023740828037262
Validation loss = 0.02070911042392254
Validation loss = 0.019391944631934166
Validation loss = 0.020230615511536598
Validation loss = 0.021160149946808815
Validation loss = 0.020297152921557426
Validation loss = 0.02120538242161274
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 789
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 758
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 793
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 806
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 797
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 785
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.18e+03 |
| Iteration     | 21       |
| MaximumReturn | 3.66e+03 |
| MinimumReturn | 151      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018703369423747063
Validation loss = 0.0186692476272583
Validation loss = 0.019181616604328156
Validation loss = 0.018671896308660507
Validation loss = 0.018263529986143112
Validation loss = 0.018484652042388916
Validation loss = 0.017579328268766403
Validation loss = 0.018282335251569748
Validation loss = 0.019377047196030617
Validation loss = 0.017565308138728142
Validation loss = 0.019195368513464928
Validation loss = 0.017411883920431137
Validation loss = 0.018952321261167526
Validation loss = 0.018109967932105064
Validation loss = 0.018563881516456604
Validation loss = 0.018578317016363144
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01999288611114025
Validation loss = 0.01971910521388054
Validation loss = 0.019869118928909302
Validation loss = 0.019400279968976974
Validation loss = 0.019643468782305717
Validation loss = 0.019465984776616096
Validation loss = 0.019911108538508415
Validation loss = 0.0190883856266737
Validation loss = 0.019506843760609627
Validation loss = 0.019214298576116562
Validation loss = 0.018683236092329025
Validation loss = 0.01890663430094719
Validation loss = 0.019033782184123993
Validation loss = 0.018827015534043312
Validation loss = 0.019888747483491898
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.020137140527367592
Validation loss = 0.01907315105199814
Validation loss = 0.019765811040997505
Validation loss = 0.019780432805418968
Validation loss = 0.019191773608326912
Validation loss = 0.020183835178613663
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020454898476600647
Validation loss = 0.019964085891842842
Validation loss = 0.020118966698646545
Validation loss = 0.019815567880868912
Validation loss = 0.019028013572096825
Validation loss = 0.019167395308613777
Validation loss = 0.019270051270723343
Validation loss = 0.01923566684126854
Validation loss = 0.019942430779337883
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021095726639032364
Validation loss = 0.019833503291010857
Validation loss = 0.018902353942394257
Validation loss = 0.020409800112247467
Validation loss = 0.018789678812026978
Validation loss = 0.018751265481114388
Validation loss = 0.018733879551291466
Validation loss = 0.019879385828971863
Validation loss = 0.019096018746495247
Validation loss = 0.01946287788450718
Validation loss = 0.018204672262072563
Validation loss = 0.019446218386292458
Validation loss = 0.01873219758272171
Validation loss = 0.018437830731272697
Validation loss = 0.02059055119752884
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 785
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 791
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 784
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 772
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 785
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 778
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.92e+03 |
| Iteration     | 22       |
| MaximumReturn | 3.62e+03 |
| MinimumReturn | -225     |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019334658980369568
Validation loss = 0.018003325909376144
Validation loss = 0.017955830320715904
Validation loss = 0.018345629796385765
Validation loss = 0.018698619678616524
Validation loss = 0.018127402290701866
Validation loss = 0.017333438619971275
Validation loss = 0.017560921609401703
Validation loss = 0.017234062775969505
Validation loss = 0.01792941428720951
Validation loss = 0.017332112416625023
Validation loss = 0.017970124259591103
Validation loss = 0.01829076185822487
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019877757877111435
Validation loss = 0.018081732094287872
Validation loss = 0.01812809891998768
Validation loss = 0.01853502169251442
Validation loss = 0.018600907176733017
Validation loss = 0.01871838979423046
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019371235743165016
Validation loss = 0.018834974616765976
Validation loss = 0.019132256507873535
Validation loss = 0.01828526146709919
Validation loss = 0.018035167828202248
Validation loss = 0.01812729798257351
Validation loss = 0.019289830699563026
Validation loss = 0.018748698756098747
Validation loss = 0.018019728362560272
Validation loss = 0.018343957141041756
Validation loss = 0.01776363141834736
Validation loss = 0.01754765212535858
Validation loss = 0.018540984019637108
Validation loss = 0.01770358346402645
Validation loss = 0.017475806176662445
Validation loss = 0.018227167427539825
Validation loss = 0.018241215497255325
Validation loss = 0.016958825290203094
Validation loss = 0.018343942239880562
Validation loss = 0.01726224459707737
Validation loss = 0.018049463629722595
Validation loss = 0.018429165706038475
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01972789503633976
Validation loss = 0.018521785736083984
Validation loss = 0.019082484766840935
Validation loss = 0.01872902177274227
Validation loss = 0.019176386296749115
Validation loss = 0.01787070371210575
Validation loss = 0.020305560901761055
Validation loss = 0.01879032514989376
Validation loss = 0.017748527228832245
Validation loss = 0.018525704741477966
Validation loss = 0.017707621678709984
Validation loss = 0.018218345940113068
Validation loss = 0.017555641010403633
Validation loss = 0.019117554649710655
Validation loss = 0.01823539473116398
Validation loss = 0.018004491925239563
Validation loss = 0.018581010401248932
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0198427252471447
Validation loss = 0.018822425976395607
Validation loss = 0.01910792477428913
Validation loss = 0.01786157675087452
Validation loss = 0.01901111751794815
Validation loss = 0.017819195985794067
Validation loss = 0.018114928156137466
Validation loss = 0.017935268580913544
Validation loss = 0.0183166041970253
Validation loss = 0.018220461905002594
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 794
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 757
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 802
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 768
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 811
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 804
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.42e+03 |
| Iteration     | 23       |
| MaximumReturn | 4.02e+03 |
| MinimumReturn | -160     |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01720539480447769
Validation loss = 0.0164219718426466
Validation loss = 0.017393022775650024
Validation loss = 0.017251459881663322
Validation loss = 0.016308926045894623
Validation loss = 0.01635400392115116
Validation loss = 0.016697943210601807
Validation loss = 0.016751227900385857
Validation loss = 0.0172466728836298
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018182696774601936
Validation loss = 0.018031148239970207
Validation loss = 0.01768423803150654
Validation loss = 0.01757091097533703
Validation loss = 0.017229866236448288
Validation loss = 0.01787479594349861
Validation loss = 0.017377944663167
Validation loss = 0.017653336748480797
Validation loss = 0.01841501146554947
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016932552680373192
Validation loss = 0.016726026311516762
Validation loss = 0.016197141259908676
Validation loss = 0.017214808613061905
Validation loss = 0.017818348482251167
Validation loss = 0.01689218170940876
Validation loss = 0.01731465384364128
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01806812919676304
Validation loss = 0.017309676855802536
Validation loss = 0.017076415941119194
Validation loss = 0.018158912658691406
Validation loss = 0.017797723412513733
Validation loss = 0.017361603677272797
Validation loss = 0.017510931938886642
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019219063222408295
Validation loss = 0.017075615003705025
Validation loss = 0.017887093126773834
Validation loss = 0.017964432016015053
Validation loss = 0.01733626052737236
Validation loss = 0.017542896792292595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 793
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 782
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 920
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 769
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 784
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 768
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.83e+03 |
| Iteration     | 24       |
| MaximumReturn | 3.87e+03 |
| MinimumReturn | -81.5    |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01731701008975506
Validation loss = 0.015477887354791164
Validation loss = 0.016391556710004807
Validation loss = 0.0161192137748003
Validation loss = 0.017970522865653038
Validation loss = 0.01550785731524229
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01835152506828308
Validation loss = 0.016586989164352417
Validation loss = 0.01693839021027088
Validation loss = 0.016338761895895004
Validation loss = 0.017074020579457283
Validation loss = 0.016544541344046593
Validation loss = 0.016475360840559006
Validation loss = 0.01676328107714653
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017939401790499687
Validation loss = 0.016469700261950493
Validation loss = 0.017133649438619614
Validation loss = 0.015901507809758186
Validation loss = 0.015889106318354607
Validation loss = 0.01614377833902836
Validation loss = 0.016885923221707344
Validation loss = 0.015856577083468437
Validation loss = 0.015255378559231758
Validation loss = 0.017588935792446136
Validation loss = 0.016079124063253403
Validation loss = 0.015605888329446316
Validation loss = 0.016178825870156288
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018691349774599075
Validation loss = 0.016450121998786926
Validation loss = 0.01657702401280403
Validation loss = 0.01676945574581623
Validation loss = 0.01721140928566456
Validation loss = 0.017150428146123886
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017694054171442986
Validation loss = 0.01670544594526291
Validation loss = 0.017235470935702324
Validation loss = 0.016590749844908714
Validation loss = 0.01741190068423748
Validation loss = 0.01652519404888153
Validation loss = 0.017100511118769646
Validation loss = 0.017733793705701828
Validation loss = 0.016387468203902245
Validation loss = 0.016073331236839294
Validation loss = 0.01680067926645279
Validation loss = 0.017746692523360252
Validation loss = 0.01597709022462368
Validation loss = 0.015260215848684311
Validation loss = 0.01638498157262802
Validation loss = 0.016213510185480118
Validation loss = 0.016654781997203827
Validation loss = 0.01719031110405922
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 792
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 790
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 789
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 766
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 778
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 804
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.51e+03 |
| Iteration     | 25       |
| MaximumReturn | 3.88e+03 |
| MinimumReturn | 2.99e+03 |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016474835574626923
Validation loss = 0.01603327877819538
Validation loss = 0.016136163845658302
Validation loss = 0.01632138527929783
Validation loss = 0.015019993297755718
Validation loss = 0.015327530913054943
Validation loss = 0.015181157737970352
Validation loss = 0.015789665281772614
Validation loss = 0.015532661229372025
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017504984512925148
Validation loss = 0.016186712309718132
Validation loss = 0.016471432521939278
Validation loss = 0.016480151563882828
Validation loss = 0.015446011908352375
Validation loss = 0.017238512635231018
Validation loss = 0.015879781916737556
Validation loss = 0.01599724590778351
Validation loss = 0.01584675721824169
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0161272082477808
Validation loss = 0.015598161146044731
Validation loss = 0.015276019461452961
Validation loss = 0.016873881220817566
Validation loss = 0.015187223441898823
Validation loss = 0.015126186423003674
Validation loss = 0.015588493086397648
Validation loss = 0.01644507423043251
Validation loss = 0.015382052399218082
Validation loss = 0.015800466760993004
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01657170243561268
Validation loss = 0.016120919957756996
Validation loss = 0.01679849997162819
Validation loss = 0.016388535499572754
Validation loss = 0.016227608546614647
Validation loss = 0.01614994741976261
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017110874876379967
Validation loss = 0.015724007040262222
Validation loss = 0.015911869704723358
Validation loss = 0.01600487343966961
Validation loss = 0.015273443423211575
Validation loss = 0.016077974811196327
Validation loss = 0.01595543697476387
Validation loss = 0.015605640597641468
Validation loss = 0.0158848874270916
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 809
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 821
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 825
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 818
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 803
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 795
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.25e+03 |
| Iteration     | 26       |
| MaximumReturn | 3.92e+03 |
| MinimumReturn | 2.08e+03 |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015793489292263985
Validation loss = 0.015250477008521557
Validation loss = 0.014728203415870667
Validation loss = 0.014390324242413044
Validation loss = 0.015367996878921986
Validation loss = 0.01507420651614666
Validation loss = 0.01574581302702427
Validation loss = 0.015052853152155876
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01741637848317623
Validation loss = 0.015234651044011116
Validation loss = 0.015735046938061714
Validation loss = 0.01607993058860302
Validation loss = 0.01585078053176403
Validation loss = 0.015608320944011211
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01554448064416647
Validation loss = 0.01537338737398386
Validation loss = 0.01560010202229023
Validation loss = 0.015494142659008503
Validation loss = 0.013833760283887386
Validation loss = 0.014753365889191628
Validation loss = 0.014931525103747845
Validation loss = 0.014143744483590126
Validation loss = 0.016133854165673256
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016278032213449478
Validation loss = 0.015796847641468048
Validation loss = 0.015824681147933006
Validation loss = 0.015520104207098484
Validation loss = 0.015955116599798203
Validation loss = 0.015576175414025784
Validation loss = 0.015595100820064545
Validation loss = 0.01591729000210762
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01590658910572529
Validation loss = 0.01498212106525898
Validation loss = 0.015952248126268387
Validation loss = 0.01564408279955387
Validation loss = 0.015661710873246193
Validation loss = 0.015311653725802898
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 773
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 783
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 855
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 777
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 781
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 787
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.78e+03 |
| Iteration     | 27       |
| MaximumReturn | 4.01e+03 |
| MinimumReturn | 6.14     |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015500763431191444
Validation loss = 0.015263457782566547
Validation loss = 0.014563623815774918
Validation loss = 0.014348991215229034
Validation loss = 0.01599779911339283
Validation loss = 0.01488566491752863
Validation loss = 0.015263596549630165
Validation loss = 0.014432433992624283
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015696363523602486
Validation loss = 0.015207360498607159
Validation loss = 0.015823258087038994
Validation loss = 0.015453111380338669
Validation loss = 0.014925113879144192
Validation loss = 0.014922285452485085
Validation loss = 0.014785460196435452
Validation loss = 0.014923599548637867
Validation loss = 0.016341913491487503
Validation loss = 0.015163527801632881
Validation loss = 0.01432280708104372
Validation loss = 0.01529320701956749
Validation loss = 0.014838097617030144
Validation loss = 0.015371255576610565
Validation loss = 0.01450269017368555
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015199250541627407
Validation loss = 0.014494214206933975
Validation loss = 0.015041866339743137
Validation loss = 0.01414527278393507
Validation loss = 0.015369560569524765
Validation loss = 0.014059794135391712
Validation loss = 0.014171457849442959
Validation loss = 0.014551112428307533
Validation loss = 0.014465750195086002
Validation loss = 0.015295478515326977
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016087515279650688
Validation loss = 0.015659328550100327
Validation loss = 0.014781794510781765
Validation loss = 0.01592896692454815
Validation loss = 0.015206076204776764
Validation loss = 0.015246364288032055
Validation loss = 0.014871843159198761
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01499096304178238
Validation loss = 0.014501956291496754
Validation loss = 0.014850607141852379
Validation loss = 0.014331488870084286
Validation loss = 0.015256394632160664
Validation loss = 0.015021814033389091
Validation loss = 0.015904072672128677
Validation loss = 0.015155852772295475
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 749
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 810
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 803
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 784
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 814
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 769
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.91e+03 |
| Iteration     | 28       |
| MaximumReturn | 4.07e+03 |
| MinimumReturn | 897      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014563354663550854
Validation loss = 0.01472513284534216
Validation loss = 0.014346682466566563
Validation loss = 0.014889013022184372
Validation loss = 0.014563287608325481
Validation loss = 0.014561534859240055
Validation loss = 0.014567299745976925
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01581651158630848
Validation loss = 0.013964171521365643
Validation loss = 0.015175223350524902
Validation loss = 0.014636009931564331
Validation loss = 0.015107286162674427
Validation loss = 0.014904848299920559
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015268838964402676
Validation loss = 0.013852300122380257
Validation loss = 0.014698958955705166
Validation loss = 0.014494672417640686
Validation loss = 0.014405716210603714
Validation loss = 0.013973868452012539
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01491696760058403
Validation loss = 0.01454414613544941
Validation loss = 0.015247528441250324
Validation loss = 0.01485775038599968
Validation loss = 0.015107071958482265
Validation loss = 0.014598598703742027
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015277085825800896
Validation loss = 0.01473234687000513
Validation loss = 0.01529314462095499
Validation loss = 0.014201338402926922
Validation loss = 0.015206288546323776
Validation loss = 0.014378812164068222
Validation loss = 0.014889255166053772
Validation loss = 0.014041357673704624
Validation loss = 0.0143051166087389
Validation loss = 0.014489450491964817
Validation loss = 0.013994764536619186
Validation loss = 0.014334291219711304
Validation loss = 0.013930688612163067
Validation loss = 0.01400597020983696
Validation loss = 0.014338483102619648
Validation loss = 0.014175599440932274
Validation loss = 0.014214844442903996
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 794
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 795
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 808
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 757
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 802
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 757
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.09e+03 |
| Iteration     | 29       |
| MaximumReturn | 4.03e+03 |
| MinimumReturn | 283      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013964304700493813
Validation loss = 0.01433957926928997
Validation loss = 0.014671538956463337
Validation loss = 0.014367283321917057
Validation loss = 0.014261318370699883
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014571038074791431
Validation loss = 0.014110942371189594
Validation loss = 0.014098908752202988
Validation loss = 0.014780228957533836
Validation loss = 0.015301712788641453
Validation loss = 0.01599527709186077
Validation loss = 0.013671859167516232
Validation loss = 0.013858298771083355
Validation loss = 0.013728482648730278
Validation loss = 0.014420386403799057
Validation loss = 0.015019753947854042
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014073343016207218
Validation loss = 0.013748581521213055
Validation loss = 0.013589726760983467
Validation loss = 0.013722246512770653
Validation loss = 0.014227676205337048
Validation loss = 0.013752799481153488
Validation loss = 0.013486988842487335
Validation loss = 0.01373475044965744
Validation loss = 0.013820155523717403
Validation loss = 0.014057539403438568
Validation loss = 0.013394486159086227
Validation loss = 0.013978776521980762
Validation loss = 0.01482451893389225
Validation loss = 0.01369278784841299
Validation loss = 0.014113226905465126
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014906135387718678
Validation loss = 0.014041444286704063
Validation loss = 0.014672084711492062
Validation loss = 0.014748036861419678
Validation loss = 0.013864636421203613
Validation loss = 0.014369581826031208
Validation loss = 0.01465528178960085
Validation loss = 0.014224274083971977
Validation loss = 0.014717165380716324
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015083825215697289
Validation loss = 0.014226460829377174
Validation loss = 0.014045750722289085
Validation loss = 0.014236925169825554
Validation loss = 0.014110803604125977
Validation loss = 0.014526082202792168
Validation loss = 0.014130393974483013
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 723
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 717
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 798
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 756
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 792
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 805
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.89e+03 |
| Iteration     | 30       |
| MaximumReturn | 4.21e+03 |
| MinimumReturn | 599      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014836329035460949
Validation loss = 0.014256957918405533
Validation loss = 0.013363542035222054
Validation loss = 0.014667906798422337
Validation loss = 0.013939090073108673
Validation loss = 0.013873457908630371
Validation loss = 0.013621391728520393
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014569811522960663
Validation loss = 0.014302689582109451
Validation loss = 0.01353280246257782
Validation loss = 0.013920532539486885
Validation loss = 0.014554353430867195
Validation loss = 0.013467391021549702
Validation loss = 0.013617588207125664
Validation loss = 0.014078740030527115
Validation loss = 0.013494091108441353
Validation loss = 0.01323904749006033
Validation loss = 0.014044268056750298
Validation loss = 0.014282713644206524
Validation loss = 0.013830847106873989
Validation loss = 0.013405786827206612
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01399521715939045
Validation loss = 0.013600694015622139
Validation loss = 0.01320370938628912
Validation loss = 0.013590112328529358
Validation loss = 0.013689414598047733
Validation loss = 0.013104582205414772
Validation loss = 0.012921551242470741
Validation loss = 0.013785036280751228
Validation loss = 0.01313997432589531
Validation loss = 0.013536105863749981
Validation loss = 0.012842396274209023
Validation loss = 0.013498259708285332
Validation loss = 0.01362362690269947
Validation loss = 0.013296146877110004
Validation loss = 0.013216070830821991
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015468709170818329
Validation loss = 0.013595987111330032
Validation loss = 0.014238490723073483
Validation loss = 0.014112874865531921
Validation loss = 0.014618913643062115
Validation loss = 0.013993484899401665
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013914661481976509
Validation loss = 0.01371704787015915
Validation loss = 0.01370973140001297
Validation loss = 0.013446259312331676
Validation loss = 0.013586781919002533
Validation loss = 0.01337426621466875
Validation loss = 0.013736223801970482
Validation loss = 0.013085487298667431
Validation loss = 0.013691132888197899
Validation loss = 0.013754412531852722
Validation loss = 0.01320553570985794
Validation loss = 0.014172926545143127
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 770
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 759
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 742
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 789
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 738
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 766
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.23e+03 |
| Iteration     | 31       |
| MaximumReturn | 4.04e+03 |
| MinimumReturn | 401      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014038030058145523
Validation loss = 0.013037251308560371
Validation loss = 0.01419632975012064
Validation loss = 0.013119990937411785
Validation loss = 0.01317042950540781
Validation loss = 0.014123204164206982
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014411971904337406
Validation loss = 0.013812568038702011
Validation loss = 0.013353881426155567
Validation loss = 0.014514961279928684
Validation loss = 0.013534405268728733
Validation loss = 0.013629055581986904
Validation loss = 0.013494505546987057
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013851827010512352
Validation loss = 0.013151508755981922
Validation loss = 0.012933120131492615
Validation loss = 0.013100812211632729
Validation loss = 0.013507730327546597
Validation loss = 0.012566063553094864
Validation loss = 0.012996423058211803
Validation loss = 0.013058406300842762
Validation loss = 0.014108733274042606
Validation loss = 0.01245830301195383
Validation loss = 0.012813002802431583
Validation loss = 0.013158630579710007
Validation loss = 0.012976011261343956
Validation loss = 0.012485332787036896
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015165652148425579
Validation loss = 0.013931455090641975
Validation loss = 0.013853203505277634
Validation loss = 0.013844406232237816
Validation loss = 0.013198360800743103
Validation loss = 0.013349506072700024
Validation loss = 0.014304885640740395
Validation loss = 0.013115651905536652
Validation loss = 0.01328421663492918
Validation loss = 0.013151968829333782
Validation loss = 0.014461270533502102
Validation loss = 0.013150111772119999
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014742348343133926
Validation loss = 0.01319270022213459
Validation loss = 0.013765891082584858
Validation loss = 0.012834856286644936
Validation loss = 0.013036157935857773
Validation loss = 0.013434830121695995
Validation loss = 0.013400702737271786
Validation loss = 0.013518832623958588
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
number of affinization with epsilon = 3 is 805
Path 1 | total_timesteps 1000.
number of affinization with epsilon = 3 is 746
Path 2 | total_timesteps 2000.
number of affinization with epsilon = 3 is 784
Path 3 | total_timesteps 3000.
number of affinization with epsilon = 3 is 803
Path 4 | total_timesteps 4000.
number of affinization with epsilon = 3 is 779
Path 5 | total_timesteps 5000.
number of affinization with epsilon = 3 is 814
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.07e+03 |
| Iteration     | 32       |
| MaximumReturn | 4.02e+03 |
| MinimumReturn | 20.5     |
| TotalSamples  | 136000   |
----------------------------
