Logging to experiments/gym_fswimmer/SA01/Wed-02-Nov-2022-04-24-26-PM-CDT_gym_fswimmer_trpo_iteration_20_seed1231
Print configuration .....
{'env_name': 'gym_fswimmer', 'random_seeds': [2312, 1231, 2631, 5543], 'save_variables': False, 'model_save_dir': '/tmp/gym_fswimmer_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 200, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.31607818603515625
Validation loss = 0.1823684275150299
Validation loss = 0.12870290875434875
Validation loss = 0.11384639889001846
Validation loss = 0.10580325126647949
Validation loss = 0.10437510907649994
Validation loss = 0.09815899282693863
Validation loss = 0.10527452826499939
Validation loss = 0.09116393327713013
Validation loss = 0.09356173872947693
Validation loss = 0.10180433839559555
Validation loss = 0.09165358543395996
Validation loss = 0.10000157356262207
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5126097202301025
Validation loss = 0.18803390860557556
Validation loss = 0.13186335563659668
Validation loss = 0.11672808229923248
Validation loss = 0.11156241595745087
Validation loss = 0.11308950185775757
Validation loss = 0.10698583722114563
Validation loss = 0.10264818370342255
Validation loss = 0.10208412259817123
Validation loss = 0.10037209093570709
Validation loss = 0.09707151353359222
Validation loss = 0.09429866075515747
Validation loss = 0.0976303219795227
Validation loss = 0.09575662016868591
Validation loss = 0.09396466612815857
Validation loss = 0.09683670103549957
Validation loss = 0.10411414504051208
Validation loss = 0.08936291933059692
Validation loss = 0.0914013460278511
Validation loss = 0.08833062648773193
Validation loss = 0.08623562753200531
Validation loss = 0.08270683884620667
Validation loss = 0.09767499566078186
Validation loss = 0.0886424109339714
Validation loss = 0.09303931891918182
Validation loss = 0.08613096177577972
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.520300030708313
Validation loss = 0.1932283490896225
Validation loss = 0.13228824734687805
Validation loss = 0.12106245756149292
Validation loss = 0.11332668364048004
Validation loss = 0.1032351702451706
Validation loss = 0.10491069406270981
Validation loss = 0.09984178841114044
Validation loss = 0.10139790177345276
Validation loss = 0.0959097146987915
Validation loss = 0.10359351336956024
Validation loss = 0.09708830714225769
Validation loss = 0.09309004247188568
Validation loss = 0.09059956669807434
Validation loss = 0.09248000383377075
Validation loss = 0.09328213334083557
Validation loss = 0.09251716732978821
Validation loss = 0.08829452097415924
Validation loss = 0.10902310907840729
Validation loss = 0.08602741360664368
Validation loss = 0.08656921982765198
Validation loss = 0.08528481423854828
Validation loss = 0.08345484733581543
Validation loss = 0.0878865122795105
Validation loss = 0.09688331186771393
Validation loss = 0.0867878794670105
Validation loss = 0.0804993286728859
Validation loss = 0.08131632953882217
Validation loss = 0.09191673994064331
Validation loss = 0.08486078679561615
Validation loss = 0.08049710094928741
Validation loss = 0.08902761340141296
Validation loss = 0.08499889075756073
Validation loss = 0.08682934939861298
Validation loss = 0.08206311613321304
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.37680694460868835
Validation loss = 0.18226408958435059
Validation loss = 0.12670454382896423
Validation loss = 0.11391572654247284
Validation loss = 0.10789155215024948
Validation loss = 0.10151495784521103
Validation loss = 0.10569299012422562
Validation loss = 0.09975196421146393
Validation loss = 0.09937313944101334
Validation loss = 0.09411901980638504
Validation loss = 0.10314673185348511
Validation loss = 0.09900903701782227
Validation loss = 0.1023201122879982
Validation loss = 0.0914861336350441
Validation loss = 0.10408667474985123
Validation loss = 0.09901133924722672
Validation loss = 0.09428732097148895
Validation loss = 0.10425122082233429
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.35905659198760986
Validation loss = 0.17945164442062378
Validation loss = 0.12112106382846832
Validation loss = 0.11531971395015717
Validation loss = 0.11123067140579224
Validation loss = 0.10437338054180145
Validation loss = 0.11009751260280609
Validation loss = 0.10043878108263016
Validation loss = 0.10603071749210358
Validation loss = 0.10789532959461212
Validation loss = 0.10196495056152344
Validation loss = 0.11329863220453262
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 0.836    |
| Iteration     | 0        |
| MaximumReturn | 5.97     |
| MinimumReturn | -6.32    |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1452665627002716
Validation loss = 0.0805540382862091
Validation loss = 0.05745913088321686
Validation loss = 0.05291307345032692
Validation loss = 0.04948246479034424
Validation loss = 0.04497021436691284
Validation loss = 0.04146905615925789
Validation loss = 0.04109419882297516
Validation loss = 0.03947635740041733
Validation loss = 0.04062114655971527
Validation loss = 0.03471626341342926
Validation loss = 0.03656535968184471
Validation loss = 0.04190494865179062
Validation loss = 0.0423816554248333
Validation loss = 0.0383523590862751
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14871986210346222
Validation loss = 0.0709548145532608
Validation loss = 0.052274882793426514
Validation loss = 0.04949656501412392
Validation loss = 0.04606634005904198
Validation loss = 0.04188045114278793
Validation loss = 0.04262881726026535
Validation loss = 0.04537506029009819
Validation loss = 0.03883816674351692
Validation loss = 0.04221382737159729
Validation loss = 0.03750720992684364
Validation loss = 0.03607337176799774
Validation loss = 0.033195096999406815
Validation loss = 0.03447776287794113
Validation loss = 0.035509683191776276
Validation loss = 0.035409264266490936
Validation loss = 0.03750046715140343
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.150446355342865
Validation loss = 0.061634112149477005
Validation loss = 0.050783880054950714
Validation loss = 0.04574131593108177
Validation loss = 0.0413428395986557
Validation loss = 0.037311702966690063
Validation loss = 0.04481355845928192
Validation loss = 0.03835313022136688
Validation loss = 0.03801129013299942
Validation loss = 0.041902270168066025
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14671190083026886
Validation loss = 0.07252376526594162
Validation loss = 0.055494971573352814
Validation loss = 0.049188338220119476
Validation loss = 0.052745576947927475
Validation loss = 0.04289477691054344
Validation loss = 0.041322849690914154
Validation loss = 0.04435136169195175
Validation loss = 0.04359281808137894
Validation loss = 0.044314853847026825
Validation loss = 0.05413348972797394
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15007629990577698
Validation loss = 0.08968443423509598
Validation loss = 0.06356368213891983
Validation loss = 0.05562582239508629
Validation loss = 0.053347986191511154
Validation loss = 0.046686865389347076
Validation loss = 0.04454825818538666
Validation loss = 0.044532932341098785
Validation loss = 0.04270409047603607
Validation loss = 0.04463302344083786
Validation loss = 0.0402129590511322
Validation loss = 0.03854285180568695
Validation loss = 0.03367041423916817
Validation loss = 0.035103000700473785
Validation loss = 0.03457123413681984
Validation loss = 0.03916576877236366
Validation loss = 0.03273111581802368
Validation loss = 0.03158220276236534
Validation loss = 0.0375041626393795
Validation loss = 0.0319545716047287
Validation loss = 0.03357512131333351
Validation loss = 0.03180728480219841
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.07     |
| Iteration     | 1        |
| MaximumReturn | 10.2     |
| MinimumReturn | -3.38    |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08249524980783463
Validation loss = 0.02931658923625946
Validation loss = 0.026590952649712563
Validation loss = 0.02314980886876583
Validation loss = 0.023376978933811188
Validation loss = 0.02221694402396679
Validation loss = 0.021733157336711884
Validation loss = 0.021382391452789307
Validation loss = 0.023225734010338783
Validation loss = 0.020143818110227585
Validation loss = 0.018280992284417152
Validation loss = 0.019013648852705956
Validation loss = 0.01850154623389244
Validation loss = 0.019396966323256493
Validation loss = 0.02170654945075512
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07721417397260666
Validation loss = 0.029224969446659088
Validation loss = 0.026418069377541542
Validation loss = 0.026497581973671913
Validation loss = 0.02472630701959133
Validation loss = 0.02379266731441021
Validation loss = 0.026658467948436737
Validation loss = 0.02781520038843155
Validation loss = 0.02114264853298664
Validation loss = 0.02011908032000065
Validation loss = 0.019212398678064346
Validation loss = 0.018958309665322304
Validation loss = 0.01949540711939335
Validation loss = 0.01966904290020466
Validation loss = 0.020681999623775482
Validation loss = 0.020243709906935692
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08763609081506729
Validation loss = 0.03212492913007736
Validation loss = 0.028867056593298912
Validation loss = 0.026732610538601875
Validation loss = 0.023805906996130943
Validation loss = 0.023040713742375374
Validation loss = 0.02559233270585537
Validation loss = 0.025664227083325386
Validation loss = 0.023827219381928444
Validation loss = 0.020874248817563057
Validation loss = 0.024327920749783516
Validation loss = 0.020129801705479622
Validation loss = 0.021051442250609398
Validation loss = 0.019403250887989998
Validation loss = 0.021555615589022636
Validation loss = 0.018811525776982307
Validation loss = 0.018336700275540352
Validation loss = 0.019743287935853004
Validation loss = 0.020300624892115593
Validation loss = 0.01772368885576725
Validation loss = 0.017494725063443184
Validation loss = 0.019319063052535057
Validation loss = 0.01751038245856762
Validation loss = 0.019061071798205376
Validation loss = 0.0182031262665987
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.062452152371406555
Validation loss = 0.025773808360099792
Validation loss = 0.02308535762131214
Validation loss = 0.02457500249147415
Validation loss = 0.020523885264992714
Validation loss = 0.02004609815776348
Validation loss = 0.019950436428189278
Validation loss = 0.020268090069293976
Validation loss = 0.018710656091570854
Validation loss = 0.017766572535037994
Validation loss = 0.020376430824398994
Validation loss = 0.018675832077860832
Validation loss = 0.0205390565097332
Validation loss = 0.0202091746032238
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08237642794847488
Validation loss = 0.029677944257855415
Validation loss = 0.02649623714387417
Validation loss = 0.02414008416235447
Validation loss = 0.02299528755247593
Validation loss = 0.025687960907816887
Validation loss = 0.024862103164196014
Validation loss = 0.01911926083266735
Validation loss = 0.023978417739272118
Validation loss = 0.02108236961066723
Validation loss = 0.01824689842760563
Validation loss = 0.01775936782360077
Validation loss = 0.01817297749221325
Validation loss = 0.01751473918557167
Validation loss = 0.019181348383426666
Validation loss = 0.017765002325177193
Validation loss = 0.018892034888267517
Validation loss = 0.017065145075321198
Validation loss = 0.01912657544016838
Validation loss = 0.01941412314772606
Validation loss = 0.0176384299993515
Validation loss = 0.017178857699036598
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.44     |
| Iteration     | 2        |
| MaximumReturn | 7.63     |
| MinimumReturn | -3.39    |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.038348760455846786
Validation loss = 0.022759683430194855
Validation loss = 0.020739849656820297
Validation loss = 0.02181991934776306
Validation loss = 0.01838167943060398
Validation loss = 0.019690748304128647
Validation loss = 0.01669318601489067
Validation loss = 0.018323881551623344
Validation loss = 0.01764783263206482
Validation loss = 0.016543366014957428
Validation loss = 0.0175965316593647
Validation loss = 0.017067570239305496
Validation loss = 0.015268547460436821
Validation loss = 0.020218713209033012
Validation loss = 0.014530814252793789
Validation loss = 0.01836668699979782
Validation loss = 0.01514081098139286
Validation loss = 0.015309755690395832
Validation loss = 0.016498805955052376
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.030259808525443077
Validation loss = 0.02400776371359825
Validation loss = 0.024002477526664734
Validation loss = 0.02076205424964428
Validation loss = 0.021587936207652092
Validation loss = 0.019599691033363342
Validation loss = 0.020412219688296318
Validation loss = 0.019468486309051514
Validation loss = 0.0182453952729702
Validation loss = 0.01842002011835575
Validation loss = 0.019041338935494423
Validation loss = 0.018812503665685654
Validation loss = 0.019338756799697876
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.044899407774209976
Validation loss = 0.024336159229278564
Validation loss = 0.022078633308410645
Validation loss = 0.022911738604307175
Validation loss = 0.022134333848953247
Validation loss = 0.022867536172270775
Validation loss = 0.02083059400320053
Validation loss = 0.018615728244185448
Validation loss = 0.017655890434980392
Validation loss = 0.021054502576589584
Validation loss = 0.017139209434390068
Validation loss = 0.01716194674372673
Validation loss = 0.016231870278716087
Validation loss = 0.01705174520611763
Validation loss = 0.016687989234924316
Validation loss = 0.015143375843763351
Validation loss = 0.016745537519454956
Validation loss = 0.014871487393975258
Validation loss = 0.0189634021371603
Validation loss = 0.01624404639005661
Validation loss = 0.015187318436801434
Validation loss = 0.014621250331401825
Validation loss = 0.01429129671305418
Validation loss = 0.014747303910553455
Validation loss = 0.014148758724331856
Validation loss = 0.01595238223671913
Validation loss = 0.019068008288741112
Validation loss = 0.015438470989465714
Validation loss = 0.01413201354444027
Validation loss = 0.01670386642217636
Validation loss = 0.014183553867042065
Validation loss = 0.017359508201479912
Validation loss = 0.015005361288785934
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03388882428407669
Validation loss = 0.021728213876485825
Validation loss = 0.02105969563126564
Validation loss = 0.021816888824105263
Validation loss = 0.01995832845568657
Validation loss = 0.018412454053759575
Validation loss = 0.020123301073908806
Validation loss = 0.018215056508779526
Validation loss = 0.017127851024270058
Validation loss = 0.01747148483991623
Validation loss = 0.016640419140458107
Validation loss = 0.01913139596581459
Validation loss = 0.019549842923879623
Validation loss = 0.016433507204055786
Validation loss = 0.017391793429851532
Validation loss = 0.015186214819550514
Validation loss = 0.01605498231947422
Validation loss = 0.016510067507624626
Validation loss = 0.015940116718411446
Validation loss = 0.014875474385917187
Validation loss = 0.015068350359797478
Validation loss = 0.01528353150933981
Validation loss = 0.016080524772405624
Validation loss = 0.016454631462693214
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03946782276034355
Validation loss = 0.02334119752049446
Validation loss = 0.027430078014731407
Validation loss = 0.020918944850564003
Validation loss = 0.01986156404018402
Validation loss = 0.01929769665002823
Validation loss = 0.018551314249634743
Validation loss = 0.018335800617933273
Validation loss = 0.01975630782544613
Validation loss = 0.02139156311750412
Validation loss = 0.018114907667040825
Validation loss = 0.016555853188037872
Validation loss = 0.01604490727186203
Validation loss = 0.015935996547341347
Validation loss = 0.015135672874748707
Validation loss = 0.01619117148220539
Validation loss = 0.01608295738697052
Validation loss = 0.015882771462202072
Validation loss = 0.01780564710497856
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 15.5     |
| Iteration     | 3        |
| MaximumReturn | 27.2     |
| MinimumReturn | 4.21     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03133351355791092
Validation loss = 0.01885814405977726
Validation loss = 0.019604232162237167
Validation loss = 0.021147070452570915
Validation loss = 0.020004428923130035
Validation loss = 0.016608724370598793
Validation loss = 0.016865333542227745
Validation loss = 0.015546629205346107
Validation loss = 0.018630236387252808
Validation loss = 0.014722314663231373
Validation loss = 0.016054043546319008
Validation loss = 0.014113359153270721
Validation loss = 0.013554970733821392
Validation loss = 0.013620723970234394
Validation loss = 0.018267039209604263
Validation loss = 0.01452929712831974
Validation loss = 0.01753992959856987
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.022110704332590103
Validation loss = 0.02121933177113533
Validation loss = 0.016678621992468834
Validation loss = 0.01635555550456047
Validation loss = 0.016175758093595505
Validation loss = 0.02275732345879078
Validation loss = 0.015178585425019264
Validation loss = 0.01533353142440319
Validation loss = 0.016663800925016403
Validation loss = 0.016850652173161507
Validation loss = 0.01486265193670988
Validation loss = 0.017236212268471718
Validation loss = 0.020396174862980843
Validation loss = 0.014556601643562317
Validation loss = 0.017523320391774178
Validation loss = 0.015993420034646988
Validation loss = 0.015653278678655624
Validation loss = 0.01778014563024044
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02483019419014454
Validation loss = 0.02327583357691765
Validation loss = 0.017424048855900764
Validation loss = 0.016990162432193756
Validation loss = 0.01979585364460945
Validation loss = 0.01479983888566494
Validation loss = 0.013254182413220406
Validation loss = 0.015024898573756218
Validation loss = 0.014300676062703133
Validation loss = 0.012432285584509373
Validation loss = 0.012939524836838245
Validation loss = 0.01693902537226677
Validation loss = 0.01422478724271059
Validation loss = 0.013378622010350227
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028538871556520462
Validation loss = 0.02146177552640438
Validation loss = 0.020956281572580338
Validation loss = 0.018019791692495346
Validation loss = 0.019457612186670303
Validation loss = 0.018932541832327843
Validation loss = 0.019427750259637833
Validation loss = 0.016249889507889748
Validation loss = 0.021424390375614166
Validation loss = 0.019724147394299507
Validation loss = 0.015054082497954369
Validation loss = 0.015284392051398754
Validation loss = 0.014209495857357979
Validation loss = 0.01609002612531185
Validation loss = 0.015201138332486153
Validation loss = 0.013280456885695457
Validation loss = 0.017221106216311455
Validation loss = 0.017593810334801674
Validation loss = 0.02574487030506134
Validation loss = 0.0145990876480937
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.023339947685599327
Validation loss = 0.019166871905326843
Validation loss = 0.018419092521071434
Validation loss = 0.01619729958474636
Validation loss = 0.01692204922437668
Validation loss = 0.0162314735352993
Validation loss = 0.01470177061855793
Validation loss = 0.014014601707458496
Validation loss = 0.01571168191730976
Validation loss = 0.014464887790381908
Validation loss = 0.014678752049803734
Validation loss = 0.014338269829750061
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 43.1     |
| Iteration     | 4        |
| MaximumReturn | 46.6     |
| MinimumReturn | 36.7     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016729969531297684
Validation loss = 0.01566973328590393
Validation loss = 0.013449110090732574
Validation loss = 0.013164005242288113
Validation loss = 0.012633166275918484
Validation loss = 0.013304486870765686
Validation loss = 0.013392232358455658
Validation loss = 0.012953600846230984
Validation loss = 0.013564810156822205
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019018111750483513
Validation loss = 0.017957467585802078
Validation loss = 0.0143729941919446
Validation loss = 0.013461186550557613
Validation loss = 0.011935790069401264
Validation loss = 0.014275126159191132
Validation loss = 0.014104253612458706
Validation loss = 0.016083942726254463
Validation loss = 0.01598775014281273
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017827900126576424
Validation loss = 0.013030379079282284
Validation loss = 0.012791234068572521
Validation loss = 0.014650236815214157
Validation loss = 0.010903534479439259
Validation loss = 0.011525516398251057
Validation loss = 0.010860123671591282
Validation loss = 0.012869964353740215
Validation loss = 0.01629635877907276
Validation loss = 0.010829675942659378
Validation loss = 0.013578507117927074
Validation loss = 0.01108632329851389
Validation loss = 0.015203957445919514
Validation loss = 0.012983818538486958
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015663908794522285
Validation loss = 0.014289447106420994
Validation loss = 0.014670479111373425
Validation loss = 0.016247380524873734
Validation loss = 0.020739158615469933
Validation loss = 0.017089569941163063
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013721375726163387
Validation loss = 0.015199790708720684
Validation loss = 0.01601819135248661
Validation loss = 0.013555090874433517
Validation loss = 0.011332106776535511
Validation loss = 0.012175790965557098
Validation loss = 0.01265786960721016
Validation loss = 0.011369269341230392
Validation loss = 0.012774626724421978
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 37.9     |
| Iteration     | 5        |
| MaximumReturn | 40.2     |
| MinimumReturn | 34.7     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018927518278360367
Validation loss = 0.014692467637360096
Validation loss = 0.01403597928583622
Validation loss = 0.012086397968232632
Validation loss = 0.011694768443703651
Validation loss = 0.016223065555095673
Validation loss = 0.018023356795310974
Validation loss = 0.011776997707784176
Validation loss = 0.011050415225327015
Validation loss = 0.011798898689448833
Validation loss = 0.011875743977725506
Validation loss = 0.010980737395584583
Validation loss = 0.012851489707827568
Validation loss = 0.012002403847873211
Validation loss = 0.013458982110023499
Validation loss = 0.01320095919072628
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015642890706658363
Validation loss = 0.014059903100132942
Validation loss = 0.012171219103038311
Validation loss = 0.011890539899468422
Validation loss = 0.013570108450949192
Validation loss = 0.0110920500010252
Validation loss = 0.013159619644284248
Validation loss = 0.011476884596049786
Validation loss = 0.012226455844938755
Validation loss = 0.011652035638689995
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031024416908621788
Validation loss = 0.01778920739889145
Validation loss = 0.010140781290829182
Validation loss = 0.011761459521949291
Validation loss = 0.013050825335085392
Validation loss = 0.012529760599136353
Validation loss = 0.010806010104715824
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015370051376521587
Validation loss = 0.011822427622973919
Validation loss = 0.011260787956416607
Validation loss = 0.009871401824057102
Validation loss = 0.013774300925433636
Validation loss = 0.015242685563862324
Validation loss = 0.018735989928245544
Validation loss = 0.012901616282761097
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015241598710417747
Validation loss = 0.012339639477431774
Validation loss = 0.011574432253837585
Validation loss = 0.011582320556044579
Validation loss = 0.011353293433785439
Validation loss = 0.010082335211336613
Validation loss = 0.012888694182038307
Validation loss = 0.012834609486162663
Validation loss = 0.011157144792377949
Validation loss = 0.010970826260745525
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 103      |
| Iteration     | 6        |
| MaximumReturn | 106      |
| MinimumReturn | 98.6     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.016900235787034035
Validation loss = 0.011648954823613167
Validation loss = 0.010555165819823742
Validation loss = 0.015310708433389664
Validation loss = 0.012555137276649475
Validation loss = 0.010976247489452362
Validation loss = 0.011633594520390034
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015559466555714607
Validation loss = 0.011354013346135616
Validation loss = 0.011463980190455914
Validation loss = 0.018955793231725693
Validation loss = 0.011077101342380047
Validation loss = 0.011721143499016762
Validation loss = 0.011931892484426498
Validation loss = 0.009984837844967842
Validation loss = 0.011327646672725677
Validation loss = 0.010422881692647934
Validation loss = 0.010651795193552971
Validation loss = 0.011626717634499073
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01207703072577715
Validation loss = 0.010320956818759441
Validation loss = 0.011098182760179043
Validation loss = 0.012198982760310173
Validation loss = 0.009298766031861305
Validation loss = 0.011915775015950203
Validation loss = 0.011384684592485428
Validation loss = 0.014744531363248825
Validation loss = 0.009673488326370716
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015504520386457443
Validation loss = 0.013358758762478828
Validation loss = 0.010721110738813877
Validation loss = 0.016270224004983902
Validation loss = 0.016446564346551895
Validation loss = 0.010364714078605175
Validation loss = 0.012789596803486347
Validation loss = 0.01133313775062561
Validation loss = 0.015285598114132881
Validation loss = 0.014994166791439056
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015671653673052788
Validation loss = 0.011988049373030663
Validation loss = 0.008664200082421303
Validation loss = 0.011120854876935482
Validation loss = 0.008849535137414932
Validation loss = 0.009657732211053371
Validation loss = 0.011877394281327724
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 192      |
| Iteration     | 7        |
| MaximumReturn | 201      |
| MinimumReturn | 182      |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012540772557258606
Validation loss = 0.009776069782674313
Validation loss = 0.012010198086500168
Validation loss = 0.007051645778119564
Validation loss = 0.01236350554972887
Validation loss = 0.01538875326514244
Validation loss = 0.009003247134387493
Validation loss = 0.00845744926482439
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009835418313741684
Validation loss = 0.009881836362183094
Validation loss = 0.01138346828520298
Validation loss = 0.008130934089422226
Validation loss = 0.012218749150633812
Validation loss = 0.00930364616215229
Validation loss = 0.008643340319395065
Validation loss = 0.010216789320111275
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011290513910353184
Validation loss = 0.010403127409517765
Validation loss = 0.008105991408228874
Validation loss = 0.00719432532787323
Validation loss = 0.008659753948450089
Validation loss = 0.008431168273091316
Validation loss = 0.00949740968644619
Validation loss = 0.010621901601552963
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00960278045386076
Validation loss = 0.010737528093159199
Validation loss = 0.011927351355552673
Validation loss = 0.014164181426167488
Validation loss = 0.013400111347436905
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010000206530094147
Validation loss = 0.007749214768409729
Validation loss = 0.010294971987605095
Validation loss = 0.008823753334581852
Validation loss = 0.008148100227117538
Validation loss = 0.008306738920509815
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 238      |
| Iteration     | 8        |
| MaximumReturn | 243      |
| MinimumReturn | 233      |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008125506341457367
Validation loss = 0.008255713619291782
Validation loss = 0.009726326912641525
Validation loss = 0.008263918571174145
Validation loss = 0.007253163494169712
Validation loss = 0.010095935314893723
Validation loss = 0.008197106420993805
Validation loss = 0.010330723598599434
Validation loss = 0.011143947951495647
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007715051062405109
Validation loss = 0.00818580575287342
Validation loss = 0.010716932825744152
Validation loss = 0.012957403436303139
Validation loss = 0.008483259938657284
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010652266442775726
Validation loss = 0.007062528282403946
Validation loss = 0.00756202032789588
Validation loss = 0.006856649182736874
Validation loss = 0.007827553898096085
Validation loss = 0.008150828070938587
Validation loss = 0.0067214397713541985
Validation loss = 0.0094378050416708
Validation loss = 0.006570931524038315
Validation loss = 0.007953459396958351
Validation loss = 0.008201071061193943
Validation loss = 0.008921700529754162
Validation loss = 0.005940825678408146
Validation loss = 0.007788144983351231
Validation loss = 0.00637729000300169
Validation loss = 0.007237971760332584
Validation loss = 0.006651092320680618
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013012481853365898
Validation loss = 0.00915417168289423
Validation loss = 0.008821960538625717
Validation loss = 0.007998043671250343
Validation loss = 0.010166330263018608
Validation loss = 0.010369056835770607
Validation loss = 0.008291021920740604
Validation loss = 0.007072843611240387
Validation loss = 0.012200595811009407
Validation loss = 0.010947301052510738
Validation loss = 0.007331006228923798
Validation loss = 0.010812494903802872
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007078259252011776
Validation loss = 0.008611609227955341
Validation loss = 0.007666954305022955
Validation loss = 0.0076879882253706455
Validation loss = 0.010468989610671997
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 270      |
| Iteration     | 9        |
| MaximumReturn | 275      |
| MinimumReturn | 268      |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012553786858916283
Validation loss = 0.006553381681442261
Validation loss = 0.009507465176284313
Validation loss = 0.007992454804480076
Validation loss = 0.005416888277977705
Validation loss = 0.0076781450770795345
Validation loss = 0.006069534923881292
Validation loss = 0.0056678373366594315
Validation loss = 0.0049721309915184975
Validation loss = 0.009811528958380222
Validation loss = 0.007053622975945473
Validation loss = 0.0070008751936256886
Validation loss = 0.009936361573636532
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009042631834745407
Validation loss = 0.007639979012310505
Validation loss = 0.005882775876671076
Validation loss = 0.006757168099284172
Validation loss = 0.008871179074048996
Validation loss = 0.0072745634242892265
Validation loss = 0.009537219069898129
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006464093923568726
Validation loss = 0.007093310821801424
Validation loss = 0.008938313461840153
Validation loss = 0.007699441630393267
Validation loss = 0.008555494248867035
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008664238266646862
Validation loss = 0.006900510285049677
Validation loss = 0.007770458236336708
Validation loss = 0.008208025246858597
Validation loss = 0.006832567043602467
Validation loss = 0.007458391599357128
Validation loss = 0.013556009158492088
Validation loss = 0.010994854383170605
Validation loss = 0.0063996086828410625
Validation loss = 0.007386710029095411
Validation loss = 0.011176487430930138
Validation loss = 0.009795165620744228
Validation loss = 0.006195181980729103
Validation loss = 0.005991314072161913
Validation loss = 0.006940017454326153
Validation loss = 0.007438462693244219
Validation loss = 0.007283926010131836
Validation loss = 0.007620878051966429
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006847858894616365
Validation loss = 0.006374836433678865
Validation loss = 0.006771163549274206
Validation loss = 0.007306424900889397
Validation loss = 0.008512570522725582
Validation loss = 0.005830197129398584
Validation loss = 0.006607868243008852
Validation loss = 0.0060184672474861145
Validation loss = 0.007379746064543724
Validation loss = 0.005825170781463385
Validation loss = 0.006317253224551678
Validation loss = 0.007506712805479765
Validation loss = 0.00585470674559474
Validation loss = 0.006509978789836168
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 274      |
| Iteration     | 10       |
| MaximumReturn | 279      |
| MinimumReturn | 270      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005620569456368685
Validation loss = 0.007021244615316391
Validation loss = 0.006387657951563597
Validation loss = 0.007276406977325678
Validation loss = 0.009559649042785168
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0073707918636500835
Validation loss = 0.004986691288650036
Validation loss = 0.0065741692669689655
Validation loss = 0.006634922232478857
Validation loss = 0.004923300351947546
Validation loss = 0.007318384945392609
Validation loss = 0.007651417050510645
Validation loss = 0.010813361965119839
Validation loss = 0.008328594267368317
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004993766080588102
Validation loss = 0.005624087527394295
Validation loss = 0.005172013770788908
Validation loss = 0.006106536835432053
Validation loss = 0.005599528551101685
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006037936080247164
Validation loss = 0.006626175716519356
Validation loss = 0.005058201029896736
Validation loss = 0.005045939236879349
Validation loss = 0.0076518673449754715
Validation loss = 0.008647261187434196
Validation loss = 0.006646578665822744
Validation loss = 0.006009302567690611
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005473518278449774
Validation loss = 0.0047123022377491
Validation loss = 0.0047876653261482716
Validation loss = 0.005548906046897173
Validation loss = 0.005911919753998518
Validation loss = 0.005904668942093849
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 291      |
| Iteration     | 11       |
| MaximumReturn | 294      |
| MinimumReturn | 285      |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00514229666441679
Validation loss = 0.004390674643218517
Validation loss = 0.005868775304406881
Validation loss = 0.006254717241972685
Validation loss = 0.0050324914045631886
Validation loss = 0.005447802599519491
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005421533714979887
Validation loss = 0.00542545598000288
Validation loss = 0.006687535904347897
Validation loss = 0.00690584909170866
Validation loss = 0.0075661116279661655
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010147737339138985
Validation loss = 0.005456036888062954
Validation loss = 0.004625953733921051
Validation loss = 0.006551727652549744
Validation loss = 0.006253289990127087
Validation loss = 0.005644096061587334
Validation loss = 0.006906024646013975
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006279659457504749
Validation loss = 0.006078632548451424
Validation loss = 0.007617882918566465
Validation loss = 0.007026058156043291
Validation loss = 0.006259640213102102
Validation loss = 0.00653423648327589
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005538277328014374
Validation loss = 0.0052080946043133736
Validation loss = 0.004488886334002018
Validation loss = 0.004637130536139011
Validation loss = 0.0050767892971634865
Validation loss = 0.004860346205532551
Validation loss = 0.006338566076010466
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 289      |
| Iteration     | 12       |
| MaximumReturn | 296      |
| MinimumReturn | 284      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006096562836319208
Validation loss = 0.004253844264894724
Validation loss = 0.005065653007477522
Validation loss = 0.00528698367998004
Validation loss = 0.0036398600786924362
Validation loss = 0.004859884735196829
Validation loss = 0.004735266324132681
Validation loss = 0.00547654926776886
Validation loss = 0.006504379212856293
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00663423677906394
Validation loss = 0.005744036287069321
Validation loss = 0.006367180962115526
Validation loss = 0.009226848371326923
Validation loss = 0.005146101117134094
Validation loss = 0.004894413519650698
Validation loss = 0.005275636445730925
Validation loss = 0.0059135244227945805
Validation loss = 0.007424086797982454
Validation loss = 0.005989848170429468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004386612679809332
Validation loss = 0.004504177253693342
Validation loss = 0.006624318193644285
Validation loss = 0.00959622673690319
Validation loss = 0.003791018621996045
Validation loss = 0.004795337561517954
Validation loss = 0.004428505431860685
Validation loss = 0.005117554217576981
Validation loss = 0.0062658474780619144
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007347814738750458
Validation loss = 0.006435452960431576
Validation loss = 0.004019780550152063
Validation loss = 0.005064093973487616
Validation loss = 0.004310405347496271
Validation loss = 0.0059631699696183205
Validation loss = 0.006595594342797995
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006293354090303183
Validation loss = 0.004496875684708357
Validation loss = 0.005133099388331175
Validation loss = 0.005489817820489407
Validation loss = 0.005879102740436792
Validation loss = 0.00470685912296176
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 291      |
| Iteration     | 13       |
| MaximumReturn | 295      |
| MinimumReturn | 287      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007510824128985405
Validation loss = 0.004777485970407724
Validation loss = 0.0041959937661886215
Validation loss = 0.0047820317558944225
Validation loss = 0.004675001371651888
Validation loss = 0.004818426910787821
Validation loss = 0.0035488128196448088
Validation loss = 0.005836777854710817
Validation loss = 0.004717370495200157
Validation loss = 0.005805768072605133
Validation loss = 0.004092492628842592
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005641582887619734
Validation loss = 0.004308286122977734
Validation loss = 0.007128232158720493
Validation loss = 0.0037463984917849302
Validation loss = 0.005750809796154499
Validation loss = 0.00484205037355423
Validation loss = 0.007760336622595787
Validation loss = 0.009186415001749992
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004569907207041979
Validation loss = 0.004432448651641607
Validation loss = 0.0037676538340747356
Validation loss = 0.00395707692950964
Validation loss = 0.0044212969951331615
Validation loss = 0.005745225120335817
Validation loss = 0.004213353618979454
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004764019511640072
Validation loss = 0.004495425149798393
Validation loss = 0.004051686264574528
Validation loss = 0.005926869343966246
Validation loss = 0.003981865476816893
Validation loss = 0.004420467186719179
Validation loss = 0.00641256058588624
Validation loss = 0.004336888901889324
Validation loss = 0.004915033001452684
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005732730962336063
Validation loss = 0.004004943650215864
Validation loss = 0.003955024294555187
Validation loss = 0.004645021632313728
Validation loss = 0.004719472490251064
Validation loss = 0.0037918484304100275
Validation loss = 0.004147127736359835
Validation loss = 0.005539669655263424
Validation loss = 0.003700138535350561
Validation loss = 0.005989149212837219
Validation loss = 0.004473149310797453
Validation loss = 0.0045143719762563705
Validation loss = 0.0057891053147614
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 304      |
| Iteration     | 14       |
| MaximumReturn | 309      |
| MinimumReturn | 300      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004823651164770126
Validation loss = 0.005677612964063883
Validation loss = 0.0038188411854207516
Validation loss = 0.004369059111922979
Validation loss = 0.0037863359320908785
Validation loss = 0.0038939225487411022
Validation loss = 0.005516625475138426
Validation loss = 0.0038199694827198982
Validation loss = 0.004534550476819277
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00640249764546752
Validation loss = 0.005230317823588848
Validation loss = 0.004115752410143614
Validation loss = 0.005361533258110285
Validation loss = 0.0051678926683962345
Validation loss = 0.004496240057051182
Validation loss = 0.005762588232755661
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003715954488143325
Validation loss = 0.004911331459879875
Validation loss = 0.004160100594162941
Validation loss = 0.003135459730401635
Validation loss = 0.00467795180156827
Validation loss = 0.003404399147257209
Validation loss = 0.0033764999825507402
Validation loss = 0.0036834015045315027
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007172839716076851
Validation loss = 0.0046904911287128925
Validation loss = 0.004401104524731636
Validation loss = 0.005877906922250986
Validation loss = 0.006272365804761648
Validation loss = 0.00550572806969285
Validation loss = 0.006822322495281696
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004040245432406664
Validation loss = 0.0035338178277015686
Validation loss = 0.004416089504957199
Validation loss = 0.003885293612256646
Validation loss = 0.004279323387891054
Validation loss = 0.004236029926687479
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 295      |
| Iteration     | 15       |
| MaximumReturn | 300      |
| MinimumReturn | 290      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0048670656979084015
Validation loss = 0.003769015660509467
Validation loss = 0.0033782734535634518
Validation loss = 0.0032321210019290447
Validation loss = 0.003959949593991041
Validation loss = 0.003046942874789238
Validation loss = 0.004456095397472382
Validation loss = 0.0032787031959742308
Validation loss = 0.004201573319733143
Validation loss = 0.0034039171878248453
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004306324757635593
Validation loss = 0.005676348693668842
Validation loss = 0.005186223424971104
Validation loss = 0.005883017089217901
Validation loss = 0.0070121753960847855
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004307109862565994
Validation loss = 0.003536706557497382
Validation loss = 0.0038682743906974792
Validation loss = 0.003951006568968296
Validation loss = 0.0034087239764630795
Validation loss = 0.003605416975915432
Validation loss = 0.004007007461041212
Validation loss = 0.003186417743563652
Validation loss = 0.004767155274748802
Validation loss = 0.0035879667848348618
Validation loss = 0.0031465073116123676
Validation loss = 0.003234206000342965
Validation loss = 0.003290849504992366
Validation loss = 0.0037945553194731474
Validation loss = 0.003290723543614149
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005151425953954458
Validation loss = 0.003242463804781437
Validation loss = 0.004849377553910017
Validation loss = 0.006537458393722773
Validation loss = 0.006660291459411383
Validation loss = 0.004320843610912561
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003438943298533559
Validation loss = 0.003849498461931944
Validation loss = 0.0035396083258092403
Validation loss = 0.003781200386583805
Validation loss = 0.004182098899036646
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 306      |
| Iteration     | 16       |
| MaximumReturn | 311      |
| MinimumReturn | 302      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0035446477122604847
Validation loss = 0.002980544464662671
Validation loss = 0.0030894975643604994
Validation loss = 0.0029363136272877455
Validation loss = 0.003399172332137823
Validation loss = 0.002519825706258416
Validation loss = 0.002904128050431609
Validation loss = 0.0030499836429953575
Validation loss = 0.003482208587229252
Validation loss = 0.0034714972134679556
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005825856234878302
Validation loss = 0.006020314525812864
Validation loss = 0.005181053653359413
Validation loss = 0.0030691337306052446
Validation loss = 0.005989784840494394
Validation loss = 0.005358016584068537
Validation loss = 0.0033591368701308966
Validation loss = 0.0029946111608296633
Validation loss = 0.003445794340223074
Validation loss = 0.004129115026444197
Validation loss = 0.005021001677960157
Validation loss = 0.004071144387125969
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0029330977704375982
Validation loss = 0.003621523967012763
Validation loss = 0.002832167549058795
Validation loss = 0.0028065198566764593
Validation loss = 0.003675113432109356
Validation loss = 0.0032875221222639084
Validation loss = 0.0033667604438960552
Validation loss = 0.0031840382143855095
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003865031758323312
Validation loss = 0.007351644337177277
Validation loss = 0.0031689773313701153
Validation loss = 0.003916721325367689
Validation loss = 0.004515951033681631
Validation loss = 0.005185266956686974
Validation loss = 0.003275894094258547
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0028540166094899178
Validation loss = 0.004497744143009186
Validation loss = 0.003446252318099141
Validation loss = 0.0030816947109997272
Validation loss = 0.0045210616663098335
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 310      |
| Iteration     | 17       |
| MaximumReturn | 316      |
| MinimumReturn | 300      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031837974674999714
Validation loss = 0.0034806535113602877
Validation loss = 0.0030852947384119034
Validation loss = 0.002620505401864648
Validation loss = 0.00348380277864635
Validation loss = 0.0026448057033121586
Validation loss = 0.003273785812780261
Validation loss = 0.0028008827939629555
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003997810184955597
Validation loss = 0.004727227613329887
Validation loss = 0.0034876347053796053
Validation loss = 0.0061751012690365314
Validation loss = 0.003106628777459264
Validation loss = 0.0050524515099823475
Validation loss = 0.0038878079503774643
Validation loss = 0.004578933119773865
Validation loss = 0.0047684027813375
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0029053266625851393
Validation loss = 0.003119221655651927
Validation loss = 0.0028498375322669744
Validation loss = 0.003952604252845049
Validation loss = 0.0030894882511347532
Validation loss = 0.0027394250500947237
Validation loss = 0.002933303127065301
Validation loss = 0.002706115832552314
Validation loss = 0.003832543268799782
Validation loss = 0.0029398936312645674
Validation loss = 0.0030395991634577513
Validation loss = 0.0031292394269257784
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003646452911198139
Validation loss = 0.004351774230599403
Validation loss = 0.005981744267046452
Validation loss = 0.00838546734303236
Validation loss = 0.0038005325477570295
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0031097440514713526
Validation loss = 0.004201369360089302
Validation loss = 0.003881378099322319
Validation loss = 0.0035196731332689524
Validation loss = 0.003899436676874757
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 311      |
| Iteration     | 18       |
| MaximumReturn | 314      |
| MinimumReturn | 305      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002974061295390129
Validation loss = 0.0032107557635754347
Validation loss = 0.0029513402841985226
Validation loss = 0.003122155088931322
Validation loss = 0.0033977017737925053
Validation loss = 0.002781496848911047
Validation loss = 0.002960486803203821
Validation loss = 0.0028421308379620314
Validation loss = 0.0027288594283163548
Validation loss = 0.0029867845587432384
Validation loss = 0.0029603303410112858
Validation loss = 0.002475106855854392
Validation loss = 0.0035367540549486876
Validation loss = 0.003581361146643758
Validation loss = 0.0027420860715210438
Validation loss = 0.00317126652225852
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003048580139875412
Validation loss = 0.003648021724075079
Validation loss = 0.002854755148291588
Validation loss = 0.00406430009752512
Validation loss = 0.0045522479340434074
Validation loss = 0.003730050753802061
Validation loss = 0.0036063313018530607
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002585969166830182
Validation loss = 0.0029120862018316984
Validation loss = 0.003041743068024516
Validation loss = 0.003406304167583585
Validation loss = 0.0027830072212964296
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0054839118383824825
Validation loss = 0.0036387424916028976
Validation loss = 0.006564506329596043
Validation loss = 0.00900161825120449
Validation loss = 0.004312695004045963
Validation loss = 0.003911944571882486
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0031145450193434954
Validation loss = 0.005688081495463848
Validation loss = 0.004118151962757111
Validation loss = 0.0036601510364562273
Validation loss = 0.003496747463941574
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 314      |
| Iteration     | 19       |
| MaximumReturn | 317      |
| MinimumReturn | 310      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0029548448510468006
Validation loss = 0.002307443181052804
Validation loss = 0.0025871379766613245
Validation loss = 0.00271794106811285
Validation loss = 0.002391384681686759
Validation loss = 0.0024752612225711346
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003855489892885089
Validation loss = 0.0032852303702384233
Validation loss = 0.002447148086503148
Validation loss = 0.0028367117047309875
Validation loss = 0.0027892354410141706
Validation loss = 0.006765368394553661
Validation loss = 0.0038850356359034777
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0032699289731681347
Validation loss = 0.0023738869931548834
Validation loss = 0.002340087667107582
Validation loss = 0.002509644953534007
Validation loss = 0.002951753558591008
Validation loss = 0.0026454534381628036
Validation loss = 0.0028497453313320875
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004314166959375143
Validation loss = 0.004419589415192604
Validation loss = 0.004719150718301535
Validation loss = 0.00501807639375329
Validation loss = 0.002898812759667635
Validation loss = 0.0052208732813596725
Validation loss = 0.004282825160771608
Validation loss = 0.0036336034536361694
Validation loss = 0.006767335347831249
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003199060680344701
Validation loss = 0.0031596252229064703
Validation loss = 0.002874735277146101
Validation loss = 0.0029312155675143003
Validation loss = 0.0029810015112161636
Validation loss = 0.003226929111406207
Validation loss = 0.0035324532072991133
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 302      |
| Iteration     | 20       |
| MaximumReturn | 309      |
| MinimumReturn | 296      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003072151681408286
Validation loss = 0.002834690734744072
Validation loss = 0.0025631238240748644
Validation loss = 0.0021925235632807016
Validation loss = 0.0023694622796028852
Validation loss = 0.0023286035284399986
Validation loss = 0.0025272632483392954
Validation loss = 0.004369337577372789
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003407374955713749
Validation loss = 0.002429297426715493
Validation loss = 0.0033827691804617643
Validation loss = 0.0032458335626870394
Validation loss = 0.002933272160589695
Validation loss = 0.002670153509825468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024186717346310616
Validation loss = 0.0026308083906769753
Validation loss = 0.0027975395787507296
Validation loss = 0.0025245221331715584
Validation loss = 0.0024891160428524017
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0030375905334949493
Validation loss = 0.004837591201066971
Validation loss = 0.003749625524505973
Validation loss = 0.005506608169525862
Validation loss = 0.002622713800519705
Validation loss = 0.0034255958162248135
Validation loss = 0.0032555244397372007
Validation loss = 0.005547011736780405
Validation loss = 0.0051211765967309475
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025693972129374743
Validation loss = 0.0030765568371862173
Validation loss = 0.004578857682645321
Validation loss = 0.0031398290302604437
Validation loss = 0.00329402438364923
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 317      |
| Iteration     | 21       |
| MaximumReturn | 322      |
| MinimumReturn | 310      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00265107792802155
Validation loss = 0.002926385495811701
Validation loss = 0.002238926710560918
Validation loss = 0.0025007594376802444
Validation loss = 0.0025943047367036343
Validation loss = 0.002146007027477026
Validation loss = 0.002469011815264821
Validation loss = 0.002393765840679407
Validation loss = 0.0023567036259919405
Validation loss = 0.002067667432129383
Validation loss = 0.0034597779158502817
Validation loss = 0.002357067074626684
Validation loss = 0.0021178415045142174
Validation loss = 0.0022431763354688883
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003229335881769657
Validation loss = 0.00429557403549552
Validation loss = 0.0033871193882077932
Validation loss = 0.0030088170897215605
Validation loss = 0.002947860397398472
Validation loss = 0.0024970457889139652
Validation loss = 0.00304721063002944
Validation loss = 0.002900569699704647
Validation loss = 0.0028375079855322838
Validation loss = 0.00466359406709671
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024186105001717806
Validation loss = 0.0023591313511133194
Validation loss = 0.002932598814368248
Validation loss = 0.0023952857591211796
Validation loss = 0.002119365381076932
Validation loss = 0.0024757441133260727
Validation loss = 0.002635121112689376
Validation loss = 0.00240865140222013
Validation loss = 0.0022375185508280993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005077662877738476
Validation loss = 0.00273566204123199
Validation loss = 0.007912317290902138
Validation loss = 0.005004731472581625
Validation loss = 0.0056849936954677105
Validation loss = 0.007084247190505266
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0026725544594228268
Validation loss = 0.0031281227711588144
Validation loss = 0.002889597322791815
Validation loss = 0.002589396433904767
Validation loss = 0.003007459919899702
Validation loss = 0.0028719992842525244
Validation loss = 0.002777536166831851
Validation loss = 0.0030601888429373503
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 327      |
| Iteration     | 22       |
| MaximumReturn | 329      |
| MinimumReturn | 323      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002048751339316368
Validation loss = 0.002276928164064884
Validation loss = 0.002441092161461711
Validation loss = 0.003695494495332241
Validation loss = 0.0025020968168973923
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004861010704189539
Validation loss = 0.003981696907430887
Validation loss = 0.003679086687043309
Validation loss = 0.0024690809659659863
Validation loss = 0.0032337268348783255
Validation loss = 0.0026726217474788427
Validation loss = 0.004508878570050001
Validation loss = 0.0024341519456356764
Validation loss = 0.004272720776498318
Validation loss = 0.002729227766394615
Validation loss = 0.003491979092359543
Validation loss = 0.002872518030926585
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00214503169991076
Validation loss = 0.0025089357513934374
Validation loss = 0.0021179988980293274
Validation loss = 0.002269818913191557
Validation loss = 0.0021674891468137503
Validation loss = 0.003203999251127243
Validation loss = 0.002127953339368105
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005643825512379408
Validation loss = 0.00470438739284873
Validation loss = 0.0038169173058122396
Validation loss = 0.003444280242547393
Validation loss = 0.004994786810129881
Validation loss = 0.003383071394637227
Validation loss = 0.0037667809519916773
Validation loss = 0.002498582238331437
Validation loss = 0.0037176001351326704
Validation loss = 0.004948776215314865
Validation loss = 0.003992936573922634
Validation loss = 0.0039535085670650005
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002910185605287552
Validation loss = 0.0034205587580800056
Validation loss = 0.002836560597643256
Validation loss = 0.002792180282995105
Validation loss = 0.0027416471857577562
Validation loss = 0.004641028121113777
Validation loss = 0.0033218066673725843
Validation loss = 0.003539967117831111
Validation loss = 0.0027701507788151503
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 326      |
| Iteration     | 23       |
| MaximumReturn | 330      |
| MinimumReturn | 321      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001992677105590701
Validation loss = 0.0020473143085837364
Validation loss = 0.003070049686357379
Validation loss = 0.0027639884501695633
Validation loss = 0.002081451937556267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0025626514106988907
Validation loss = 0.004539111629128456
Validation loss = 0.0021248096600174904
Validation loss = 0.004295616410672665
Validation loss = 0.002269044751301408
Validation loss = 0.0022901215124875307
Validation loss = 0.0025965457316488028
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002893497934564948
Validation loss = 0.0022071702405810356
Validation loss = 0.002486265730112791
Validation loss = 0.002555190585553646
Validation loss = 0.0021473742090165615
Validation loss = 0.0019466152880340815
Validation loss = 0.00217503122985363
Validation loss = 0.0022122012451291084
Validation loss = 0.002199509646743536
Validation loss = 0.0019827759824693203
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003405957715585828
Validation loss = 0.0032649345230311155
Validation loss = 0.0033120031002908945
Validation loss = 0.0038247453048825264
Validation loss = 0.002508952049538493
Validation loss = 0.002677056472748518
Validation loss = 0.0029205894097685814
Validation loss = 0.002551100216805935
Validation loss = 0.005093419924378395
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0028839875012636185
Validation loss = 0.0032125129364430904
Validation loss = 0.0029667112976312637
Validation loss = 0.0031072485726326704
Validation loss = 0.002791437553241849
Validation loss = 0.0026680510491132736
Validation loss = 0.002933662151917815
Validation loss = 0.003112553618848324
Validation loss = 0.0031220680102705956
Validation loss = 0.002747743623331189
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 309      |
| Iteration     | 24       |
| MaximumReturn | 315      |
| MinimumReturn | 303      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0019770769868046045
Validation loss = 0.002193055348470807
Validation loss = 0.002101417165249586
Validation loss = 0.002123807091265917
Validation loss = 0.0020861118100583553
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004089928697794676
Validation loss = 0.0024399368558079004
Validation loss = 0.002235914347693324
Validation loss = 0.0024437347892671824
Validation loss = 0.0026634277310222387
Validation loss = 0.002400558441877365
Validation loss = 0.0020403857342898846
Validation loss = 0.0021728870924562216
Validation loss = 0.003042571246623993
Validation loss = 0.0023386043030768633
Validation loss = 0.0028488251846283674
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022711842320859432
Validation loss = 0.002022902946919203
Validation loss = 0.0024493620730936527
Validation loss = 0.0021226105745881796
Validation loss = 0.0020927288569509983
Validation loss = 0.001968549797311425
Validation loss = 0.002194582251831889
Validation loss = 0.002011415548622608
Validation loss = 0.0023033542092889547
Validation loss = 0.0019814057741314173
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0028419126756489277
Validation loss = 0.0029518851079046726
Validation loss = 0.002514471299946308
Validation loss = 0.003350405255332589
Validation loss = 0.004017575643956661
Validation loss = 0.003155121812596917
Validation loss = 0.0028976912144571543
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002788122743368149
Validation loss = 0.0024028094485402107
Validation loss = 0.0023497426882386208
Validation loss = 0.0024787860456854105
Validation loss = 0.0028289793990552425
Validation loss = 0.002393398666754365
Validation loss = 0.0028118740301579237
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 295      |
| Iteration     | 25       |
| MaximumReturn | 302      |
| MinimumReturn | 291      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002446817234158516
Validation loss = 0.0021255123429000378
Validation loss = 0.0023341039195656776
Validation loss = 0.0022620665840804577
Validation loss = 0.002358995610848069
Validation loss = 0.0018341548275202513
Validation loss = 0.0020935696084052324
Validation loss = 0.002060650149360299
Validation loss = 0.0024637558963149786
Validation loss = 0.0019504703814163804
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002689763903617859
Validation loss = 0.0031279611866921186
Validation loss = 0.0023887082934379578
Validation loss = 0.0019782946910709143
Validation loss = 0.002613923978060484
Validation loss = 0.002788765821605921
Validation loss = 0.0021558795124292374
Validation loss = 0.0030916850082576275
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002405674895271659
Validation loss = 0.002167011611163616
Validation loss = 0.0024858226533979177
Validation loss = 0.0018330714665353298
Validation loss = 0.0019756632391363382
Validation loss = 0.0023780674673616886
Validation loss = 0.001808686414733529
Validation loss = 0.0019333456875756383
Validation loss = 0.002104053972288966
Validation loss = 0.0019412836991250515
Validation loss = 0.0018578959861770272
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0028893938288092613
Validation loss = 0.0038228670600801706
Validation loss = 0.002601507818326354
Validation loss = 0.0031020601745694876
Validation loss = 0.003876869333907962
Validation loss = 0.0027817103546112776
Validation loss = 0.00264115072786808
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002405403880402446
Validation loss = 0.002850953722372651
Validation loss = 0.0027167361695319414
Validation loss = 0.002363077364861965
Validation loss = 0.0028448300436139107
Validation loss = 0.0032374621368944645
Validation loss = 0.002580651780590415
Validation loss = 0.002205899218097329
Validation loss = 0.0022316162940114737
Validation loss = 0.002266925061121583
Validation loss = 0.00224282406270504
Validation loss = 0.002703279023990035
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 295      |
| Iteration     | 26       |
| MaximumReturn | 300      |
| MinimumReturn | 287      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018841313431039453
Validation loss = 0.002150965388864279
Validation loss = 0.002416693838313222
Validation loss = 0.0022452992852777243
Validation loss = 0.0018528478685766459
Validation loss = 0.0018509970977902412
Validation loss = 0.0023324117064476013
Validation loss = 0.0021185039076954126
Validation loss = 0.001877998118288815
Validation loss = 0.002062007552012801
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002205782802775502
Validation loss = 0.0030250283889472485
Validation loss = 0.002311170566827059
Validation loss = 0.002120938617736101
Validation loss = 0.002091387752443552
Validation loss = 0.0031727321911603212
Validation loss = 0.0020902606192976236
Validation loss = 0.002892762888222933
Validation loss = 0.003017094451934099
Validation loss = 0.0020607791375368834
Validation loss = 0.001978204119950533
Validation loss = 0.0024739208165556192
Validation loss = 0.002128065098077059
Validation loss = 0.0031489937100559473
Validation loss = 0.0018156018340960145
Validation loss = 0.0019237444503232837
Validation loss = 0.0019502455834299326
Validation loss = 0.00189380650408566
Validation loss = 0.0020750700496137142
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018245538230985403
Validation loss = 0.0023705665953457355
Validation loss = 0.0020025453995913267
Validation loss = 0.002471334533765912
Validation loss = 0.001985762733966112
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0025719788391143084
Validation loss = 0.004263061564415693
Validation loss = 0.00489859888330102
Validation loss = 0.0026952705811709166
Validation loss = 0.003168380120769143
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002979883924126625
Validation loss = 0.0023167356848716736
Validation loss = 0.002556473482400179
Validation loss = 0.0030883625149726868
Validation loss = 0.002266902709379792
Validation loss = 0.002421649405732751
Validation loss = 0.00263959844596684
Validation loss = 0.0022611902095377445
Validation loss = 0.002288068877533078
Validation loss = 0.002419999334961176
Validation loss = 0.002249785466119647
Validation loss = 0.002516710665076971
Validation loss = 0.0021356758661568165
Validation loss = 0.0025879384484142065
Validation loss = 0.0024266389664262533
Validation loss = 0.002198960864916444
Validation loss = 0.0025663573760539293
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 310      |
| Iteration     | 27       |
| MaximumReturn | 313      |
| MinimumReturn | 307      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0022173335310071707
Validation loss = 0.002493114909157157
Validation loss = 0.0023419116623699665
Validation loss = 0.0022721767891198397
Validation loss = 0.002036338672041893
Validation loss = 0.00213232496753335
Validation loss = 0.0018105559283867478
Validation loss = 0.0025071760173887014
Validation loss = 0.0018117630388587713
Validation loss = 0.0017785486998036504
Validation loss = 0.002436509355902672
Validation loss = 0.002142133191227913
Validation loss = 0.0019354983232915401
Validation loss = 0.0018462823936715722
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003272188361734152
Validation loss = 0.001672356971539557
Validation loss = 0.0024875530507415533
Validation loss = 0.0018225376261398196
Validation loss = 0.0020965023431926966
Validation loss = 0.0017386135878041387
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019628789741545916
Validation loss = 0.002111331559717655
Validation loss = 0.0017985282465815544
Validation loss = 0.0019856721628457308
Validation loss = 0.001989488024264574
Validation loss = 0.0019935276359319687
Validation loss = 0.0018433794612064958
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024906049948185682
Validation loss = 0.002975320443511009
Validation loss = 0.005414566956460476
Validation loss = 0.003679234068840742
Validation loss = 0.002844566013664007
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023674501571804285
Validation loss = 0.002166603459045291
Validation loss = 0.0021313850302249193
Validation loss = 0.0024022331926971674
Validation loss = 0.0021744025871157646
Validation loss = 0.0019795603584498167
Validation loss = 0.002835538936778903
Validation loss = 0.0023009125143289566
Validation loss = 0.0030740553047508
Validation loss = 0.002413548994809389
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 330      |
| Iteration     | 28       |
| MaximumReturn | 335      |
| MinimumReturn | 324      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0019781242590397596
Validation loss = 0.0019424769561737776
Validation loss = 0.0017952446360141039
Validation loss = 0.0024987095966935158
Validation loss = 0.002487232442945242
Validation loss = 0.0019549746066331863
Validation loss = 0.002004907699301839
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018953095423057675
Validation loss = 0.0017917189979925752
Validation loss = 0.002419161144644022
Validation loss = 0.0019562330562621355
Validation loss = 0.0020314303692430258
Validation loss = 0.0017243098700419068
Validation loss = 0.0018315768102183938
Validation loss = 0.0019084258237853646
Validation loss = 0.0019554675091058016
Validation loss = 0.0020108716562390327
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017522192792966962
Validation loss = 0.001905810902826488
Validation loss = 0.002269577933475375
Validation loss = 0.0019018641905859113
Validation loss = 0.0020586184691637754
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003310886910185218
Validation loss = 0.0027309542056173086
Validation loss = 0.002103050472214818
Validation loss = 0.003074859967455268
Validation loss = 0.003028600011020899
Validation loss = 0.002731286222115159
Validation loss = 0.0033179738093167543
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0022330426145344973
Validation loss = 0.0024093673564493656
Validation loss = 0.002177145332098007
Validation loss = 0.001939937355928123
Validation loss = 0.0022840178571641445
Validation loss = 0.0020499632228165865
Validation loss = 0.002065788023173809
Validation loss = 0.0026314847636967897
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 316      |
| Iteration     | 29       |
| MaximumReturn | 318      |
| MinimumReturn | 314      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018493767129257321
Validation loss = 0.0019595990888774395
Validation loss = 0.0019084378145635128
Validation loss = 0.002109799301251769
Validation loss = 0.0018276277696713805
Validation loss = 0.0019438735907897353
Validation loss = 0.0018335708882659674
Validation loss = 0.0017520022811368108
Validation loss = 0.0021380437538027763
Validation loss = 0.0018312857719138265
Validation loss = 0.002155034337192774
Validation loss = 0.0021970218513160944
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002084457315504551
Validation loss = 0.001964414957910776
Validation loss = 0.0016515678726136684
Validation loss = 0.0016894573345780373
Validation loss = 0.0016155263874679804
Validation loss = 0.0016012957785278559
Validation loss = 0.0020345966331660748
Validation loss = 0.0020973121281713247
Validation loss = 0.0016085925744846463
Validation loss = 0.0021111739333719015
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001971899066120386
Validation loss = 0.0020530130714178085
Validation loss = 0.001928646001033485
Validation loss = 0.002054679673165083
Validation loss = 0.0018907063640654087
Validation loss = 0.0021605875808745623
Validation loss = 0.0019552551675587893
Validation loss = 0.001852367422543466
Validation loss = 0.0017068242887035012
Validation loss = 0.0016600643284618855
Validation loss = 0.0033347064163535833
Validation loss = 0.0017290899995714426
Validation loss = 0.0018707213457673788
Validation loss = 0.001698299776762724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0036143800243735313
Validation loss = 0.002336816629394889
Validation loss = 0.0026443558745086193
Validation loss = 0.0021223293151706457
Validation loss = 0.0023933574557304382
Validation loss = 0.0026211945805698633
Validation loss = 0.0037235887721180916
Validation loss = 0.0017750500701367855
Validation loss = 0.0020050364546477795
Validation loss = 0.0031346858013421297
Validation loss = 0.00305401929654181
Validation loss = 0.004824155941605568
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002160763368010521
Validation loss = 0.0021153243724256754
Validation loss = 0.0017338556936010718
Validation loss = 0.002080249832943082
Validation loss = 0.0027159317396581173
Validation loss = 0.0027555404230952263
Validation loss = 0.002022471046075225
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 325      |
| Iteration     | 30       |
| MaximumReturn | 329      |
| MinimumReturn | 321      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002113517839461565
Validation loss = 0.0018603336066007614
Validation loss = 0.0017736908048391342
Validation loss = 0.0021161986514925957
Validation loss = 0.001950928010046482
Validation loss = 0.0018546711653470993
Validation loss = 0.001735742436721921
Validation loss = 0.0017362944781780243
Validation loss = 0.001893837470561266
Validation loss = 0.00194987200666219
Validation loss = 0.0021230215206742287
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002006037626415491
Validation loss = 0.0018582018092274666
Validation loss = 0.0017865797271952033
Validation loss = 0.0019042304484173656
Validation loss = 0.001974675338715315
Validation loss = 0.0016403060872107744
Validation loss = 0.0017216012347489595
Validation loss = 0.001883904216811061
Validation loss = 0.0018997462466359138
Validation loss = 0.001791065325960517
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017802193760871887
Validation loss = 0.0016857258742675185
Validation loss = 0.0016274164663627744
Validation loss = 0.0016085871029645205
Validation loss = 0.0016845663776621222
Validation loss = 0.0019174955086782575
Validation loss = 0.001784399151802063
Validation loss = 0.0018250193679705262
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0033731432631611824
Validation loss = 0.0028589123394340277
Validation loss = 0.0031063933856785297
Validation loss = 0.0030912277288734913
Validation loss = 0.0033604989293962717
Validation loss = 0.0028925202786922455
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020906804129481316
Validation loss = 0.003490590024739504
Validation loss = 0.002105009276419878
Validation loss = 0.0021497514098882675
Validation loss = 0.0019132960587739944
Validation loss = 0.0029854632448405027
Validation loss = 0.002644143532961607
Validation loss = 0.0021194289438426495
Validation loss = 0.002459446433931589
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 299      |
| Iteration     | 31       |
| MaximumReturn | 307      |
| MinimumReturn | 293      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018037797417491674
Validation loss = 0.0021052320953458548
Validation loss = 0.0017014577751979232
Validation loss = 0.0018441808642819524
Validation loss = 0.0018316339701414108
Validation loss = 0.001614986453205347
Validation loss = 0.001899397699162364
Validation loss = 0.0020073987543582916
Validation loss = 0.00196144194342196
Validation loss = 0.0020497485529631376
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018130685202777386
Validation loss = 0.0019148326246067882
Validation loss = 0.001978156389668584
Validation loss = 0.0016087263356894255
Validation loss = 0.0015766262076795101
Validation loss = 0.0017356049502268434
Validation loss = 0.002676493488252163
Validation loss = 0.0016603268450126052
Validation loss = 0.0015923831379041076
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021129741799086332
Validation loss = 0.0017787459073588252
Validation loss = 0.0019369057845324278
Validation loss = 0.0015684050740674138
Validation loss = 0.001972993602976203
Validation loss = 0.00199050921946764
Validation loss = 0.0016182162798941135
Validation loss = 0.0023151137866079807
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002237891312688589
Validation loss = 0.00196005217730999
Validation loss = 0.0027472139336168766
Validation loss = 0.0020044685807079077
Validation loss = 0.0025650644674897194
Validation loss = 0.004434669390320778
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018204515799880028
Validation loss = 0.002402300713583827
Validation loss = 0.0021524890325963497
Validation loss = 0.002662369515746832
Validation loss = 0.0018427741015329957
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 324      |
| Iteration     | 32       |
| MaximumReturn | 327      |
| MinimumReturn | 320      |
| TotalSamples  | 136000   |
----------------------------
