Logging to experiments/invertedPendulum/test-exp-dir/test-exp_seed2431
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7377730011940002
Validation loss = 0.3952641189098358
Validation loss = 0.36883994936943054
Validation loss = 0.3266167640686035
Validation loss = 0.325074166059494
Validation loss = 0.3032171428203583
Validation loss = 0.282615065574646
Validation loss = 0.2671283185482025
Validation loss = 0.25260302424430847
Validation loss = 0.25589051842689514
Validation loss = 0.23929031193256378
Validation loss = 0.22305701673030853
Validation loss = 0.21646754443645477
Validation loss = 0.2121548056602478
Validation loss = 0.19893799722194672
Validation loss = 0.1964840292930603
Validation loss = 0.19502106308937073
Validation loss = 0.19176918268203735
Validation loss = 0.2014624923467636
Validation loss = 0.18181072175502777
Validation loss = 0.1730954349040985
Validation loss = 0.1628572940826416
Validation loss = 0.16057056188583374
Validation loss = 0.15063321590423584
Validation loss = 0.1401136964559555
Validation loss = 0.13995422422885895
Validation loss = 0.16099469363689423
Validation loss = 0.1330432891845703
Validation loss = 0.11949679255485535
Validation loss = 0.11574503034353256
Validation loss = 0.11365298926830292
Validation loss = 0.11977124959230423
Validation loss = 0.11526961624622345
Validation loss = 0.10749781876802444
Validation loss = 0.10380320250988007
Validation loss = 0.09613575041294098
Validation loss = 0.09930339455604553
Validation loss = 0.10500802099704742
Validation loss = 0.08793020993471146
Validation loss = 0.08009917289018631
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.755893349647522
Validation loss = 0.3616832196712494
Validation loss = 0.38581976294517517
Validation loss = 0.339708149433136
Validation loss = 0.33011141419410706
Validation loss = 0.30029672384262085
Validation loss = 0.2870163917541504
Validation loss = 0.2685008645057678
Validation loss = 0.25872278213500977
Validation loss = 0.23732469975948334
Validation loss = 0.24241866171360016
Validation loss = 0.22499269247055054
Validation loss = 0.2137647271156311
Validation loss = 0.2196836769580841
Validation loss = 0.21163441240787506
Validation loss = 0.19415909051895142
Validation loss = 0.20271143317222595
Validation loss = 0.2037782073020935
Validation loss = 0.20031647384166718
Validation loss = 0.16911908984184265
Validation loss = 0.1628272831439972
Validation loss = 0.17282375693321228
Validation loss = 0.15693232417106628
Validation loss = 0.15036453306674957
Validation loss = 0.1585576981306076
Validation loss = 0.13921916484832764
Validation loss = 0.13580825924873352
Validation loss = 0.14576536417007446
Validation loss = 0.12190698832273483
Validation loss = 0.10874565690755844
Validation loss = 0.1130065992474556
Validation loss = 0.10572461783885956
Validation loss = 0.11500218510627747
Validation loss = 0.11034438014030457
Validation loss = 0.1166725605726242
Validation loss = 0.10248713195323944
Validation loss = 0.09823643416166306
Validation loss = 0.08871304988861084
Validation loss = 0.1076502799987793
Validation loss = 0.11265994608402252
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7442603707313538
Validation loss = 0.3971763551235199
Validation loss = 0.3540797829627991
Validation loss = 0.3301320970058441
Validation loss = 0.32222992181777954
Validation loss = 0.30664026737213135
Validation loss = 0.275156170129776
Validation loss = 0.26518356800079346
Validation loss = 0.2522470951080322
Validation loss = 0.2527262270450592
Validation loss = 0.23238565027713776
Validation loss = 0.2275550365447998
Validation loss = 0.23493826389312744
Validation loss = 0.2059803456068039
Validation loss = 0.20194989442825317
Validation loss = 0.203186497092247
Validation loss = 0.19505929946899414
Validation loss = 0.18333116173744202
Validation loss = 0.19501414895057678
Validation loss = 0.1554800570011139
Validation loss = 0.17658790946006775
Validation loss = 0.17388589680194855
Validation loss = 0.18585528433322906
Validation loss = 0.15201085805892944
Validation loss = 0.13789208233356476
Validation loss = 0.13812769949436188
Validation loss = 0.1344725787639618
Validation loss = 0.12376748025417328
Validation loss = 0.1322105973958969
Validation loss = 0.11687637120485306
Validation loss = 0.10743400454521179
Validation loss = 0.11630944907665253
Validation loss = 0.1118568703532219
Validation loss = 0.09695736318826675
Validation loss = 0.09978356212377548
Validation loss = 0.09432274103164673
Validation loss = 0.08629783242940903
Validation loss = 0.08719166368246078
Validation loss = 0.08696754276752472
Validation loss = 0.07712235301733017
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.758876383304596
Validation loss = 0.42352426052093506
Validation loss = 0.3651231825351715
Validation loss = 0.33204686641693115
Validation loss = 0.33007436990737915
Validation loss = 0.31174808740615845
Validation loss = 0.2997205853462219
Validation loss = 0.2788260281085968
Validation loss = 0.26348257064819336
Validation loss = 0.2582043409347534
Validation loss = 0.23733706772327423
Validation loss = 0.23941412568092346
Validation loss = 0.21967844665050507
Validation loss = 0.2078605443239212
Validation loss = 0.2119743824005127
Validation loss = 0.19868989288806915
Validation loss = 0.1883314996957779
Validation loss = 0.1842326670885086
Validation loss = 0.18091823160648346
Validation loss = 0.16888554394245148
Validation loss = 0.16047582030296326
Validation loss = 0.16457504034042358
Validation loss = 0.14690054953098297
Validation loss = 0.16112467646598816
Validation loss = 0.13781997561454773
Validation loss = 0.12537911534309387
Validation loss = 0.12462302297353745
Validation loss = 0.12720826268196106
Validation loss = 0.11755350232124329
Validation loss = 0.11452390253543854
Validation loss = 0.11036280542612076
Validation loss = 0.09158985316753387
Validation loss = 0.13271744549274445
Validation loss = 0.10178479552268982
Validation loss = 0.09987664967775345
Validation loss = 0.10782033205032349
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7437059283256531
Validation loss = 0.4566284418106079
Validation loss = 0.40863457322120667
Validation loss = 0.3532484173774719
Validation loss = 0.3403185307979584
Validation loss = 0.3261542320251465
Validation loss = 0.32435643672943115
Validation loss = 0.2966427505016327
Validation loss = 0.29136061668395996
Validation loss = 0.262360155582428
Validation loss = 0.25160762667655945
Validation loss = 0.2426888793706894
Validation loss = 0.237209290266037
Validation loss = 0.2277924120426178
Validation loss = 0.22368435561656952
Validation loss = 0.2107437402009964
Validation loss = 0.20300069451332092
Validation loss = 0.19048011302947998
Validation loss = 0.18908698856830597
Validation loss = 0.1864231824874878
Validation loss = 0.1627761870622635
Validation loss = 0.14864026010036469
Validation loss = 0.15582586824893951
Validation loss = 0.1512848138809204
Validation loss = 0.13235025107860565
Validation loss = 0.12518525123596191
Validation loss = 0.13813462853431702
Validation loss = 0.13866114616394043
Validation loss = 0.12716001272201538
Validation loss = 0.11241072416305542
Validation loss = 0.11565741151571274
Validation loss = 0.11381936073303223
Validation loss = 0.0983959287405014
Validation loss = 0.09449200332164764
Validation loss = 0.12180083990097046
Validation loss = 0.09896516799926758
Validation loss = 0.10021375119686127
Validation loss = 0.10134383291006088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.66    |
| Iteration     | 0        |
| MaximumReturn | -0.0209  |
| MinimumReturn | -10.8    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3561037480831146
Validation loss = 0.16830569505691528
Validation loss = 0.14745153486728668
Validation loss = 0.1390189230442047
Validation loss = 0.12751077115535736
Validation loss = 0.11716765910387039
Validation loss = 0.11014341562986374
Validation loss = 0.09390682727098465
Validation loss = 0.09130546450614929
Validation loss = 0.07846667617559433
Validation loss = 0.06492273509502411
Validation loss = 0.09111400693655014
Validation loss = 0.06288867443799973
Validation loss = 0.05454622581601143
Validation loss = 0.05383772403001785
Validation loss = 0.049143578857183456
Validation loss = 0.04991810768842697
Validation loss = 0.04665463790297508
Validation loss = 0.04438978061079979
Validation loss = 0.043365973979234695
Validation loss = 0.03872739151120186
Validation loss = 0.04227210953831673
Validation loss = 0.042369019240140915
Validation loss = 0.03963431715965271
Validation loss = 0.04230080172419548
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.36269518733024597
Validation loss = 0.167856827378273
Validation loss = 0.15678361058235168
Validation loss = 0.14475317299365997
Validation loss = 0.13557745516300201
Validation loss = 0.12140222638845444
Validation loss = 0.10950133949518204
Validation loss = 0.0923590436577797
Validation loss = 0.09140576422214508
Validation loss = 0.07932549715042114
Validation loss = 0.07393710315227509
Validation loss = 0.06501951068639755
Validation loss = 0.06310007721185684
Validation loss = 0.06482461094856262
Validation loss = 0.05916338786482811
Validation loss = 0.056181538850069046
Validation loss = 0.05516079440712929
Validation loss = 0.05801137909293175
Validation loss = 0.04794527590274811
Validation loss = 0.04316995292901993
Validation loss = 0.05146139860153198
Validation loss = 0.0446171909570694
Validation loss = 0.04583045840263367
Validation loss = 0.05796559900045395
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.40428537130355835
Validation loss = 0.17509311437606812
Validation loss = 0.16187432408332825
Validation loss = 0.14953574538230896
Validation loss = 0.13665078580379486
Validation loss = 0.12383552640676498
Validation loss = 0.12156305462121964
Validation loss = 0.10811526328325272
Validation loss = 0.10213693231344223
Validation loss = 0.09014096111059189
Validation loss = 0.07777781784534454
Validation loss = 0.08295302093029022
Validation loss = 0.07175036519765854
Validation loss = 0.06176992878317833
Validation loss = 0.06649469584226608
Validation loss = 0.05330798774957657
Validation loss = 0.058907121419906616
Validation loss = 0.0549173429608345
Validation loss = 0.06003284826874733
Validation loss = 0.05940310284495354
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3761681020259857
Validation loss = 0.1727968007326126
Validation loss = 0.15847615897655487
Validation loss = 0.15149328112602234
Validation loss = 0.142830491065979
Validation loss = 0.13238173723220825
Validation loss = 0.11449350416660309
Validation loss = 0.10387960076332092
Validation loss = 0.08819310367107391
Validation loss = 0.0777224525809288
Validation loss = 0.08145557343959808
Validation loss = 0.0799233466386795
Validation loss = 0.06297223269939423
Validation loss = 0.06364019960165024
Validation loss = 0.055664271116256714
Validation loss = 0.055591363459825516
Validation loss = 0.04783934727311134
Validation loss = 0.06098988652229309
Validation loss = 0.052672967314720154
Validation loss = 0.046779245138168335
Validation loss = 0.04954179748892784
Validation loss = 0.043851010501384735
Validation loss = 0.0463896207511425
Validation loss = 0.04563387483358383
Validation loss = 0.03960787504911423
Validation loss = 0.046229925006628036
Validation loss = 0.0438886396586895
Validation loss = 0.03928673267364502
Validation loss = 0.044415317475795746
Validation loss = 0.05259387567639351
Validation loss = 0.038289666175842285
Validation loss = 0.03842092677950859
Validation loss = 0.037226352840662
Validation loss = 0.03759124502539635
Validation loss = 0.035312481224536896
Validation loss = 0.03180735930800438
Validation loss = 0.033125411719083786
Validation loss = 0.031918030232191086
Validation loss = 0.03553318232297897
Validation loss = 0.033268749713897705
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.337449312210083
Validation loss = 0.17083609104156494
Validation loss = 0.15131942927837372
Validation loss = 0.1412162333726883
Validation loss = 0.12703487277030945
Validation loss = 0.11565252393484116
Validation loss = 0.10835802555084229
Validation loss = 0.09589754790067673
Validation loss = 0.09604015201330185
Validation loss = 0.08286815881729126
Validation loss = 0.07738267630338669
Validation loss = 0.07635670900344849
Validation loss = 0.06449604034423828
Validation loss = 0.06666646152734756
Validation loss = 0.056685321033000946
Validation loss = 0.058337245136499405
Validation loss = 0.05096055939793587
Validation loss = 0.05185778811573982
Validation loss = 0.05266674980521202
Validation loss = 0.04435762017965317
Validation loss = 0.06382924318313599
Validation loss = 0.04724588245153427
Validation loss = 0.05156337097287178
Validation loss = 0.04646402969956398
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.31    |
| Iteration     | 1        |
| MaximumReturn | -0.0511  |
| MinimumReturn | -29.4    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15821978449821472
Validation loss = 0.08359334617853165
Validation loss = 0.061515867710113525
Validation loss = 0.05571259185671806
Validation loss = 0.046717531979084015
Validation loss = 0.04732593521475792
Validation loss = 0.04142817482352257
Validation loss = 0.036452196538448334
Validation loss = 0.02989139035344124
Validation loss = 0.028322765603661537
Validation loss = 0.031095273792743683
Validation loss = 0.0272959154099226
Validation loss = 0.023576591163873672
Validation loss = 0.026568233966827393
Validation loss = 0.025998277589678764
Validation loss = 0.021710356697440147
Validation loss = 0.022851677611470222
Validation loss = 0.02066759765148163
Validation loss = 0.02339668571949005
Validation loss = 0.02678097039461136
Validation loss = 0.020224088802933693
Validation loss = 0.019134582951664925
Validation loss = 0.018220655620098114
Validation loss = 0.019281817600131035
Validation loss = 0.020879950374364853
Validation loss = 0.01713632419705391
Validation loss = 0.0268904697149992
Validation loss = 0.02179451659321785
Validation loss = 0.019150076434016228
Validation loss = 0.01846929080784321
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17841924726963043
Validation loss = 0.08556942641735077
Validation loss = 0.06643644720315933
Validation loss = 0.05422457307577133
Validation loss = 0.04877839237451553
Validation loss = 0.04979168251156807
Validation loss = 0.04201812669634819
Validation loss = 0.03613538667559624
Validation loss = 0.032236434519290924
Validation loss = 0.03108726628124714
Validation loss = 0.03513075411319733
Validation loss = 0.031510017812252045
Validation loss = 0.03781018778681755
Validation loss = 0.025409957394003868
Validation loss = 0.023137608543038368
Validation loss = 0.023358836770057678
Validation loss = 0.023405537009239197
Validation loss = 0.02569526620209217
Validation loss = 0.02347341552376747
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14666497707366943
Validation loss = 0.07941246777772903
Validation loss = 0.060469258576631546
Validation loss = 0.05494658276438713
Validation loss = 0.052491724491119385
Validation loss = 0.04584169387817383
Validation loss = 0.04207708686590195
Validation loss = 0.03280513733625412
Validation loss = 0.03571660444140434
Validation loss = 0.03302110731601715
Validation loss = 0.034698665142059326
Validation loss = 0.028020059689879417
Validation loss = 0.03190136328339577
Validation loss = 0.026182323694229126
Validation loss = 0.026193317025899887
Validation loss = 0.024212874472141266
Validation loss = 0.02935505472123623
Validation loss = 0.02640431560575962
Validation loss = 0.021660324186086655
Validation loss = 0.029230620712041855
Validation loss = 0.028225500136613846
Validation loss = 0.023623362183570862
Validation loss = 0.021484140306711197
Validation loss = 0.018398452550172806
Validation loss = 0.021469995379447937
Validation loss = 0.023635460063815117
Validation loss = 0.022893190383911133
Validation loss = 0.019997315481305122
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.16924835741519928
Validation loss = 0.084663026034832
Validation loss = 0.05972843989729881
Validation loss = 0.0504094623029232
Validation loss = 0.03904544934630394
Validation loss = 0.03432120755314827
Validation loss = 0.03396274894475937
Validation loss = 0.03513726964592934
Validation loss = 0.024506743997335434
Validation loss = 0.028660185635089874
Validation loss = 0.02052762173116207
Validation loss = 0.021837932989001274
Validation loss = 0.02187228389084339
Validation loss = 0.023178795352578163
Validation loss = 0.02526489645242691
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14488162100315094
Validation loss = 0.07846955209970474
Validation loss = 0.06615450978279114
Validation loss = 0.052469320595264435
Validation loss = 0.045400653034448624
Validation loss = 0.04690461605787277
Validation loss = 0.03933196887373924
Validation loss = 0.03937545791268349
Validation loss = 0.034966424107551575
Validation loss = 0.03211937099695206
Validation loss = 0.030960477888584137
Validation loss = 0.03985685110092163
Validation loss = 0.027060654014348984
Validation loss = 0.02642717957496643
Validation loss = 0.023753846064209938
Validation loss = 0.022649507969617844
Validation loss = 0.02656812220811844
Validation loss = 0.023873837664723396
Validation loss = 0.02865004353225231
Validation loss = 0.028274547308683395
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0119  |
| Iteration     | 2        |
| MaximumReturn | -0.00853 |
| MinimumReturn | -0.0172  |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.056179534643888474
Validation loss = 0.028505729511380196
Validation loss = 0.018851103261113167
Validation loss = 0.019500209018588066
Validation loss = 0.021369213238358498
Validation loss = 0.017259592190384865
Validation loss = 0.017133625224232674
Validation loss = 0.018670162186026573
Validation loss = 0.01933121122419834
Validation loss = 0.01342454832047224
Validation loss = 0.018144577741622925
Validation loss = 0.02752966247498989
Validation loss = 0.01630975492298603
Validation loss = 0.02102472446858883
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0544520802795887
Validation loss = 0.04426748678088188
Validation loss = 0.030076155439019203
Validation loss = 0.027038300409913063
Validation loss = 0.024560578167438507
Validation loss = 0.02364227920770645
Validation loss = 0.028980346396565437
Validation loss = 0.027029549703001976
Validation loss = 0.026494821533560753
Validation loss = 0.022531569004058838
Validation loss = 0.023837007582187653
Validation loss = 0.02413792349398136
Validation loss = 0.018937399610877037
Validation loss = 0.020403824746608734
Validation loss = 0.01845281757414341
Validation loss = 0.02370317280292511
Validation loss = 0.018993951380252838
Validation loss = 0.02147345244884491
Validation loss = 0.024325400590896606
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08037233352661133
Validation loss = 0.03293158859014511
Validation loss = 0.031474243849515915
Validation loss = 0.02088828943669796
Validation loss = 0.025016264989972115
Validation loss = 0.020295172929763794
Validation loss = 0.028707891702651978
Validation loss = 0.017493948340415955
Validation loss = 0.01693759858608246
Validation loss = 0.016797272488474846
Validation loss = 0.01816725730895996
Validation loss = 0.017843004316091537
Validation loss = 0.01769263856112957
Validation loss = 0.01899953931570053
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06755225360393524
Validation loss = 0.04539823904633522
Validation loss = 0.03574514761567116
Validation loss = 0.027568472549319267
Validation loss = 0.02413427084684372
Validation loss = 0.022716976702213287
Validation loss = 0.019472023472189903
Validation loss = 0.023348122835159302
Validation loss = 0.020108601078391075
Validation loss = 0.020863546058535576
Validation loss = 0.025925524532794952
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0655934289097786
Validation loss = 0.045948516577482224
Validation loss = 0.036179255694150925
Validation loss = 0.04784185811877251
Validation loss = 0.03742797300219536
Validation loss = 0.03287513181567192
Validation loss = 0.0326569490134716
Validation loss = 0.03019716590642929
Validation loss = 0.02885279804468155
Validation loss = 0.02847050316631794
Validation loss = 0.029467063024640083
Validation loss = 0.027071135118603706
Validation loss = 0.02214275859296322
Validation loss = 0.024333247914910316
Validation loss = 0.021183481439948082
Validation loss = 0.019099712371826172
Validation loss = 0.022488882765173912
Validation loss = 0.018033701926469803
Validation loss = 0.029642006382346153
Validation loss = 0.017023462802171707
Validation loss = 0.022178441286087036
Validation loss = 0.018541017547249794
Validation loss = 0.02030426263809204
Validation loss = 0.02502184547483921
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0113  |
| Iteration     | 3        |
| MaximumReturn | -0.00823 |
| MinimumReturn | -0.0145  |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021435240283608437
Validation loss = 0.014394321478903294
Validation loss = 0.017875541001558304
Validation loss = 0.016748614609241486
Validation loss = 0.017541468143463135
Validation loss = 0.01848679408431053
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.044558100402355194
Validation loss = 0.01746046356856823
Validation loss = 0.019182562828063965
Validation loss = 0.01719622313976288
Validation loss = 0.019643893465399742
Validation loss = 0.016422925516963005
Validation loss = 0.01483976561576128
Validation loss = 0.015331950038671494
Validation loss = 0.01591959223151207
Validation loss = 0.016720769926905632
Validation loss = 0.01196835096925497
Validation loss = 0.015184776857495308
Validation loss = 0.012954357080161572
Validation loss = 0.013690450228750706
Validation loss = 0.015795277431607246
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05152544006705284
Validation loss = 0.022115712985396385
Validation loss = 0.018794411793351173
Validation loss = 0.012876075692474842
Validation loss = 0.013845939189195633
Validation loss = 0.012001294642686844
Validation loss = 0.0153048662468791
Validation loss = 0.016506526619195938
Validation loss = 0.01427010353654623
Validation loss = 0.012801224365830421
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.048485249280929565
Validation loss = 0.024969009682536125
Validation loss = 0.0196891687810421
Validation loss = 0.018005462363362312
Validation loss = 0.016099147498607635
Validation loss = 0.015126438811421394
Validation loss = 0.018785303458571434
Validation loss = 0.014340260997414589
Validation loss = 0.019208097830414772
Validation loss = 0.014317248947918415
Validation loss = 0.012570877559483051
Validation loss = 0.013213414698839188
Validation loss = 0.011834080331027508
Validation loss = 0.013787268660962582
Validation loss = 0.014207188971340656
Validation loss = 0.01954706199467182
Validation loss = 0.013757080771028996
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04674959182739258
Validation loss = 0.024878766387701035
Validation loss = 0.015505878254771233
Validation loss = 0.018429288640618324
Validation loss = 0.01749238185584545
Validation loss = 0.013471497222781181
Validation loss = 0.014217465184628963
Validation loss = 0.012842047959566116
Validation loss = 0.020646462216973305
Validation loss = 0.012510634958744049
Validation loss = 0.013839502818882465
Validation loss = 0.020440299063920975
Validation loss = 0.014132671058177948
Validation loss = 0.012941512279212475
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00123 |
| Iteration     | 4        |
| MaximumReturn | -0.00103 |
| MinimumReturn | -0.00178 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024684607982635498
Validation loss = 0.023910608142614365
Validation loss = 0.01462353765964508
Validation loss = 0.014527400024235249
Validation loss = 0.012534147128462791
Validation loss = 0.012103075161576271
Validation loss = 0.01226811669766903
Validation loss = 0.013788108713924885
Validation loss = 0.015424574725329876
Validation loss = 0.012303926981985569
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025083398446440697
Validation loss = 0.018208447843790054
Validation loss = 0.01908089779317379
Validation loss = 0.02177242562174797
Validation loss = 0.017347248271107674
Validation loss = 0.012336676940321922
Validation loss = 0.012266267091035843
Validation loss = 0.01248551532626152
Validation loss = 0.012000798247754574
Validation loss = 0.010556833818554878
Validation loss = 0.010638626292347908
Validation loss = 0.016282258555293083
Validation loss = 0.010553950443863869
Validation loss = 0.017409726977348328
Validation loss = 0.012252101674675941
Validation loss = 0.014061075635254383
Validation loss = 0.008938433602452278
Validation loss = 0.011470476165413857
Validation loss = 0.015251430682837963
Validation loss = 0.011612804606556892
Validation loss = 0.01161106862127781
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04059920087456703
Validation loss = 0.023166518658399582
Validation loss = 0.019926762208342552
Validation loss = 0.01734880730509758
Validation loss = 0.014595650136470795
Validation loss = 0.014405342750251293
Validation loss = 0.017484724521636963
Validation loss = 0.015343448147177696
Validation loss = 0.01113365963101387
Validation loss = 0.01171200443059206
Validation loss = 0.016530683264136314
Validation loss = 0.012523876503109932
Validation loss = 0.010588840581476688
Validation loss = 0.015546475537121296
Validation loss = 0.013302301988005638
Validation loss = 0.016567762941122055
Validation loss = 0.01100880280137062
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.038246072828769684
Validation loss = 0.024242956191301346
Validation loss = 0.01220834068953991
Validation loss = 0.01279142964631319
Validation loss = 0.016436032950878143
Validation loss = 0.012727978639304638
Validation loss = 0.0160888209939003
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03332432731986046
Validation loss = 0.017880190163850784
Validation loss = 0.018993038684129715
Validation loss = 0.016685809940099716
Validation loss = 0.01510657835751772
Validation loss = 0.013122225180268288
Validation loss = 0.01118733361363411
Validation loss = 0.014634472317993641
Validation loss = 0.0166279636323452
Validation loss = 0.014508532360196114
Validation loss = 0.015310488641262054
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.33    |
| Iteration     | 5        |
| MaximumReturn | -0.0387  |
| MinimumReturn | -12      |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.031293418258428574
Validation loss = 0.015790503472089767
Validation loss = 0.016154088079929352
Validation loss = 0.0164837297052145
Validation loss = 0.012608179822564125
Validation loss = 0.02000870741903782
Validation loss = 0.013024087063968182
Validation loss = 0.01020645722746849
Validation loss = 0.00959168653935194
Validation loss = 0.013494784943759441
Validation loss = 0.010774542577564716
Validation loss = 0.01594168320298195
Validation loss = 0.011196482926607132
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04139803349971771
Validation loss = 0.014490371569991112
Validation loss = 0.012179301120340824
Validation loss = 0.014415318146348
Validation loss = 0.014210867695510387
Validation loss = 0.008894199505448341
Validation loss = 0.008895494043827057
Validation loss = 0.010347452946007252
Validation loss = 0.019717490300536156
Validation loss = 0.012101870030164719
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023049872368574142
Validation loss = 0.021939914673566818
Validation loss = 0.011616966687142849
Validation loss = 0.011465277522802353
Validation loss = 0.011304548941552639
Validation loss = 0.014324414543807507
Validation loss = 0.012840263545513153
Validation loss = 0.011358842253684998
Validation loss = 0.008872507140040398
Validation loss = 0.008868148550391197
Validation loss = 0.009971029125154018
Validation loss = 0.011739793233573437
Validation loss = 0.012147304601967335
Validation loss = 0.009625476785004139
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021320609375834465
Validation loss = 0.013146573677659035
Validation loss = 0.010785474441945553
Validation loss = 0.014112819917500019
Validation loss = 0.016855161637067795
Validation loss = 0.011591742746531963
Validation loss = 0.013742799870669842
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027638500556349754
Validation loss = 0.015957459807395935
Validation loss = 0.01506495475769043
Validation loss = 0.013866713270545006
Validation loss = 0.01591087505221367
Validation loss = 0.01071873027831316
Validation loss = 0.009242075495421886
Validation loss = 0.010619648732244968
Validation loss = 0.012568664737045765
Validation loss = 0.011326903477311134
Validation loss = 0.01199895329773426
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.1    |
| Iteration     | 6        |
| MaximumReturn | -0.222   |
| MinimumReturn | -43.3    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04107889533042908
Validation loss = 0.016173087060451508
Validation loss = 0.016629720106720924
Validation loss = 0.013247724622488022
Validation loss = 0.011340559460222721
Validation loss = 0.008751265704631805
Validation loss = 0.010465230792760849
Validation loss = 0.009321226738393307
Validation loss = 0.010753835551440716
Validation loss = 0.00785581674426794
Validation loss = 0.00930037908256054
Validation loss = 0.0072745284996926785
Validation loss = 0.008011738769710064
Validation loss = 0.007329890504479408
Validation loss = 0.009744667448103428
Validation loss = 0.0063945273868739605
Validation loss = 0.0075426832772791386
Validation loss = 0.010335403494536877
Validation loss = 0.0071721128188073635
Validation loss = 0.007393281906843185
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04826496168971062
Validation loss = 0.017231570556759834
Validation loss = 0.012656417675316334
Validation loss = 0.016834933310747147
Validation loss = 0.020401502028107643
Validation loss = 0.009109432809054852
Validation loss = 0.00707206642255187
Validation loss = 0.006586107891052961
Validation loss = 0.010031218640506268
Validation loss = 0.016309497877955437
Validation loss = 0.008769533596932888
Validation loss = 0.013027689419686794
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05933426320552826
Validation loss = 0.014397642575204372
Validation loss = 0.01796598732471466
Validation loss = 0.01247043814510107
Validation loss = 0.012073908932507038
Validation loss = 0.01240818202495575
Validation loss = 0.010870788246393204
Validation loss = 0.009886818937957287
Validation loss = 0.00760206812992692
Validation loss = 0.010508254170417786
Validation loss = 0.011615808121860027
Validation loss = 0.012699447572231293
Validation loss = 0.007876679301261902
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03915882483124733
Validation loss = 0.013127314858138561
Validation loss = 0.011334252543747425
Validation loss = 0.014190450310707092
Validation loss = 0.014689593575894833
Validation loss = 0.014331807382404804
Validation loss = 0.010498112998902798
Validation loss = 0.008344899863004684
Validation loss = 0.013907800428569317
Validation loss = 0.015146308578550816
Validation loss = 0.009297695942223072
Validation loss = 0.009704086929559708
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0382806658744812
Validation loss = 0.018879421055316925
Validation loss = 0.013045224361121655
Validation loss = 0.009671522304415703
Validation loss = 0.011273634620010853
Validation loss = 0.012794039212167263
Validation loss = 0.015407338738441467
Validation loss = 0.019034987315535545
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0121  |
| Iteration     | 7        |
| MaximumReturn | -0.00863 |
| MinimumReturn | -0.0144  |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014875571243464947
Validation loss = 0.010705718770623207
Validation loss = 0.00763171911239624
Validation loss = 0.006867408752441406
Validation loss = 0.00817993376404047
Validation loss = 0.00900255423039198
Validation loss = 0.008353512734174728
Validation loss = 0.007170552853494883
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016246039420366287
Validation loss = 0.02195981703698635
Validation loss = 0.00881808903068304
Validation loss = 0.010170841589570045
Validation loss = 0.0076268636621534824
Validation loss = 0.011521900072693825
Validation loss = 0.009711997583508492
Validation loss = 0.008135981857776642
Validation loss = 0.012149148620665073
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018393510952591896
Validation loss = 0.012006404809653759
Validation loss = 0.013847271911799908
Validation loss = 0.006565764546394348
Validation loss = 0.014278091490268707
Validation loss = 0.00860520638525486
Validation loss = 0.007141287438571453
Validation loss = 0.009213896468281746
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02369595132768154
Validation loss = 0.01068097073584795
Validation loss = 0.012030149810016155
Validation loss = 0.006442113779485226
Validation loss = 0.01179924514144659
Validation loss = 0.01011074148118496
Validation loss = 0.009919549338519573
Validation loss = 0.0077290525659918785
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021957555785775185
Validation loss = 0.01298766490072012
Validation loss = 0.009688825346529484
Validation loss = 0.009138273075222969
Validation loss = 0.008736019022762775
Validation loss = 0.009228560142219067
Validation loss = 0.009455348365008831
Validation loss = 0.015138396061956882
Validation loss = 0.009341088123619556
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0379  |
| Iteration     | 8        |
| MaximumReturn | -0.0264  |
| MinimumReturn | -0.108   |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015349035151302814
Validation loss = 0.009245924651622772
Validation loss = 0.007465239614248276
Validation loss = 0.010245135053992271
Validation loss = 0.007366373669356108
Validation loss = 0.005963154602795839
Validation loss = 0.016378294676542282
Validation loss = 0.00882813148200512
Validation loss = 0.008304831571877003
Validation loss = 0.006929436232894659
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012717563658952713
Validation loss = 0.009822316467761993
Validation loss = 0.007826665416359901
Validation loss = 0.006620537489652634
Validation loss = 0.005848743952810764
Validation loss = 0.010180370882153511
Validation loss = 0.006122680846601725
Validation loss = 0.009498878382146358
Validation loss = 0.01467200554907322
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012770799919962883
Validation loss = 0.015239416621625423
Validation loss = 0.009240344166755676
Validation loss = 0.010911239311099052
Validation loss = 0.0065498948097229
Validation loss = 0.006577461026608944
Validation loss = 0.00684050377458334
Validation loss = 0.007790500298142433
Validation loss = 0.007741251960396767
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018606074154376984
Validation loss = 0.00992545485496521
Validation loss = 0.0073479157872498035
Validation loss = 0.010516459122300148
Validation loss = 0.00772206112742424
Validation loss = 0.009407739154994488
Validation loss = 0.009448292665183544
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008760826662182808
Validation loss = 0.006424626335501671
Validation loss = 0.008535699918866158
Validation loss = 0.006712499540299177
Validation loss = 0.008056660182774067
Validation loss = 0.006944575347006321
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -39.3    |
| Iteration     | 9        |
| MaximumReturn | -0.0455  |
| MinimumReturn | -97.1    |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015696365386247635
Validation loss = 0.013516009785234928
Validation loss = 0.007467253133654594
Validation loss = 0.005550498608499765
Validation loss = 0.007857438176870346
Validation loss = 0.006503961514681578
Validation loss = 0.005531969014555216
Validation loss = 0.008986189030110836
Validation loss = 0.006047443021088839
Validation loss = 0.004461253527551889
Validation loss = 0.006609106436371803
Validation loss = 0.004569550976157188
Validation loss = 0.005801679566502571
Validation loss = 0.004189709201455116
Validation loss = 0.005144944414496422
Validation loss = 0.013627472333610058
Validation loss = 0.006178499665111303
Validation loss = 0.004635777324438095
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012243997305631638
Validation loss = 0.011838700622320175
Validation loss = 0.011785689741373062
Validation loss = 0.005989636294543743
Validation loss = 0.0063860309310257435
Validation loss = 0.003847637213766575
Validation loss = 0.004320378880947828
Validation loss = 0.007687483448535204
Validation loss = 0.0042984019964933395
Validation loss = 0.005247367545962334
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019060499966144562
Validation loss = 0.008309411816298962
Validation loss = 0.009717468172311783
Validation loss = 0.006266091484576464
Validation loss = 0.004452576395124197
Validation loss = 0.005548613145947456
Validation loss = 0.00947448331862688
Validation loss = 0.005307961720973253
Validation loss = 0.005350357852876186
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016128050163388252
Validation loss = 0.01091119833290577
Validation loss = 0.0067559415474534035
Validation loss = 0.006235030945390463
Validation loss = 0.007597814779728651
Validation loss = 0.006731016561388969
Validation loss = 0.00578884594142437
Validation loss = 0.006241225637495518
Validation loss = 0.005081371404230595
Validation loss = 0.004860902205109596
Validation loss = 0.005269855726510286
Validation loss = 0.006446586921811104
Validation loss = 0.005696197506040335
Validation loss = 0.004709064029157162
Validation loss = 0.009336499497294426
Validation loss = 0.006821111775934696
Validation loss = 0.00538691645488143
Validation loss = 0.005183082073926926
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017380373552441597
Validation loss = 0.013055435381829739
Validation loss = 0.009941350668668747
Validation loss = 0.006559520494192839
Validation loss = 0.008232099935412407
Validation loss = 0.004995227791368961
Validation loss = 0.0046157026663422585
Validation loss = 0.00547379907220602
Validation loss = 0.004522685892879963
Validation loss = 0.004898129031062126
Validation loss = 0.004629053641110659
Validation loss = 0.005227210000157356
Validation loss = 0.007436004001647234
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0159  |
| Iteration     | 10       |
| MaximumReturn | -0.0135  |
| MinimumReturn | -0.0201  |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009547010064125061
Validation loss = 0.006803295109421015
Validation loss = 0.0048757558688521385
Validation loss = 0.0035238012205809355
Validation loss = 0.004479417577385902
Validation loss = 0.00818437896668911
Validation loss = 0.004877261817455292
Validation loss = 0.0038587942253798246
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006795462220907211
Validation loss = 0.005080080591142178
Validation loss = 0.005652966909110546
Validation loss = 0.004635713063180447
Validation loss = 0.009641619399189949
Validation loss = 0.007358124945312738
Validation loss = 0.005784410517662764
Validation loss = 0.005832625087350607
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008899452164769173
Validation loss = 0.008550374768674374
Validation loss = 0.007437358610332012
Validation loss = 0.00543942442163825
Validation loss = 0.004182063974440098
Validation loss = 0.005373440217226744
Validation loss = 0.007242489606142044
Validation loss = 0.005120072979480028
Validation loss = 0.003962801769375801
Validation loss = 0.00585126830264926
Validation loss = 0.0053498027846217155
Validation loss = 0.006579632870852947
Validation loss = 0.008499923162162304
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008254220709204674
Validation loss = 0.005105278920382261
Validation loss = 0.00497429771348834
Validation loss = 0.006584747228771448
Validation loss = 0.004857265390455723
Validation loss = 0.007363977842032909
Validation loss = 0.006699257530272007
Validation loss = 0.008013561367988586
Validation loss = 0.007288293447345495
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01712447963654995
Validation loss = 0.00556517019867897
Validation loss = 0.005351346917450428
Validation loss = 0.004368412774056196
Validation loss = 0.006808658596128225
Validation loss = 0.004746611230075359
Validation loss = 0.005781698040664196
Validation loss = 0.005138691049069166
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00332  |
| Iteration     | 11        |
| MaximumReturn | -0.000698 |
| MinimumReturn | -0.0431   |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008870154619216919
Validation loss = 0.00605261605232954
Validation loss = 0.016674835234880447
Validation loss = 0.00659102201461792
Validation loss = 0.005207920912653208
Validation loss = 0.004639404825866222
Validation loss = 0.005375286564230919
Validation loss = 0.0053347875364124775
Validation loss = 0.004729201085865498
Validation loss = 0.00499830162152648
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007689418736845255
Validation loss = 0.004665592685341835
Validation loss = 0.010513786226511002
Validation loss = 0.006677359342575073
Validation loss = 0.011555610224604607
Validation loss = 0.0049451314844191074
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011313563212752342
Validation loss = 0.0056543415412306786
Validation loss = 0.00510701909661293
Validation loss = 0.005739004351198673
Validation loss = 0.004232802428305149
Validation loss = 0.005098394118249416
Validation loss = 0.0047692544758319855
Validation loss = 0.00458268728107214
Validation loss = 0.010229279287159443
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007295537739992142
Validation loss = 0.012076091021299362
Validation loss = 0.008051888085901737
Validation loss = 0.007948709651827812
Validation loss = 0.006446771323680878
Validation loss = 0.005064954049885273
Validation loss = 0.007738103158771992
Validation loss = 0.005440244451165199
Validation loss = 0.007654258515685797
Validation loss = 0.007168469484895468
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009756443090736866
Validation loss = 0.0061772605404257774
Validation loss = 0.00574216153472662
Validation loss = 0.00596938282251358
Validation loss = 0.005001156125217676
Validation loss = 0.003893433604389429
Validation loss = 0.003808022942394018
Validation loss = 0.009237299673259258
Validation loss = 0.006738905794918537
Validation loss = 0.006062836851924658
Validation loss = 0.007092852145433426
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.18    |
| Iteration     | 12       |
| MaximumReturn | -0.0336  |
| MinimumReturn | -1.13    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01284149382263422
Validation loss = 0.005167991854250431
Validation loss = 0.0033784739207476377
Validation loss = 0.00450033089146018
Validation loss = 0.003937787842005491
Validation loss = 0.0039898985996842384
Validation loss = 0.0062192450277507305
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006402249448001385
Validation loss = 0.005740308668464422
Validation loss = 0.004562185145914555
Validation loss = 0.004334664437919855
Validation loss = 0.0033305236138403416
Validation loss = 0.0062782918103039265
Validation loss = 0.00630215834826231
Validation loss = 0.0031374855898320675
Validation loss = 0.005857862997800112
Validation loss = 0.0035049316938966513
Validation loss = 0.004460637923330069
Validation loss = 0.006725925486534834
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010419429279863834
Validation loss = 0.005806928966194391
Validation loss = 0.004049689043313265
Validation loss = 0.009282169863581657
Validation loss = 0.0035216272808611393
Validation loss = 0.00447872607037425
Validation loss = 0.004563243128359318
Validation loss = 0.004024472553282976
Validation loss = 0.004015171900391579
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013586568646132946
Validation loss = 0.005452257581055164
Validation loss = 0.005101610440760851
Validation loss = 0.007756166625767946
Validation loss = 0.004362873267382383
Validation loss = 0.0036940784193575382
Validation loss = 0.004758007358759642
Validation loss = 0.004042294342070818
Validation loss = 0.004236927255988121
Validation loss = 0.006964496802538633
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008260991424322128
Validation loss = 0.004391152877360582
Validation loss = 0.005085186567157507
Validation loss = 0.004747847095131874
Validation loss = 0.004795962944626808
Validation loss = 0.0031694427598267794
Validation loss = 0.0032507760915905237
Validation loss = 0.00704573281109333
Validation loss = 0.0046445694752037525
Validation loss = 0.0050111678428947926
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00332 |
| Iteration     | 13       |
| MaximumReturn | -0.00119 |
| MinimumReturn | -0.0126  |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005686914548277855
Validation loss = 0.0038722939789295197
Validation loss = 0.00448869913816452
Validation loss = 0.005444427486509085
Validation loss = 0.00487853679805994
Validation loss = 0.004309242591261864
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006412440445274115
Validation loss = 0.005265495274215937
Validation loss = 0.006761230994015932
Validation loss = 0.006051142234355211
Validation loss = 0.005335191730409861
Validation loss = 0.00783286802470684
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012544558383524418
Validation loss = 0.005191501695662737
Validation loss = 0.004448606166988611
Validation loss = 0.0041587138548493385
Validation loss = 0.003256151219829917
Validation loss = 0.0029403676744550467
Validation loss = 0.004689179826527834
Validation loss = 0.0032726803328841925
Validation loss = 0.002592857228592038
Validation loss = 0.006975548341870308
Validation loss = 0.006046140566468239
Validation loss = 0.003522951388731599
Validation loss = 0.005013314541429281
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00743141071870923
Validation loss = 0.003392681246623397
Validation loss = 0.003549261949956417
Validation loss = 0.00344720552675426
Validation loss = 0.003713954472914338
Validation loss = 0.009733240120112896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005632973741739988
Validation loss = 0.004135743714869022
Validation loss = 0.003825328079983592
Validation loss = 0.0035724248737096786
Validation loss = 0.007870781235396862
Validation loss = 0.0036544958129525185
Validation loss = 0.0028911950066685677
Validation loss = 0.0049248323775827885
Validation loss = 0.004373519215732813
Validation loss = 0.004280625376850367
Validation loss = 0.0037768061738461256
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0179  |
| Iteration     | 14       |
| MaximumReturn | -0.01    |
| MinimumReturn | -0.0305  |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0058476305566728115
Validation loss = 0.005074853077530861
Validation loss = 0.0031207995489239693
Validation loss = 0.0022586204577237368
Validation loss = 0.004446771927177906
Validation loss = 0.004012350458651781
Validation loss = 0.0037253687623888254
Validation loss = 0.004713268019258976
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005134743172675371
Validation loss = 0.002929213223978877
Validation loss = 0.0036582027096301317
Validation loss = 0.003969643730670214
Validation loss = 0.0025568159762769938
Validation loss = 0.0028521863278001547
Validation loss = 0.0029655606485903263
Validation loss = 0.0024439445696771145
Validation loss = 0.004727062303572893
Validation loss = 0.00341753545217216
Validation loss = 0.0038552586920559406
Validation loss = 0.00388714880682528
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007607377599924803
Validation loss = 0.0054697091691195965
Validation loss = 0.002505955519154668
Validation loss = 0.0085065346211195
Validation loss = 0.0060856225900352
Validation loss = 0.0034064960200339556
Validation loss = 0.0027932191733270884
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007272734306752682
Validation loss = 0.003498444566503167
Validation loss = 0.005505146458745003
Validation loss = 0.003158393781632185
Validation loss = 0.0036591063253581524
Validation loss = 0.00238969549536705
Validation loss = 0.002752679865807295
Validation loss = 0.0032024530228227377
Validation loss = 0.003108670236542821
Validation loss = 0.009793729521334171
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003830521134659648
Validation loss = 0.005467369686812162
Validation loss = 0.00516502745449543
Validation loss = 0.004137761425226927
Validation loss = 0.005284029059112072
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.55    |
| Iteration     | 15       |
| MaximumReturn | -0.0345  |
| MinimumReturn | -36.5    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006457580719143152
Validation loss = 0.0024536028504371643
Validation loss = 0.002728621941059828
Validation loss = 0.0019098635530099273
Validation loss = 0.005843593273311853
Validation loss = 0.0025757583789527416
Validation loss = 0.0021259388886392117
Validation loss = 0.00237502483651042
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005531542003154755
Validation loss = 0.0027150947134941816
Validation loss = 0.002844300353899598
Validation loss = 0.0023722671903669834
Validation loss = 0.0031941242050379515
Validation loss = 0.002924746135249734
Validation loss = 0.004154481925070286
Validation loss = 0.0029002088122069836
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002671115566045046
Validation loss = 0.0024461497087031603
Validation loss = 0.0023224432952702045
Validation loss = 0.002470575738698244
Validation loss = 0.002641397062689066
Validation loss = 0.0033655930310487747
Validation loss = 0.0030842723790556192
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006153348833322525
Validation loss = 0.0036321303341537714
Validation loss = 0.003283900674432516
Validation loss = 0.0022247135639190674
Validation loss = 0.002385938772931695
Validation loss = 0.002895029028877616
Validation loss = 0.0028831004165112972
Validation loss = 0.005074110813438892
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004923676140606403
Validation loss = 0.0028993531595915556
Validation loss = 0.0026523470878601074
Validation loss = 0.0027249306440353394
Validation loss = 0.00197412702254951
Validation loss = 0.004301063716411591
Validation loss = 0.0040205856785178185
Validation loss = 0.0026622063014656305
Validation loss = 0.0028123275842517614
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0293  |
| Iteration     | 16       |
| MaximumReturn | -0.00696 |
| MinimumReturn | -0.247   |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0029616744723170996
Validation loss = 0.003016817383468151
Validation loss = 0.002825808711349964
Validation loss = 0.002529478631913662
Validation loss = 0.008629842661321163
Validation loss = 0.002362854778766632
Validation loss = 0.0027948031201958656
Validation loss = 0.0015809654723852873
Validation loss = 0.0025803488679230213
Validation loss = 0.0029152813367545605
Validation loss = 0.0025481691118329763
Validation loss = 0.0037672631442546844
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004208590369671583
Validation loss = 0.004597445018589497
Validation loss = 0.002039816929027438
Validation loss = 0.0030007443856447935
Validation loss = 0.002637928118929267
Validation loss = 0.004108435474336147
Validation loss = 0.002550506731495261
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004095195326954126
Validation loss = 0.0033753870520740747
Validation loss = 0.004208037164062262
Validation loss = 0.003622853197157383
Validation loss = 0.0026551883202046156
Validation loss = 0.003134941915050149
Validation loss = 0.0032142172567546368
Validation loss = 0.0046862708404660225
Validation loss = 0.002469694707542658
Validation loss = 0.002011028118431568
Validation loss = 0.0032078721560537815
Validation loss = 0.0022771426010876894
Validation loss = 0.002855972619727254
Validation loss = 0.0031640881206840277
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004607741255313158
Validation loss = 0.0035237923730164766
Validation loss = 0.0020153005607426167
Validation loss = 0.0027296175248920918
Validation loss = 0.0029420938808470964
Validation loss = 0.002539547858759761
Validation loss = 0.0034975637681782246
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007406456395983696
Validation loss = 0.0029221693985164165
Validation loss = 0.001966098789125681
Validation loss = 0.0025589026045054197
Validation loss = 0.0025096542667597532
Validation loss = 0.0043462589383125305
Validation loss = 0.003320035059005022
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00085  |
| Iteration     | 17        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.00112  |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005879723466932774
Validation loss = 0.0022510357666760683
Validation loss = 0.006891587283462286
Validation loss = 0.0035083272960036993
Validation loss = 0.0023667695932090282
Validation loss = 0.003057979978621006
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028986099641770124
Validation loss = 0.004375741351395845
Validation loss = 0.0033290511928498745
Validation loss = 0.0021320278756320477
Validation loss = 0.0022505929227918386
Validation loss = 0.002284316113218665
Validation loss = 0.001865968108177185
Validation loss = 0.0023393742740154266
Validation loss = 0.0028444856870919466
Validation loss = 0.0045480853877961636
Validation loss = 0.0031902976334095
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0033514455426484346
Validation loss = 0.0027856489177793264
Validation loss = 0.004177774768322706
Validation loss = 0.006062197033315897
Validation loss = 0.002282000845298171
Validation loss = 0.0028665654826909304
Validation loss = 0.003393821883946657
Validation loss = 0.003252361435443163
Validation loss = 0.003424182068556547
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00876818411052227
Validation loss = 0.003449447685852647
Validation loss = 0.006193612236529589
Validation loss = 0.005036158487200737
Validation loss = 0.0030800020322203636
Validation loss = 0.0022556737530976534
Validation loss = 0.0020998497493565083
Validation loss = 0.004219400696456432
Validation loss = 0.0029029895085841417
Validation loss = 0.002043345710262656
Validation loss = 0.004070183262228966
Validation loss = 0.004727559629827738
Validation loss = 0.0037404601462185383
Validation loss = 0.002816206542775035
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00881511066108942
Validation loss = 0.005309819243848324
Validation loss = 0.004368125926703215
Validation loss = 0.002687104046344757
Validation loss = 0.004955449141561985
Validation loss = 0.0036084880121052265
Validation loss = 0.003070030827075243
Validation loss = 0.003373617772012949
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18.9    |
| Iteration     | 18       |
| MaximumReturn | -0.0732  |
| MinimumReturn | -41.1    |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006838364992290735
Validation loss = 0.0025690938346087933
Validation loss = 0.0028235549107193947
Validation loss = 0.0018776528304442763
Validation loss = 0.0032641002908349037
Validation loss = 0.002801764290779829
Validation loss = 0.001832193462178111
Validation loss = 0.0027151035610586405
Validation loss = 0.001404372276738286
Validation loss = 0.001842808909714222
Validation loss = 0.0030659064650535583
Validation loss = 0.00487532140687108
Validation loss = 0.0022331387735903263
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008134842850267887
Validation loss = 0.0015732652973383665
Validation loss = 0.0018877850379794836
Validation loss = 0.0030420618131756783
Validation loss = 0.004472598433494568
Validation loss = 0.0016093223821371794
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005435739178210497
Validation loss = 0.002310737268999219
Validation loss = 0.0033118685241788626
Validation loss = 0.003097647801041603
Validation loss = 0.003371848026290536
Validation loss = 0.0019278695108368993
Validation loss = 0.0018578406888991594
Validation loss = 0.0021367466542869806
Validation loss = 0.0020245322957634926
Validation loss = 0.001777490833774209
Validation loss = 0.0023490325547754765
Validation loss = 0.002217494882643223
Validation loss = 0.0024013807997107506
Validation loss = 0.0016941259382292628
Validation loss = 0.002577511128038168
Validation loss = 0.002410425338894129
Validation loss = 0.0020170575007796288
Validation loss = 0.002529789926484227
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006507806479930878
Validation loss = 0.001769113470800221
Validation loss = 0.001877041650004685
Validation loss = 0.003083278890699148
Validation loss = 0.0036800506059080362
Validation loss = 0.0020076639484614134
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00623018853366375
Validation loss = 0.0030691432766616344
Validation loss = 0.0022802334278821945
Validation loss = 0.0022833056282252073
Validation loss = 0.0024639845360070467
Validation loss = 0.0024074623361229897
Validation loss = 0.0019209228921681643
Validation loss = 0.0024114036932587624
Validation loss = 0.0035707144998013973
Validation loss = 0.004831723403185606
Validation loss = 0.0018780460814014077
Validation loss = 0.002799275564029813
Validation loss = 0.002040116349235177
Validation loss = 0.002931212540715933
Validation loss = 0.0021562688052654266
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.51    |
| Iteration     | 19       |
| MaximumReturn | -0.0188  |
| MinimumReturn | -48.1    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0037785868626087904
Validation loss = 0.002694745548069477
Validation loss = 0.0031226889695972204
Validation loss = 0.004044937435537577
Validation loss = 0.0015423052245751023
Validation loss = 0.0023883525282144547
Validation loss = 0.0017486216966062784
Validation loss = 0.004036422818899155
Validation loss = 0.0016508023254573345
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007063514553010464
Validation loss = 0.0019067784305661917
Validation loss = 0.0020168570335954428
Validation loss = 0.002374832984060049
Validation loss = 0.0021065508481115103
Validation loss = 0.0022797980345785618
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004584847018122673
Validation loss = 0.004198105540126562
Validation loss = 0.002670818008482456
Validation loss = 0.0036632854025810957
Validation loss = 0.0029634612146764994
Validation loss = 0.00252632237970829
Validation loss = 0.0019106470281258225
Validation loss = 0.0013383551267907023
Validation loss = 0.0034909609239548445
Validation loss = 0.0016825994243845344
Validation loss = 0.001959368120878935
Validation loss = 0.0013952512526884675
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004628508351743221
Validation loss = 0.001587612321600318
Validation loss = 0.0017984630540013313
Validation loss = 0.0030110145453363657
Validation loss = 0.002091814996674657
Validation loss = 0.001418207073584199
Validation loss = 0.0021154191344976425
Validation loss = 0.002144895028322935
Validation loss = 0.0022868202067911625
Validation loss = 0.0020838978234678507
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023301877081394196
Validation loss = 0.0019420308526605368
Validation loss = 0.002707541221752763
Validation loss = 0.0015139979077503085
Validation loss = 0.0015695387264713645
Validation loss = 0.002300345106050372
Validation loss = 0.002625927794724703
Validation loss = 0.001748782116919756
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000796 |
| Iteration     | 20        |
| MaximumReturn | -0.00064  |
| MinimumReturn | -0.00105  |
| TotalSamples  | 36652     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0030857929959893227
Validation loss = 0.002629046095535159
Validation loss = 0.0032342325430363417
Validation loss = 0.007877340540289879
Validation loss = 0.0028174761682748795
Validation loss = 0.002053122501820326
Validation loss = 0.00255877454765141
Validation loss = 0.0027900233399122953
Validation loss = 0.004108215216547251
Validation loss = 0.0015236621256917715
Validation loss = 0.0025779693387448788
Validation loss = 0.001459802151657641
Validation loss = 0.0016799881123006344
Validation loss = 0.0019011168042197824
Validation loss = 0.0022869564127177
Validation loss = 0.002979642478749156
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0030507505871355534
Validation loss = 0.0028054260183125734
Validation loss = 0.0032883163075894117
Validation loss = 0.00316635868512094
Validation loss = 0.003089078003540635
Validation loss = 0.0024937097914516926
Validation loss = 0.0021988386288285255
Validation loss = 0.004158550873398781
Validation loss = 0.0015992172993719578
Validation loss = 0.0025560997892171144
Validation loss = 0.004689555149525404
Validation loss = 0.002063295105472207
Validation loss = 0.0028769439086318016
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026871238369494677
Validation loss = 0.002917190780863166
Validation loss = 0.0024162805639207363
Validation loss = 0.0045868814922869205
Validation loss = 0.0022297410760074854
Validation loss = 0.0017524249851703644
Validation loss = 0.0016066646203398705
Validation loss = 0.0019921802449971437
Validation loss = 0.005225041881203651
Validation loss = 0.0018971024546772242
Validation loss = 0.002322773914784193
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004586818162351847
Validation loss = 0.002080898731946945
Validation loss = 0.002171542728319764
Validation loss = 0.0019751861691474915
Validation loss = 0.002299533225595951
Validation loss = 0.002384675433859229
Validation loss = 0.0016770202200859785
Validation loss = 0.001544886501505971
Validation loss = 0.002438378520309925
Validation loss = 0.0021543207112699747
Validation loss = 0.0033289603888988495
Validation loss = 0.003689040429890156
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0036649289540946484
Validation loss = 0.005340362899005413
Validation loss = 0.004366443958133459
Validation loss = 0.0029490033630281687
Validation loss = 0.001909931655973196
Validation loss = 0.002080014208331704
Validation loss = 0.002901376923546195
Validation loss = 0.002351833274587989
Validation loss = 0.0022860632743686438
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.182    |
| Iteration     | 21        |
| MaximumReturn | -0.000796 |
| MinimumReturn | -2.99     |
| TotalSamples  | 38318     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002352943178266287
Validation loss = 0.0016536822076886892
Validation loss = 0.0028623079415410757
Validation loss = 0.0013183383271098137
Validation loss = 0.001402272842824459
Validation loss = 0.002541500376537442
Validation loss = 0.002859214786440134
Validation loss = 0.001709560863673687
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004264267161488533
Validation loss = 0.0015067624626681209
Validation loss = 0.0017568537732586265
Validation loss = 0.002888968214392662
Validation loss = 0.0018962784670293331
Validation loss = 0.0025572162121534348
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0029414016753435135
Validation loss = 0.0019357400014996529
Validation loss = 0.002612440614029765
Validation loss = 0.0018011674983426929
Validation loss = 0.0021849533077329397
Validation loss = 0.0020649393554776907
Validation loss = 0.004019090440124273
Validation loss = 0.002633997704833746
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004955369513481855
Validation loss = 0.00251086987555027
Validation loss = 0.001847851206548512
Validation loss = 0.0033578507136553526
Validation loss = 0.00232341093942523
Validation loss = 0.004132801666855812
Validation loss = 0.0017489459132775664
Validation loss = 0.0014579962007701397
Validation loss = 0.002706169383600354
Validation loss = 0.005342473275959492
Validation loss = 0.0023633691016584635
Validation loss = 0.0022819226142019033
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0024573784321546555
Validation loss = 0.004764480981975794
Validation loss = 0.00328368553891778
Validation loss = 0.004127379972487688
Validation loss = 0.0019449826795607805
Validation loss = 0.004474290180951357
Validation loss = 0.0014616401167586446
Validation loss = 0.002609889954328537
Validation loss = 0.0018428385956212878
Validation loss = 0.0029625948518514633
Validation loss = 0.0018186363158747554
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.46     |
| Iteration     | 22        |
| MaximumReturn | -0.000625 |
| MinimumReturn | -5.99     |
| TotalSamples  | 39984     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00458641117438674
Validation loss = 0.002244402188807726
Validation loss = 0.0031997126061469316
Validation loss = 0.0020059500820934772
Validation loss = 0.002663393272086978
Validation loss = 0.00190544908400625
Validation loss = 0.0018173111602663994
Validation loss = 0.0020457208156585693
Validation loss = 0.0020406669937074184
Validation loss = 0.001660465612076223
Validation loss = 0.003091482911258936
Validation loss = 0.002183919306844473
Validation loss = 0.001724790083244443
Validation loss = 0.0031876906286925077
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001619986491277814
Validation loss = 0.0019553862512111664
Validation loss = 0.0014535767259076238
Validation loss = 0.0022118582855910063
Validation loss = 0.0023044198751449585
Validation loss = 0.0021847044117748737
Validation loss = 0.00142260966822505
Validation loss = 0.0023848721757531166
Validation loss = 0.0016015978762879968
Validation loss = 0.002783010946586728
Validation loss = 0.00273143476806581
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002039246493950486
Validation loss = 0.001882102689705789
Validation loss = 0.0025325194001197815
Validation loss = 0.0033986784983426332
Validation loss = 0.005758548155426979
Validation loss = 0.0017685843631625175
Validation loss = 0.0014300844632089138
Validation loss = 0.0020333142019808292
Validation loss = 0.0015656823525205255
Validation loss = 0.00310354121029377
Validation loss = 0.0024864645674824715
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019992704037576914
Validation loss = 0.0021743657998740673
Validation loss = 0.0014490305911749601
Validation loss = 0.001927349017933011
Validation loss = 0.0015473772073164582
Validation loss = 0.0017341941129416227
Validation loss = 0.002043933141976595
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001621705829165876
Validation loss = 0.001970880199223757
Validation loss = 0.0018998220330104232
Validation loss = 0.0020916841458529234
Validation loss = 0.001514393836259842
Validation loss = 0.007644161581993103
Validation loss = 0.002015798818320036
Validation loss = 0.0020051670726388693
Validation loss = 0.0016554531175643206
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -4.75     |
| Iteration     | 23        |
| MaximumReturn | -0.000456 |
| MinimumReturn | -58       |
| TotalSamples  | 41650     |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0030486267060041428
Validation loss = 0.001853813068009913
Validation loss = 0.0025589687284082174
Validation loss = 0.001842741621658206
Validation loss = 0.0023594878148287535
Validation loss = 0.0018171814735978842
Validation loss = 0.0017381809884682298
Validation loss = 0.0019981018267571926
Validation loss = 0.002878705970942974
Validation loss = 0.0038418315816670656
Validation loss = 0.0010970724979415536
Validation loss = 0.002955958480015397
Validation loss = 0.0014165479224175215
Validation loss = 0.0011535389348864555
Validation loss = 0.001502953702583909
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018330799648538232
Validation loss = 0.004471012856811285
Validation loss = 0.004736422095447779
Validation loss = 0.0015562998596578836
Validation loss = 0.0013774491380900145
Validation loss = 0.0021087536588311195
Validation loss = 0.0025233780033886433
Validation loss = 0.0022147949784994125
Validation loss = 0.0023594223894178867
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026671565137803555
Validation loss = 0.001562799676321447
Validation loss = 0.002830713987350464
Validation loss = 0.0013333051465451717
Validation loss = 0.00297027500346303
Validation loss = 0.0017229008954018354
Validation loss = 0.0019288950134068727
Validation loss = 0.0022160562220960855
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0022820853628218174
Validation loss = 0.0038034648168832064
Validation loss = 0.002170756459236145
Validation loss = 0.0014901107642799616
Validation loss = 0.0016380883753299713
Validation loss = 0.0025047846138477325
Validation loss = 0.0019286064198240638
Validation loss = 0.002142122481018305
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003015071153640747
Validation loss = 0.002300526015460491
Validation loss = 0.0017097521340474486
Validation loss = 0.0018190896371379495
Validation loss = 0.00186875369399786
Validation loss = 0.001802143407985568
Validation loss = 0.002036155667155981
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.53     |
| Iteration     | 24        |
| MaximumReturn | -0.000917 |
| MinimumReturn | -35.4     |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002538439352065325
Validation loss = 0.0013081866782158613
Validation loss = 0.0016562460223212838
Validation loss = 0.0017966367304325104
Validation loss = 0.001974954968318343
Validation loss = 0.0017671805107966065
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002859509317204356
Validation loss = 0.0018142370972782373
Validation loss = 0.0019763882737606764
Validation loss = 0.002668708097189665
Validation loss = 0.002962111495435238
Validation loss = 0.0021273712627589703
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002035487210378051
Validation loss = 0.0015162809286266565
Validation loss = 0.004300301428884268
Validation loss = 0.002674097428098321
Validation loss = 0.00212232768535614
Validation loss = 0.0018377098022028804
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002596590667963028
Validation loss = 0.0019712026696652174
Validation loss = 0.001754797762259841
Validation loss = 0.005295374896377325
Validation loss = 0.0019392573740333319
Validation loss = 0.0020206475164741278
Validation loss = 0.0016160631785169244
Validation loss = 0.0013080103090032935
Validation loss = 0.0016734451055526733
Validation loss = 0.0020483729895204306
Validation loss = 0.0051506212912499905
Validation loss = 0.0017366940155625343
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00425788015127182
Validation loss = 0.0022550311405211687
Validation loss = 0.0023273383267223835
Validation loss = 0.004233046434819698
Validation loss = 0.0017932361224666238
Validation loss = 0.001748734270222485
Validation loss = 0.0038167787715792656
Validation loss = 0.0016828610096126795
Validation loss = 0.002232369501143694
Validation loss = 0.0016136656049638987
Validation loss = 0.0017152855871245265
Validation loss = 0.0019358345307409763
Validation loss = 0.0027302009984850883
Validation loss = 0.0036093511153012514
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.112    |
| Iteration     | 25        |
| MaximumReturn | -0.000639 |
| MinimumReturn | -1.73     |
| TotalSamples  | 44982     |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002351888921111822
Validation loss = 0.001460354309529066
Validation loss = 0.002552173798903823
Validation loss = 0.0011536149540916085
Validation loss = 0.0011330213164910674
Validation loss = 0.0013177570654079318
Validation loss = 0.0017849658615887165
Validation loss = 0.0024959964212030172
Validation loss = 0.0013856148580089211
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022372372914105654
Validation loss = 0.0019729146733880043
Validation loss = 0.001815239666029811
Validation loss = 0.0022753989323973656
Validation loss = 0.0015513598918914795
Validation loss = 0.0016854896675795317
Validation loss = 0.002833986422047019
Validation loss = 0.002520731883123517
Validation loss = 0.002821222646161914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018497903365641832
Validation loss = 0.0015497277490794659
Validation loss = 0.002137847011908889
Validation loss = 0.0032774165738373995
Validation loss = 0.0016489955596625805
Validation loss = 0.0029068670701235533
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001882002572529018
Validation loss = 0.0018817178206518292
Validation loss = 0.0014107800088822842
Validation loss = 0.001994461053982377
Validation loss = 0.0019746136385947466
Validation loss = 0.001537744072265923
Validation loss = 0.002087570494040847
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002926029497757554
Validation loss = 0.001125783659517765
Validation loss = 0.0055408175103366375
Validation loss = 0.001345187658444047
Validation loss = 0.002898159669712186
Validation loss = 0.0019367360509932041
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00425 |
| Iteration     | 26       |
| MaximumReturn | -0.00063 |
| MinimumReturn | -0.016   |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0019586908165365458
Validation loss = 0.0015684792306274176
Validation loss = 0.0014960437547415495
Validation loss = 0.0028274774085730314
Validation loss = 0.0011956440284848213
Validation loss = 0.00437439838424325
Validation loss = 0.002731387736275792
Validation loss = 0.0012142691994085908
Validation loss = 0.0009194008307531476
Validation loss = 0.004233173560351133
Validation loss = 0.00249102502129972
Validation loss = 0.002003513742238283
Validation loss = 0.0017669007647782564
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015963333426043391
Validation loss = 0.0010653992649167776
Validation loss = 0.0019411522662267089
Validation loss = 0.002620647195726633
Validation loss = 0.00189325085375458
Validation loss = 0.0013128173304721713
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002406451152637601
Validation loss = 0.0011906070867553353
Validation loss = 0.004508774261921644
Validation loss = 0.004897566512227058
Validation loss = 0.0016168021829798818
Validation loss = 0.001365801552310586
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003498723264783621
Validation loss = 0.0012868603225797415
Validation loss = 0.0027827073354274035
Validation loss = 0.004427606705576181
Validation loss = 0.0018741728272289038
Validation loss = 0.002028281567618251
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002230735495686531
Validation loss = 0.0019004590576514602
Validation loss = 0.0024135210551321507
Validation loss = 0.0013290898641571403
Validation loss = 0.0032509244047105312
Validation loss = 0.0013519312487915158
Validation loss = 0.002744920551776886
Validation loss = 0.0011853160103783011
Validation loss = 0.0031320450361818075
Validation loss = 0.0017216781852766871
Validation loss = 0.0025183982215821743
Validation loss = 0.0011712261475622654
Validation loss = 0.0014040860114619136
Validation loss = 0.00136822450440377
Validation loss = 0.0015067751519382
Validation loss = 0.001145687885582447
Validation loss = 0.0011478448286652565
Validation loss = 0.002473845612257719
Validation loss = 0.0017479249509051442
Validation loss = 0.00233360449783504
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00439  |
| Iteration     | 27        |
| MaximumReturn | -0.000905 |
| MinimumReturn | -0.0101   |
| TotalSamples  | 48314     |
-----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001111000427044928
Validation loss = 0.0014437154168263078
Validation loss = 0.0011363537050783634
Validation loss = 0.001205223728902638
Validation loss = 0.0009440843132324517
Validation loss = 0.001150163821876049
Validation loss = 0.001082474016584456
Validation loss = 0.0013515935279428959
Validation loss = 0.0016838425071910024
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002172775100916624
Validation loss = 0.0013632726622745395
Validation loss = 0.001716689788736403
Validation loss = 0.0011556638637557626
Validation loss = 0.0020126665476709604
Validation loss = 0.0018979008309543133
Validation loss = 0.0013638179516419768
Validation loss = 0.001821088488213718
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016528066480532289
Validation loss = 0.001519177109003067
Validation loss = 0.0010428722016513348
Validation loss = 0.0012988584348931909
Validation loss = 0.001698940061032772
Validation loss = 0.0019806227646768093
Validation loss = 0.0013241240521892905
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015744691481813788
Validation loss = 0.0009975755820050836
Validation loss = 0.001022005919367075
Validation loss = 0.0013915590243414044
Validation loss = 0.0011570261558517814
Validation loss = 0.0012189499102532864
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016336982371285558
Validation loss = 0.0008107062894850969
Validation loss = 0.0009923229226842523
Validation loss = 0.0012737689539790154
Validation loss = 0.0010946994880214334
Validation loss = 0.0015299307415261865
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00194  |
| Iteration     | 28        |
| MaximumReturn | -0.000563 |
| MinimumReturn | -0.00584  |
| TotalSamples  | 49980     |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018950236262753606
Validation loss = 0.0011844575637951493
Validation loss = 0.0017955376533791423
Validation loss = 0.0020618068519979715
Validation loss = 0.0010750042274594307
Validation loss = 0.0017099627293646336
Validation loss = 0.0010909917764365673
Validation loss = 0.0013015212025493383
Validation loss = 0.0018858109833672643
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0020659484434872866
Validation loss = 0.0017802955117076635
Validation loss = 0.00202691531740129
Validation loss = 0.0019245801959186792
Validation loss = 0.0022129688877612352
Validation loss = 0.0013799192383885384
Validation loss = 0.0027793424669653177
Validation loss = 0.001126874703913927
Validation loss = 0.0013554546749219298
Validation loss = 0.0016427467344328761
Validation loss = 0.0018674839520826936
Validation loss = 0.002442555967718363
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018240228528156877
Validation loss = 0.001708181924186647
Validation loss = 0.001389839919283986
Validation loss = 0.0017921493854373693
Validation loss = 0.0023247436620295048
Validation loss = 0.0012593109859153628
Validation loss = 0.002104104496538639
Validation loss = 0.001493985764682293
Validation loss = 0.0022888570092618465
Validation loss = 0.0013310372596606612
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011278321035206318
Validation loss = 0.001191830146126449
Validation loss = 0.004985222592949867
Validation loss = 0.0026235291734337807
Validation loss = 0.0009537251899018884
Validation loss = 0.001275619026273489
Validation loss = 0.0017175835091620684
Validation loss = 0.0010505178943276405
Validation loss = 0.0019505757372826338
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0021533924154937267
Validation loss = 0.0021596127189695835
Validation loss = 0.0011451649479568005
Validation loss = 0.0011131609790027142
Validation loss = 0.0014286196092143655
Validation loss = 0.003625432960689068
Validation loss = 0.0014704987406730652
Validation loss = 0.0014233908150345087
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.98     |
| Iteration     | 29        |
| MaximumReturn | -0.000588 |
| MinimumReturn | -34.5     |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013197087682783604
Validation loss = 0.0013366026105359197
Validation loss = 0.0025596267078071833
Validation loss = 0.001655551022849977
Validation loss = 0.0010776680428534746
Validation loss = 0.0010216665687039495
Validation loss = 0.0012678403872996569
Validation loss = 0.0006590564735233784
Validation loss = 0.0016185258282348514
Validation loss = 0.001535716000944376
Validation loss = 0.0009477726416662335
Validation loss = 0.0012230364372953773
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022972687147557735
Validation loss = 0.001272651250474155
Validation loss = 0.002181231277063489
Validation loss = 0.0016318848356604576
Validation loss = 0.0014496974181383848
Validation loss = 0.0014442872488871217
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014840327203273773
Validation loss = 0.002686697756871581
Validation loss = 0.0014898496447131038
Validation loss = 0.0017255080165341496
Validation loss = 0.0014547305181622505
Validation loss = 0.0013539587380364537
Validation loss = 0.0017170116771012545
Validation loss = 0.0011549759656190872
Validation loss = 0.001475182012654841
Validation loss = 0.0014750775881111622
Validation loss = 0.0031179734505712986
Validation loss = 0.0009898064890876412
Validation loss = 0.0014364904491230845
Validation loss = 0.001953273778781295
Validation loss = 0.003377420362085104
Validation loss = 0.0032521914690732956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016326758777722716
Validation loss = 0.0012952563120052218
Validation loss = 0.0013243269640952349
Validation loss = 0.0010297914268448949
Validation loss = 0.00193073193076998
Validation loss = 0.0020454316399991512
Validation loss = 0.0011625740444287658
Validation loss = 0.0010902491630986333
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016597284702584147
Validation loss = 0.0028391540981829166
Validation loss = 0.0012923452304676175
Validation loss = 0.001763718668371439
Validation loss = 0.0014240616001188755
Validation loss = 0.0015604692744091153
Validation loss = 0.0009839272825047374
Validation loss = 0.0015874668024480343
Validation loss = 0.0013756335247308016
Validation loss = 0.0012760898098349571
Validation loss = 0.001055470434948802
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -65.4    |
| Iteration     | 30       |
| MaximumReturn | -24.2    |
| MinimumReturn | -92.7    |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018748001893982291
Validation loss = 0.0008501270203851163
Validation loss = 0.001163956243544817
Validation loss = 0.0009285895503126085
Validation loss = 0.0014134024968370795
Validation loss = 0.000912463350687176
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003279091091826558
Validation loss = 0.0017755297012627125
Validation loss = 0.001067706965841353
Validation loss = 0.0010685742599889636
Validation loss = 0.0008034557104110718
Validation loss = 0.0013226139126345515
Validation loss = 0.0015221921494230628
Validation loss = 0.0032569186296314
Validation loss = 0.000883183442056179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003564246464520693
Validation loss = 0.0015406168531626463
Validation loss = 0.001178399776108563
Validation loss = 0.0016726029571145773
Validation loss = 0.0010555292246863246
Validation loss = 0.0011087014572694898
Validation loss = 0.0012670570285990834
Validation loss = 0.0025088840629905462
Validation loss = 0.0012299192603677511
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003676703665405512
Validation loss = 0.0009567501256242394
Validation loss = 0.001861050259321928
Validation loss = 0.0007127897115424275
Validation loss = 0.0009029795764945447
Validation loss = 0.0009759701206348836
Validation loss = 0.0009834431111812592
Validation loss = 0.001171823707409203
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0026783274952322245
Validation loss = 0.0013862898340448737
Validation loss = 0.0010421454207971692
Validation loss = 0.0007904214435257018
Validation loss = 0.001035177381709218
Validation loss = 0.0012853549560531974
Validation loss = 0.0009324109414592385
Validation loss = 0.001915231579914689
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.274    |
| Iteration     | 31        |
| MaximumReturn | -0.000787 |
| MinimumReturn | -6.74     |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014633213868364692
Validation loss = 0.0017708175582811236
Validation loss = 0.0018295890185981989
Validation loss = 0.0014071798650547862
Validation loss = 0.001489278394728899
Validation loss = 0.001671771053224802
Validation loss = 0.0021094728726893663
Validation loss = 0.000867226452101022
Validation loss = 0.0027197543531656265
Validation loss = 0.0011448109289631248
Validation loss = 0.0016242964193224907
Validation loss = 0.000824058020953089
Validation loss = 0.00156736234202981
Validation loss = 0.0009161598281934857
Validation loss = 0.0013369404477998614
Validation loss = 0.000742591277230531
Validation loss = 0.0007042537909001112
Validation loss = 0.0007327026687562466
Validation loss = 0.0015144690405577421
Validation loss = 0.001124221016652882
Validation loss = 0.0011390160070732236
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016240141121670604
Validation loss = 0.0010277035180479288
Validation loss = 0.0008406343404203653
Validation loss = 0.0016410340322181582
Validation loss = 0.0013420350151136518
Validation loss = 0.0011287975357845426
Validation loss = 0.001206131186336279
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013540737563744187
Validation loss = 0.0020470810122787952
Validation loss = 0.002593735931441188
Validation loss = 0.0016662931302562356
Validation loss = 0.0012690010480582714
Validation loss = 0.0023288801312446594
Validation loss = 0.0016646047588437796
Validation loss = 0.002207874320447445
Validation loss = 0.0017484619747847319
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010310597717761993
Validation loss = 0.0028432232793420553
Validation loss = 0.0018629482947289944
Validation loss = 0.001520150457508862
Validation loss = 0.0020908890292048454
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011600912548601627
Validation loss = 0.0011245745699852705
Validation loss = 0.0013687575701624155
Validation loss = 0.0009801950072869658
Validation loss = 0.001842327183112502
Validation loss = 0.0008852735045365989
Validation loss = 0.0015214660670608282
Validation loss = 0.0015784903662279248
Validation loss = 0.001118371495977044
Validation loss = 0.0014275130815804005
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0688   |
| Iteration     | 32        |
| MaximumReturn | -0.000579 |
| MinimumReturn | -1.6      |
| TotalSamples  | 56644     |
-----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009335314971394837
Validation loss = 0.0020059470552951097
Validation loss = 0.0009564812644384801
Validation loss = 0.002827988239005208
Validation loss = 0.0010492581641301513
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011814007302746177
Validation loss = 0.0008542892173863947
Validation loss = 0.0021570392418652773
Validation loss = 0.0014658559812232852
Validation loss = 0.0011921474942937493
Validation loss = 0.001372536993585527
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016738713020458817
Validation loss = 0.0014001113595440984
Validation loss = 0.0013140894006937742
Validation loss = 0.001702764187939465
Validation loss = 0.0011022996623069048
Validation loss = 0.0015855183592066169
Validation loss = 0.0007590524037368596
Validation loss = 0.001466933055780828
Validation loss = 0.0015516114654019475
Validation loss = 0.0014463443076238036
Validation loss = 0.0008893838967196643
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012571943225339055
Validation loss = 0.0015305926790460944
Validation loss = 0.0009924920741468668
Validation loss = 0.0011569379130378366
Validation loss = 0.0008817677153274417
Validation loss = 0.0011252394178882241
Validation loss = 0.0009588057873770595
Validation loss = 0.0015991029795259237
Validation loss = 0.0010083086090162396
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009827337926253676
Validation loss = 0.0009750144672580063
Validation loss = 0.0011954669607803226
Validation loss = 0.001320834388025105
Validation loss = 0.0014062935952097178
Validation loss = 0.002262728288769722
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0744   |
| Iteration     | 33        |
| MaximumReturn | -0.000528 |
| MinimumReturn | -1.8      |
| TotalSamples  | 58310     |
-----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014081948902457952
Validation loss = 0.0011069794418290257
Validation loss = 0.0008972019422799349
Validation loss = 0.0017966678133234382
Validation loss = 0.0015371113549917936
Validation loss = 0.003101636189967394
Validation loss = 0.0011371157597750425
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009605750674381852
Validation loss = 0.0013466405216604471
Validation loss = 0.0011849370785057545
Validation loss = 0.0011478307424113154
Validation loss = 0.0018402935238555074
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018966530915349722
Validation loss = 0.0011940968688577414
Validation loss = 0.0007793754921294749
Validation loss = 0.0014059162931516767
Validation loss = 0.0009650798747316003
Validation loss = 0.0011455247877165675
Validation loss = 0.0008602680754847825
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021969443187117577
Validation loss = 0.001099416404031217
Validation loss = 0.0016963293310254812
Validation loss = 0.001186614972539246
Validation loss = 0.0010382543550804257
Validation loss = 0.0013147619320079684
Validation loss = 0.0015964024933055043
Validation loss = 0.0021127022337168455
Validation loss = 0.0011821757070720196
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020394260063767433
Validation loss = 0.0021191348787397146
Validation loss = 0.0009844237938523293
Validation loss = 0.001171786803752184
Validation loss = 0.0021474058739840984
Validation loss = 0.0011470867320895195
Validation loss = 0.001118045998737216
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.298   |
| Iteration     | 34       |
| MaximumReturn | -0.00052 |
| MinimumReturn | -6.69    |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001080413581803441
Validation loss = 0.000836120278108865
Validation loss = 0.0013445074437186122
Validation loss = 0.001009630155749619
Validation loss = 0.0008364247623831034
Validation loss = 0.00103229028172791
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003039881819859147
Validation loss = 0.0013124585384503007
Validation loss = 0.0009772389894351363
Validation loss = 0.002303424756973982
Validation loss = 0.0007852041744627059
Validation loss = 0.0031264980789273977
Validation loss = 0.0014726590598002076
Validation loss = 0.0011158937122672796
Validation loss = 0.0016576143680140376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024278713390231133
Validation loss = 0.0012248092098161578
Validation loss = 0.0016496939351782203
Validation loss = 0.0009447439224459231
Validation loss = 0.0018968662479892373
Validation loss = 0.0011026901192963123
Validation loss = 0.0012596601154655218
Validation loss = 0.0008455235511064529
Validation loss = 0.0010253599612042308
Validation loss = 0.001461913576349616
Validation loss = 0.0010451182024553418
Validation loss = 0.0023974592331796885
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014886221615597606
Validation loss = 0.001015094923786819
Validation loss = 0.0021662600338459015
Validation loss = 0.0018499826546758413
Validation loss = 0.0013008490204811096
Validation loss = 0.004182558972388506
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013924315571784973
Validation loss = 0.0009446348994970322
Validation loss = 0.0007148446165956557
Validation loss = 0.000975593866314739
Validation loss = 0.0021461329888552427
Validation loss = 0.001086151460185647
Validation loss = 0.0010807119542732835
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -13.6     |
| Iteration     | 35        |
| MaximumReturn | -0.000603 |
| MinimumReturn | -100      |
| TotalSamples  | 61642     |
-----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001666150870732963
Validation loss = 0.0013887208187952638
Validation loss = 0.0011962737189605832
Validation loss = 0.0006854292587377131
Validation loss = 0.0008718598983250558
Validation loss = 0.0015024353051558137
Validation loss = 0.0007739599677734077
Validation loss = 0.0013235460501164198
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00144093856215477
Validation loss = 0.0007131168968044221
Validation loss = 0.0014231891836971045
Validation loss = 0.0015713215107098222
Validation loss = 0.0016347450437024236
Validation loss = 0.001183674088679254
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001832230482250452
Validation loss = 0.00092019495787099
Validation loss = 0.0013753242092207074
Validation loss = 0.0008895445498637855
Validation loss = 0.0007095205364748836
Validation loss = 0.0020318750757724047
Validation loss = 0.0015822071582078934
Validation loss = 0.0010828094091266394
Validation loss = 0.001987568801268935
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015316939679905772
Validation loss = 0.0010608671000227332
Validation loss = 0.003360775765031576
Validation loss = 0.0012353787897154689
Validation loss = 0.0014510805485770106
Validation loss = 0.0008268101955763996
Validation loss = 0.0008312813588418067
Validation loss = 0.0008079411345534027
Validation loss = 0.0009724851115606725
Validation loss = 0.0010190862230956554
Validation loss = 0.0014524509897455573
Validation loss = 0.0008290946716442704
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000790415215305984
Validation loss = 0.0012173785362392664
Validation loss = 0.0009236373589374125
Validation loss = 0.002146092476323247
Validation loss = 0.0012316444190219045
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -10.7     |
| Iteration     | 36        |
| MaximumReturn | -0.000532 |
| MinimumReturn | -101      |
| TotalSamples  | 63308     |
-----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001942776725627482
Validation loss = 0.002736950060352683
Validation loss = 0.0007161800749599934
Validation loss = 0.0005960556445643306
Validation loss = 0.0006934163975529373
Validation loss = 0.0009601143538020551
Validation loss = 0.0015902826562523842
Validation loss = 0.0015576647128909826
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001779463142156601
Validation loss = 0.0029397131875157356
Validation loss = 0.003842501901090145
Validation loss = 0.001134829013608396
Validation loss = 0.0009263743995688856
Validation loss = 0.00288196699693799
Validation loss = 0.001081140129826963
Validation loss = 0.0013486432144418359
Validation loss = 0.0017794452141970396
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018006268655881286
Validation loss = 0.0010362697066739202
Validation loss = 0.001439898507669568
Validation loss = 0.0022726445458829403
Validation loss = 0.0012073350371792912
Validation loss = 0.0010768085485324264
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009697023779153824
Validation loss = 0.001740218373015523
Validation loss = 0.0008850883459672332
Validation loss = 0.000889184360858053
Validation loss = 0.0011198766296729445
Validation loss = 0.0014318703906610608
Validation loss = 0.001101831323467195
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008278512395918369
Validation loss = 0.0013361208839341998
Validation loss = 0.0008678837330080569
Validation loss = 0.0012488275533542037
Validation loss = 0.0020602140575647354
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.65     |
| Iteration     | 37        |
| MaximumReturn | -0.000901 |
| MinimumReturn | -24.1     |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0024765953421592712
Validation loss = 0.0011117971735075116
Validation loss = 0.0009800310945138335
Validation loss = 0.001006787526421249
Validation loss = 0.0008371688309125602
Validation loss = 0.0012097572907805443
Validation loss = 0.0015078941360116005
Validation loss = 0.0007733132806606591
Validation loss = 0.0008925625588744879
Validation loss = 0.001100652851164341
Validation loss = 0.0008571617654524744
Validation loss = 0.0012310065794736147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0048966920003294945
Validation loss = 0.0011824313551187515
Validation loss = 0.0013185952557250857
Validation loss = 0.0011371462605893612
Validation loss = 0.0007930119172669947
Validation loss = 0.0018302890239283442
Validation loss = 0.0013549656141549349
Validation loss = 0.0012054729741066694
Validation loss = 0.0013995979679748416
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0025115124881267548
Validation loss = 0.001796977361664176
Validation loss = 0.0011355996830388904
Validation loss = 0.001518317498266697
Validation loss = 0.001205959590151906
Validation loss = 0.001063915784470737
Validation loss = 0.0011708857491612434
Validation loss = 0.001087079057469964
Validation loss = 0.0024652271531522274
Validation loss = 0.001624880125746131
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0031895299907773733
Validation loss = 0.0012467594351619482
Validation loss = 0.0029421919025480747
Validation loss = 0.0009171073324978352
Validation loss = 0.0008935723453760147
Validation loss = 0.0024951936211436987
Validation loss = 0.001406169729307294
Validation loss = 0.0009305191924795508
Validation loss = 0.0012092485558241606
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019054192816838622
Validation loss = 0.0016282997094094753
Validation loss = 0.0023926477879285812
Validation loss = 0.0024001826532185078
Validation loss = 0.0014269324019551277
Validation loss = 0.0007543503306806087
Validation loss = 0.0010885068913921714
Validation loss = 0.0008550413767807186
Validation loss = 0.0016523315571248531
Validation loss = 0.00119829794857651
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0984  |
| Iteration     | 38       |
| MaximumReturn | -0.0723  |
| MinimumReturn | -0.133   |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010068893898278475
Validation loss = 0.0015061999438330531
Validation loss = 0.0010833012638613582
Validation loss = 0.0007486924296244979
Validation loss = 0.0010283534647896886
Validation loss = 0.0011642168974503875
Validation loss = 0.001044513308443129
Validation loss = 0.0014277397422119975
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012223732192069292
Validation loss = 0.0011784302769228816
Validation loss = 0.0013585883425548673
Validation loss = 0.0012147276429459453
Validation loss = 0.00096074806060642
Validation loss = 0.0011990679195150733
Validation loss = 0.0009267108980566263
Validation loss = 0.0013841483741998672
Validation loss = 0.0010639517568051815
Validation loss = 0.0009043675381690264
Validation loss = 0.001288571860641241
Validation loss = 0.0011136924149468541
Validation loss = 0.0011643231846392155
Validation loss = 0.002500220201909542
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017214424442499876
Validation loss = 0.0012704646214842796
Validation loss = 0.0011917384108528495
Validation loss = 0.0013066561659798026
Validation loss = 0.0017518681706860662
Validation loss = 0.002745244884863496
Validation loss = 0.0011713341809809208
Validation loss = 0.0013621437828987837
Validation loss = 0.0010133207542821765
Validation loss = 0.0010791791137307882
Validation loss = 0.0012448610505089164
Validation loss = 0.0020722676999866962
Validation loss = 0.0010128329740837216
Validation loss = 0.000983028206974268
Validation loss = 0.0010885099181905389
Validation loss = 0.0014061609981581569
Validation loss = 0.001129308482632041
Validation loss = 0.0019511263817548752
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001132008503191173
Validation loss = 0.0009807030437514186
Validation loss = 0.001022106152959168
Validation loss = 0.0024024108424782753
Validation loss = 0.002466502133756876
Validation loss = 0.0008625644841231406
Validation loss = 0.001283956109546125
Validation loss = 0.0009671665029600263
Validation loss = 0.001374556915834546
Validation loss = 0.0008983057923614979
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015875357203185558
Validation loss = 0.0013988110003992915
Validation loss = 0.002363677369430661
Validation loss = 0.0007868570974096656
Validation loss = 0.0011403035605326295
Validation loss = 0.0017427705461159348
Validation loss = 0.0011508509051054716
Validation loss = 0.001659652916714549
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -5.48     |
| Iteration     | 39        |
| MaximumReturn | -0.000715 |
| MinimumReturn | -69.2     |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008557560504414141
Validation loss = 0.0010322123998776078
Validation loss = 0.0010273379739373922
Validation loss = 0.0009410200291313231
Validation loss = 0.0013080317294225097
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017637283308431506
Validation loss = 0.001140042906627059
Validation loss = 0.0008210857049562037
Validation loss = 0.0007760876324027777
Validation loss = 0.0021735597401857376
Validation loss = 0.0017153528751805425
Validation loss = 0.001617309171706438
Validation loss = 0.0017872453900054097
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019723083823919296
Validation loss = 0.001352124847471714
Validation loss = 0.0025933862198144197
Validation loss = 0.0015653707087039948
Validation loss = 0.001273391768336296
Validation loss = 0.0010391132673248649
Validation loss = 0.0009698980720713735
Validation loss = 0.0010419414611533284
Validation loss = 0.00090835738228634
Validation loss = 0.0012032436206936836
Validation loss = 0.0011833002790808678
Validation loss = 0.00144713104236871
Validation loss = 0.0009567062370479107
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012666803086176515
Validation loss = 0.0010949193965643644
Validation loss = 0.0008351292926818132
Validation loss = 0.0015084994956851006
Validation loss = 0.0013125092955306172
Validation loss = 0.0010004104115068913
Validation loss = 0.0017597784753888845
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018081247108057141
Validation loss = 0.001195737742818892
Validation loss = 0.0012279258808121085
Validation loss = 0.001616050023585558
Validation loss = 0.000936450669541955
Validation loss = 0.0009797203820198774
Validation loss = 0.0017000014195218682
Validation loss = 0.0019821282476186752
Validation loss = 0.0015333066694438457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0569  |
| Iteration     | 40       |
| MaximumReturn | -0.0344  |
| MinimumReturn | -0.0984  |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006522806943394244
Validation loss = 0.0005674809217453003
Validation loss = 0.0012648204574361444
Validation loss = 0.0012964567868039012
Validation loss = 0.0009449644712731242
Validation loss = 0.0009897617856040597
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008745411760173738
Validation loss = 0.001152481883764267
Validation loss = 0.0009804002474993467
Validation loss = 0.0020519145764410496
Validation loss = 0.0023112741764634848
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010758033022284508
Validation loss = 0.0012101317988708615
Validation loss = 0.0008299348410218954
Validation loss = 0.0008431316819041967
Validation loss = 0.0014055436477065086
Validation loss = 0.0008185362676158547
Validation loss = 0.0011066056322306395
Validation loss = 0.001774134929291904
Validation loss = 0.001727829221636057
Validation loss = 0.0009058965952135623
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008591079385951161
Validation loss = 0.0008956450619734824
Validation loss = 0.0006206980906426907
Validation loss = 0.001233961433172226
Validation loss = 0.0007355818524956703
Validation loss = 0.001923895557411015
Validation loss = 0.0007270741043612361
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001035078545100987
Validation loss = 0.002200407674536109
Validation loss = 0.0011046574218198657
Validation loss = 0.0011393269523978233
Validation loss = 0.0009677637717686594
Validation loss = 0.0010378607548773289
Validation loss = 0.0016117282211780548
Validation loss = 0.0008150590001605451
Validation loss = 0.0007913039298728108
Validation loss = 0.0014465637505054474
Validation loss = 0.0018146642250940204
Validation loss = 0.0007604912389069796
Validation loss = 0.0010067649418488145
Validation loss = 0.0015041610458865762
Validation loss = 0.0010227509774267673
Validation loss = 0.001228079665452242
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0118   |
| Iteration     | 41        |
| MaximumReturn | -0.000872 |
| MinimumReturn | -0.0285   |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009792027994990349
Validation loss = 0.0010566433193162084
Validation loss = 0.0009018330601975322
Validation loss = 0.0006682932144030929
Validation loss = 0.0009895419934764504
Validation loss = 0.001060105161741376
Validation loss = 0.001126609742641449
Validation loss = 0.0011423869291320443
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000984427984803915
Validation loss = 0.0010525976540520787
Validation loss = 0.0009489186340942979
Validation loss = 0.0012452895753085613
Validation loss = 0.0010489741107448936
Validation loss = 0.0014595526736229658
Validation loss = 0.0011381898075342178
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014357048785313964
Validation loss = 0.0010535401524975896
Validation loss = 0.0010125084081664681
Validation loss = 0.000987646053545177
Validation loss = 0.0010033081052824855
Validation loss = 0.001010411069728434
Validation loss = 0.0016363572794944048
Validation loss = 0.0014231664827093482
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014050049940124154
Validation loss = 0.0011860169470310211
Validation loss = 0.0011184891918674111
Validation loss = 0.0006828094483353198
Validation loss = 0.0009691662271507084
Validation loss = 0.001398741384036839
Validation loss = 0.0007225294830277562
Validation loss = 0.0008374960161745548
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008748667314648628
Validation loss = 0.0008812117157503963
Validation loss = 0.0006828850600868464
Validation loss = 0.0013969483552500606
Validation loss = 0.0009008758934214711
Validation loss = 0.0016174794873222709
Validation loss = 0.0011709521058946848
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0904  |
| Iteration     | 42       |
| MaximumReturn | -0.0465  |
| MinimumReturn | -0.166   |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012281545205041766
Validation loss = 0.0008235390996560454
Validation loss = 0.0016135351033881307
Validation loss = 0.0018191782291978598
Validation loss = 0.0008442051475867629
Validation loss = 0.0011831335723400116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007717374246567488
Validation loss = 0.0008374929311685264
Validation loss = 0.001354664913378656
Validation loss = 0.0017627233173698187
Validation loss = 0.0009379416005685925
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002056926256045699
Validation loss = 0.0009490189258940518
Validation loss = 0.0015677299816161394
Validation loss = 0.0021198205649852753
Validation loss = 0.0020785080268979073
Validation loss = 0.0008345232927240431
Validation loss = 0.0010526946280151606
Validation loss = 0.001596497604623437
Validation loss = 0.000926338427234441
Validation loss = 0.0013062249636277556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011126655153930187
Validation loss = 0.0015066878404468298
Validation loss = 0.0010005696676671505
Validation loss = 0.001355720218271017
Validation loss = 0.0013036157470196486
Validation loss = 0.0007924401434138417
Validation loss = 0.0010807402431964874
Validation loss = 0.0010142846731469035
Validation loss = 0.0011222916655242443
Validation loss = 0.000704423466231674
Validation loss = 0.0009092341642826796
Validation loss = 0.0009136372827924788
Validation loss = 0.001684849034063518
Validation loss = 0.0009217067854478955
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014707627706229687
Validation loss = 0.0007919995114207268
Validation loss = 0.0014984002336859703
Validation loss = 0.0007835638243705034
Validation loss = 0.0010037519969046116
Validation loss = 0.0015308276051655412
Validation loss = 0.0008684087079018354
Validation loss = 0.0013419021852314472
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.3     |
| Iteration     | 43       |
| MaximumReturn | -0.00079 |
| MinimumReturn | -88.2    |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010717706754803658
Validation loss = 0.0016801167512312531
Validation loss = 0.0008777382900007069
Validation loss = 0.0009768042946234345
Validation loss = 0.001355667831376195
Validation loss = 0.000629366491921246
Validation loss = 0.000697312003467232
Validation loss = 0.0009849724592640996
Validation loss = 0.0008964197477325797
Validation loss = 0.0006932345568202436
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007555885822512209
Validation loss = 0.001589009421877563
Validation loss = 0.0012911417288705707
Validation loss = 0.0009199484484270215
Validation loss = 0.0011244164779782295
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015395628288388252
Validation loss = 0.001267321058548987
Validation loss = 0.0008651496609672904
Validation loss = 0.0008994414238259196
Validation loss = 0.0011104256846010685
Validation loss = 0.0008900905377231538
Validation loss = 0.0007915638852864504
Validation loss = 0.0010317467385903
Validation loss = 0.001076477230526507
Validation loss = 0.0008451219764538109
Validation loss = 0.0008100473205558956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000850544311106205
Validation loss = 0.001109009375795722
Validation loss = 0.0007883504149504006
Validation loss = 0.0015403730794787407
Validation loss = 0.0021383631974458694
Validation loss = 0.0012440987629815936
Validation loss = 0.0006782013806514442
Validation loss = 0.0012295818887650967
Validation loss = 0.0010740383295342326
Validation loss = 0.0020551332272589207
Validation loss = 0.0018480519065633416
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016086038667708635
Validation loss = 0.0017252955585718155
Validation loss = 0.0012793515343219042
Validation loss = 0.0014421290252357721
Validation loss = 0.001006070408038795
Validation loss = 0.0012235372560098767
Validation loss = 0.0008203465258702636
Validation loss = 0.0019152997992932796
Validation loss = 0.0009984285570681095
Validation loss = 0.0009739545057527721
Validation loss = 0.0008052751654759049
Validation loss = 0.00075287907384336
Validation loss = 0.0011179741704836488
Validation loss = 0.0006851617945358157
Validation loss = 0.0008421277743764222
Validation loss = 0.0010613566264510155
Validation loss = 0.0005765092791989446
Validation loss = 0.0014251372776925564
Validation loss = 0.0008782593649812043
Validation loss = 0.001116747735068202
Validation loss = 0.0016126636182889342
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.96    |
| Iteration     | 44       |
| MaximumReturn | -0.00104 |
| MinimumReturn | -55.3    |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010043707443401217
Validation loss = 0.0015092225512489676
Validation loss = 0.002214405918493867
Validation loss = 0.000842281267978251
Validation loss = 0.000971405825112015
Validation loss = 0.0008845714619383216
Validation loss = 0.0007879739277996123
Validation loss = 0.0007774960249662399
Validation loss = 0.0005592223023995757
Validation loss = 0.0014188530622050166
Validation loss = 0.0013250332558527589
Validation loss = 0.001067158067598939
Validation loss = 0.0010749632492661476
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007857803138904274
Validation loss = 0.0007673976942896843
Validation loss = 0.0016306238248944283
Validation loss = 0.0009412503568455577
Validation loss = 0.0014980784617364407
Validation loss = 0.0022367413621395826
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006581707275472581
Validation loss = 0.0008857770008035004
Validation loss = 0.0010202968260273337
Validation loss = 0.0012648323317989707
Validation loss = 0.0012951097451150417
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001336040673777461
Validation loss = 0.0008841880480758846
Validation loss = 0.0007891904679127038
Validation loss = 0.0007988819270394742
Validation loss = 0.0006724187987856567
Validation loss = 0.0009785041911527514
Validation loss = 0.0010044235968962312
Validation loss = 0.0010078897466883063
Validation loss = 0.0009660734795033932
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007199345272965729
Validation loss = 0.0009601546335034072
Validation loss = 0.0010323947062715888
Validation loss = 0.001733394805341959
Validation loss = 0.0011789913987740874
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.255   |
| Iteration     | 45       |
| MaximumReturn | -0.137   |
| MinimumReturn | -0.716   |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000894886557944119
Validation loss = 0.0008945357403717935
Validation loss = 0.0011751721613109112
Validation loss = 0.0011008490109816194
Validation loss = 0.0006474832771345973
Validation loss = 0.0007581661338917911
Validation loss = 0.0007641342817805707
Validation loss = 0.0007202820270322263
Validation loss = 0.0007993584149517119
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006725401035510004
Validation loss = 0.0006736181094311178
Validation loss = 0.0008992208167910576
Validation loss = 0.001471567084081471
Validation loss = 0.0009736065985634923
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009268312132917345
Validation loss = 0.0008723806240595877
Validation loss = 0.0008411273011006415
Validation loss = 0.0009108687518164515
Validation loss = 0.0009922866011038423
Validation loss = 0.0008226950885728002
Validation loss = 0.001746814465150237
Validation loss = 0.000753177737351507
Validation loss = 0.0011900196550413966
Validation loss = 0.0006532251136377454
Validation loss = 0.0007514618919230998
Validation loss = 0.0012614900479093194
Validation loss = 0.0006139264442026615
Validation loss = 0.001926841796375811
Validation loss = 0.0011814736062660813
Validation loss = 0.000858979532495141
Validation loss = 0.0007102538947947323
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000661231460981071
Validation loss = 0.00199871975928545
Validation loss = 0.0012483213795349002
Validation loss = 0.0008695116266608238
Validation loss = 0.0018723258981481194
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008672489202581346
Validation loss = 0.0006015895633026958
Validation loss = 0.002051775110885501
Validation loss = 0.000859441643115133
Validation loss = 0.0007104374235495925
Validation loss = 0.0008656781865283847
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0735  |
| Iteration     | 46       |
| MaximumReturn | -0.00317 |
| MinimumReturn | -0.169   |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009934089612215757
Validation loss = 0.0009245990077033639
Validation loss = 0.0008295440929941833
Validation loss = 0.0008529188926331699
Validation loss = 0.0012142462655901909
Validation loss = 0.0006296075880527496
Validation loss = 0.000783291645348072
Validation loss = 0.0006183633813634515
Validation loss = 0.0010698087280616164
Validation loss = 0.0005540857673622668
Validation loss = 0.0009067727369256318
Validation loss = 0.0006902868626639247
Validation loss = 0.0004955147160217166
Validation loss = 0.0011617401614785194
Validation loss = 0.000526654243003577
Validation loss = 0.0015211512800306082
Validation loss = 0.0010943885426968336
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013390863314270973
Validation loss = 0.0013564161490648985
Validation loss = 0.0008871603640727699
Validation loss = 0.0008246920770034194
Validation loss = 0.0008406533161178231
Validation loss = 0.0009797692764550447
Validation loss = 0.0011873629409819841
Validation loss = 0.0007147470023483038
Validation loss = 0.0008992757648229599
Validation loss = 0.0006047515198588371
Validation loss = 0.0008479267125949264
Validation loss = 0.000983308651484549
Validation loss = 0.0012054858962073922
Validation loss = 0.0006573173450306058
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008371599251404405
Validation loss = 0.0005977816181257367
Validation loss = 0.0006335967918857932
Validation loss = 0.0009442122536711395
Validation loss = 0.0006468583596870303
Validation loss = 0.0013856821460649371
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005808493006043136
Validation loss = 0.0009886641055345535
Validation loss = 0.0006285941926762462
Validation loss = 0.0005180117441341281
Validation loss = 0.0011856209021061659
Validation loss = 0.002262893132865429
Validation loss = 0.0010211870539933443
Validation loss = 0.0013014895375818014
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007440276676788926
Validation loss = 0.0010391963878646493
Validation loss = 0.0010664224391803145
Validation loss = 0.0007531316368840635
Validation loss = 0.0014308441895991564
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00317  |
| Iteration     | 47        |
| MaximumReturn | -0.000588 |
| MinimumReturn | -0.0173   |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005842967657372355
Validation loss = 0.0007952324813231826
Validation loss = 0.0006137368036434054
Validation loss = 0.0019026367226615548
Validation loss = 0.0009494366240687668
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014939946122467518
Validation loss = 0.0009727159631438553
Validation loss = 0.0009326689760200679
Validation loss = 0.0011637603165581822
Validation loss = 0.0010753192473202944
Validation loss = 0.0006774759967811406
Validation loss = 0.0014894618652760983
Validation loss = 0.0015656970208510756
Validation loss = 0.0010708689223974943
Validation loss = 0.0008278869790956378
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007518866332247853
Validation loss = 0.0013702225405722857
Validation loss = 0.0011839233338832855
Validation loss = 0.0013260345440357924
Validation loss = 0.0006983620696701109
Validation loss = 0.0010889163240790367
Validation loss = 0.0006508688675239682
Validation loss = 0.0014688590308651328
Validation loss = 0.0010437010787427425
Validation loss = 0.0007581970421597362
Validation loss = 0.0006941749015823007
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00094837776850909
Validation loss = 0.0009252711315639317
Validation loss = 0.0008852469036355615
Validation loss = 0.0009185686940327287
Validation loss = 0.0011663464829325676
Validation loss = 0.0008097917889244854
Validation loss = 0.001205784035846591
Validation loss = 0.0009760550456121564
Validation loss = 0.0009229526622220874
Validation loss = 0.0006424171733669937
Validation loss = 0.0007621173863299191
Validation loss = 0.0021893195807933807
Validation loss = 0.0017511399928480387
Validation loss = 0.0005230306414887309
Validation loss = 0.0009970979299396276
Validation loss = 0.0010849777609109879
Validation loss = 0.0005899375537410378
Validation loss = 0.0006768226739950478
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009493332472629845
Validation loss = 0.0010625188006088138
Validation loss = 0.00047153793275356293
Validation loss = 0.0010795033304020762
Validation loss = 0.0008491690387018025
Validation loss = 0.0012624997179955244
Validation loss = 0.0006707842694595456
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0104   |
| Iteration     | 48        |
| MaximumReturn | -0.000711 |
| MinimumReturn | -0.0573   |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008413726463913918
Validation loss = 0.000806477211881429
Validation loss = 0.0012954379199072719
Validation loss = 0.0006969476235099137
Validation loss = 0.0009693597676232457
Validation loss = 0.0008857059292495251
Validation loss = 0.0006599363405257463
Validation loss = 0.0007536948542110622
Validation loss = 0.0005869686137884855
Validation loss = 0.0007624473655596375
Validation loss = 0.0008649762603454292
Validation loss = 0.0005874601774848998
Validation loss = 0.0010203152196481824
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012655252357944846
Validation loss = 0.0010039953049272299
Validation loss = 0.001176239806227386
Validation loss = 0.0012104008346796036
Validation loss = 0.0006868164637126029
Validation loss = 0.0006967063527554274
Validation loss = 0.0011462116381153464
Validation loss = 0.0006948790396563709
Validation loss = 0.0011144708842039108
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017653156537562609
Validation loss = 0.0015425141900777817
Validation loss = 0.0009903311729431152
Validation loss = 0.0008948756149038672
Validation loss = 0.0008055483922362328
Validation loss = 0.0012382575077936053
Validation loss = 0.0008295290172100067
Validation loss = 0.0007756741251796484
Validation loss = 0.001297970418818295
Validation loss = 0.0007620835094712675
Validation loss = 0.0011252431431785226
Validation loss = 0.0016428737435489893
Validation loss = 0.0010752823436632752
Validation loss = 0.0009066486381925642
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00098307931330055
Validation loss = 0.0011913850903511047
Validation loss = 0.0010968262795358896
Validation loss = 0.0007285833125934005
Validation loss = 0.0014805240789428353
Validation loss = 0.0005932238418608904
Validation loss = 0.0007378770387731493
Validation loss = 0.0007100266520865262
Validation loss = 0.0007098448695614934
Validation loss = 0.0013188943266868591
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009256146149709821
Validation loss = 0.0009311358444392681
Validation loss = 0.0006531807011924684
Validation loss = 0.0012069657677784562
Validation loss = 0.0009517809376120567
Validation loss = 0.0006770032923668623
Validation loss = 0.000950158981140703
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0936  |
| Iteration     | 49       |
| MaximumReturn | -0.0493  |
| MinimumReturn | -0.141   |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008265054784715176
Validation loss = 0.0005059678805992007
Validation loss = 0.0009243220556527376
Validation loss = 0.0012786956503987312
Validation loss = 0.000623017898760736
Validation loss = 0.0011104678269475698
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007191314944066107
Validation loss = 0.0006451354711316526
Validation loss = 0.0008979973499663174
Validation loss = 0.0010471242712810636
Validation loss = 0.0007321702432818711
Validation loss = 0.0007958031492307782
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000893285614438355
Validation loss = 0.0007002719794400036
Validation loss = 0.0005607536877505481
Validation loss = 0.0012852645013481379
Validation loss = 0.0008557628025300801
Validation loss = 0.0008786513935774565
Validation loss = 0.0013232897035777569
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008216725545935333
Validation loss = 0.0007413296843878925
Validation loss = 0.0005415836931206286
Validation loss = 0.0007544058607891202
Validation loss = 0.000870285730343312
Validation loss = 0.0007074422319419682
Validation loss = 0.0008483008714392781
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006889230571687222
Validation loss = 0.0005068104364909232
Validation loss = 0.0012229380663484335
Validation loss = 0.0006337274680845439
Validation loss = 0.0007077333284541965
Validation loss = 0.0013172243488952518
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0739  |
| Iteration     | 50       |
| MaximumReturn | -0.0278  |
| MinimumReturn | -0.109   |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007579197408631444
Validation loss = 0.0007244008011184633
Validation loss = 0.0007725856848992407
Validation loss = 0.0005105207092128694
Validation loss = 0.0005601660232059658
Validation loss = 0.0007393528940156102
Validation loss = 0.0007423022761940956
Validation loss = 0.0006752448971383274
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008059342508204281
Validation loss = 0.0010199358221143484
Validation loss = 0.000819063454400748
Validation loss = 0.001099348533898592
Validation loss = 0.0007379862363450229
Validation loss = 0.0007520275539718568
Validation loss = 0.0007839786121621728
Validation loss = 0.000693621055688709
Validation loss = 0.0009875912219285965
Validation loss = 0.0005880423705093563
Validation loss = 0.0005359005299396813
Validation loss = 0.000638374884147197
Validation loss = 0.0012333965860307217
Validation loss = 0.0007145506679080427
Validation loss = 0.0007012514397501945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008258142042905092
Validation loss = 0.0008630507509224117
Validation loss = 0.0006595880258828402
Validation loss = 0.0006250754813663661
Validation loss = 0.0010154737392440438
Validation loss = 0.0017930504400283098
Validation loss = 0.0010229087201878428
Validation loss = 0.0007137623033486307
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007800867315381765
Validation loss = 0.0013867068337276578
Validation loss = 0.0005169560899958014
Validation loss = 0.0011978672118857503
Validation loss = 0.0011617292184382677
Validation loss = 0.0008626356720924377
Validation loss = 0.000868011440616101
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013434751890599728
Validation loss = 0.0008195601985789835
Validation loss = 0.00092046067584306
Validation loss = 0.0007952271262183785
Validation loss = 0.0005141388974152505
Validation loss = 0.0013115855399519205
Validation loss = 0.0007572411559522152
Validation loss = 0.0004598902596626431
Validation loss = 0.0005641345051117241
Validation loss = 0.0005089854821562767
Validation loss = 0.0009069873485714197
Validation loss = 0.0009675920591689646
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.168   |
| Iteration     | 51       |
| MaximumReturn | -0.00721 |
| MinimumReturn | -2.71    |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007644485449418426
Validation loss = 0.0008182330639101565
Validation loss = 0.0007002389174886048
Validation loss = 0.0008913642377592623
Validation loss = 0.000546204624697566
Validation loss = 0.0007930630235932767
Validation loss = 0.0005769875133410096
Validation loss = 0.0016430880641564727
Validation loss = 0.0006319526000879705
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011801234213635325
Validation loss = 0.0008321000495925546
Validation loss = 0.0011233622208237648
Validation loss = 0.0014136842219159007
Validation loss = 0.0010249103652313352
Validation loss = 0.0005395635380409658
Validation loss = 0.0010606823489069939
Validation loss = 0.0005160218570381403
Validation loss = 0.0006648742128163576
Validation loss = 0.0009197829640470445
Validation loss = 0.0008651459938846529
Validation loss = 0.0006942853797227144
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009766222210600972
Validation loss = 0.0009626240935176611
Validation loss = 0.0005393010214902461
Validation loss = 0.0008721938356757164
Validation loss = 0.0006144499056972563
Validation loss = 0.0007247254252433777
Validation loss = 0.0010827240766957402
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009073697146959603
Validation loss = 0.0018851682543754578
Validation loss = 0.0004948715213686228
Validation loss = 0.0012764469720423222
Validation loss = 0.0005527847679331899
Validation loss = 0.0007115946500562131
Validation loss = 0.0007060833158902824
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012410451890900731
Validation loss = 0.000757903209887445
Validation loss = 0.0007348841172643006
Validation loss = 0.0008437572396360338
Validation loss = 0.0006603577639907598
Validation loss = 0.0010763133177533746
Validation loss = 0.0008716032025404274
Validation loss = 0.0005537836696021259
Validation loss = 0.00046801642747595906
Validation loss = 0.0009859107667580247
Validation loss = 0.0006818523979745805
Validation loss = 0.0006915429257787764
Validation loss = 0.0006880783475935459
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00364  |
| Iteration     | 52        |
| MaximumReturn | -0.000576 |
| MinimumReturn | -0.023    |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007905463571660221
Validation loss = 0.0008255841094069183
Validation loss = 0.0007741386070847511
Validation loss = 0.0006550976540893316
Validation loss = 0.0008170111104846001
Validation loss = 0.0008182649617083371
Validation loss = 0.0006810629274696112
Validation loss = 0.0006999400211498141
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006382691790349782
Validation loss = 0.0016294510569423437
Validation loss = 0.001086732721887529
Validation loss = 0.0007243074360303581
Validation loss = 0.0007942217052914202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007086544646881521
Validation loss = 0.00044613576028496027
Validation loss = 0.0007965115364640951
Validation loss = 0.0006332293851301074
Validation loss = 0.0005300806369632483
Validation loss = 0.001102471724152565
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005322071374394
Validation loss = 0.0006675970507785678
Validation loss = 0.0010754306567832828
Validation loss = 0.001091729267500341
Validation loss = 0.00043789995834231377
Validation loss = 0.0008242169860750437
Validation loss = 0.0009672489250078797
Validation loss = 0.0006564114592038095
Validation loss = 0.0005776589387096465
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006175892776809633
Validation loss = 0.0007256185635924339
Validation loss = 0.0013039963087067008
Validation loss = 0.0012010852806270123
Validation loss = 0.0007843338535167277
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0485  |
| Iteration     | 53       |
| MaximumReturn | -0.011   |
| MinimumReturn | -0.0823  |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006208721897564828
Validation loss = 0.0009840893326327205
Validation loss = 0.0005714151775464416
Validation loss = 0.0005485821166075766
Validation loss = 0.0008428756846114993
Validation loss = 0.0007234521326608956
Validation loss = 0.000806816853582859
Validation loss = 0.0011493588099256158
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002154895104467869
Validation loss = 0.0007698943954892457
Validation loss = 0.0005127928452566266
Validation loss = 0.0010857946472242475
Validation loss = 0.0010665825102478266
Validation loss = 0.0005320136551745236
Validation loss = 0.0011200078297406435
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007528342539444566
Validation loss = 0.0008340124622918665
Validation loss = 0.0013206907315179706
Validation loss = 0.0006730659515596926
Validation loss = 0.000892586656846106
Validation loss = 0.0007902325596660376
Validation loss = 0.0007393258856609464
Validation loss = 0.000582162057980895
Validation loss = 0.0006330234464257956
Validation loss = 0.0007863262435421348
Validation loss = 0.0005639860173687339
Validation loss = 0.0005986249889247119
Validation loss = 0.0004181083641014993
Validation loss = 0.0010275831446051598
Validation loss = 0.0006425771280191839
Validation loss = 0.0012437656987458467
Validation loss = 0.0008363241795450449
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00045070104533806443
Validation loss = 0.000988669809885323
Validation loss = 0.000915404234547168
Validation loss = 0.0010186913423240185
Validation loss = 0.000558039580937475
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005382076487876475
Validation loss = 0.0014972604112699628
Validation loss = 0.0006901397719047964
Validation loss = 0.000635311589576304
Validation loss = 0.0007295095128938556
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0883  |
| Iteration     | 54       |
| MaximumReturn | -0.0436  |
| MinimumReturn | -0.143   |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006234237807802856
Validation loss = 0.0007829596288502216
Validation loss = 0.0005783735541626811
Validation loss = 0.0006093609845265746
Validation loss = 0.0015348863089457154
Validation loss = 0.0006149727269075811
Validation loss = 0.0017628438072279096
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008816575864329934
Validation loss = 0.0006261654780246317
Validation loss = 0.00046112024574540555
Validation loss = 0.0005528769106604159
Validation loss = 0.0008596982224844396
Validation loss = 0.0006742639816366136
Validation loss = 0.0004449346160981804
Validation loss = 0.0005549680790863931
Validation loss = 0.0006741575198248029
Validation loss = 0.0009185556555166841
Validation loss = 0.0007234719232656062
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007232886855490506
Validation loss = 0.0014279980678111315
Validation loss = 0.0007948249694891274
Validation loss = 0.0005146121839061379
Validation loss = 0.0009198025218211114
Validation loss = 0.0005111931241117418
Validation loss = 0.001025651814416051
Validation loss = 0.0006804416771046817
Validation loss = 0.000590464798733592
Validation loss = 0.0006347465678118169
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007850958500057459
Validation loss = 0.0008569196797907352
Validation loss = 0.0007719016284681857
Validation loss = 0.0005697721499018371
Validation loss = 0.0005384829710237682
Validation loss = 0.0006686703418381512
Validation loss = 0.00048288184916600585
Validation loss = 0.001074363710358739
Validation loss = 0.0014116890961304307
Validation loss = 0.0014205819461494684
Validation loss = 0.0005267320084385574
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007358728908002377
Validation loss = 0.0007978595094755292
Validation loss = 0.001213214942254126
Validation loss = 0.0005141805741004646
Validation loss = 0.0006841718568466604
Validation loss = 0.00048381616943515837
Validation loss = 0.0008568161865696311
Validation loss = 0.0013660606928169727
Validation loss = 0.0004483922675717622
Validation loss = 0.0007524410029873252
Validation loss = 0.0006839150446467102
Validation loss = 0.0007464733207598329
Validation loss = 0.000819619745016098
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00214  |
| Iteration     | 55        |
| MaximumReturn | -0.000677 |
| MinimumReturn | -0.0171   |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000975315342657268
Validation loss = 0.0008207401260733604
Validation loss = 0.0005041202530264854
Validation loss = 0.0006303851259872317
Validation loss = 0.000800197129137814
Validation loss = 0.0022504604421555996
Validation loss = 0.0005531503120437264
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004855323932133615
Validation loss = 0.0015079588629305363
Validation loss = 0.0009980691829696298
Validation loss = 0.000599056831561029
Validation loss = 0.0006542302435263991
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006660587387159467
Validation loss = 0.0007200597901828587
Validation loss = 0.0006438995478674769
Validation loss = 0.0009379521361552179
Validation loss = 0.0009351826156489551
Validation loss = 0.0005688336095772684
Validation loss = 0.0005513651412911713
Validation loss = 0.0006717428914271295
Validation loss = 0.0008060612599365413
Validation loss = 0.0006411979557015002
Validation loss = 0.0007222126587294042
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006252892198972404
Validation loss = 0.001122429035604
Validation loss = 0.0006143126520328224
Validation loss = 0.0007167700096033514
Validation loss = 0.0006874374230392277
Validation loss = 0.0006566790980286896
Validation loss = 0.0006280022789724171
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006230588187463582
Validation loss = 0.0006359103135764599
Validation loss = 0.0006303575937636197
Validation loss = 0.0005239758174866438
Validation loss = 0.0007376901339739561
Validation loss = 0.0011454613413661718
Validation loss = 0.0007321488228626549
Validation loss = 0.0008246887009590864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.32    |
| Iteration     | 56       |
| MaximumReturn | -0.127   |
| MinimumReturn | -87.4    |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004642278654500842
Validation loss = 0.0005195341655053198
Validation loss = 0.00043051844113506377
Validation loss = 0.0004324100154917687
Validation loss = 0.0005910613690502942
Validation loss = 0.00043434082181192935
Validation loss = 0.000534598424565047
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006296488572843373
Validation loss = 0.00044099651859141886
Validation loss = 0.0012437041150406003
Validation loss = 0.0005460598622448742
Validation loss = 0.0006686255219392478
Validation loss = 0.000670602370519191
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004519135400187224
Validation loss = 0.0006693821051158011
Validation loss = 0.0005522802821360528
Validation loss = 0.0004495314497034997
Validation loss = 0.0007100039510987699
Validation loss = 0.0005828082212246954
Validation loss = 0.0005501843988895416
Validation loss = 0.0005817773635499179
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004197007219772786
Validation loss = 0.0004929914139211178
Validation loss = 0.0006606311653740704
Validation loss = 0.0004917962360195816
Validation loss = 0.0005048965103924274
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005155958351679146
Validation loss = 0.00039056033710949123
Validation loss = 0.0004060681676492095
Validation loss = 0.000488008139654994
Validation loss = 0.000588162976782769
Validation loss = 0.0005838601500727236
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0329  |
| Iteration     | 57       |
| MaximumReturn | -0.0011  |
| MinimumReturn | -0.0825  |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007566636195406318
Validation loss = 0.001930144033394754
Validation loss = 0.0010144409025087953
Validation loss = 0.0012245038524270058
Validation loss = 0.0004883318324573338
Validation loss = 0.0011422435054555535
Validation loss = 0.0006063089240342379
Validation loss = 0.0006615570746362209
Validation loss = 0.0005427126307040453
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011263699270784855
Validation loss = 0.0005993663799017668
Validation loss = 0.0008238363661803305
Validation loss = 0.0008136981050483882
Validation loss = 0.0008517639944329858
Validation loss = 0.0005687919328920543
Validation loss = 0.0010046092793345451
Validation loss = 0.0006024856702424586
Validation loss = 0.0006884800386615098
Validation loss = 0.0005076768575236201
Validation loss = 0.0005754741141572595
Validation loss = 0.0006128479726612568
Validation loss = 0.0009294850169681013
Validation loss = 0.0009125624783337116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008817945490591228
Validation loss = 0.000672411173582077
Validation loss = 0.0006906729540787637
Validation loss = 0.0007048762636259198
Validation loss = 0.0006749892490915954
Validation loss = 0.000912035524379462
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007340630399994552
Validation loss = 0.0008385899127461016
Validation loss = 0.0006516500725410879
Validation loss = 0.0011301866034045815
Validation loss = 0.0008491494227200747
Validation loss = 0.0009706075070425868
Validation loss = 0.000965135230217129
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007296295952983201
Validation loss = 0.0006630818825215101
Validation loss = 0.0005561616853810847
Validation loss = 0.0010439767502248287
Validation loss = 0.000610502902418375
Validation loss = 0.0015407727332785726
Validation loss = 0.0005654461565427482
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.05     |
| Iteration     | 58        |
| MaximumReturn | -0.000654 |
| MinimumReturn | -25.8     |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015926341293379664
Validation loss = 0.0011963180731981993
Validation loss = 0.0015472653321921825
Validation loss = 0.0014949410688132048
Validation loss = 0.0010584970004856586
Validation loss = 0.001342092640697956
Validation loss = 0.0011163087328895926
Validation loss = 0.001210452290251851
Validation loss = 0.0013996645575389266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017487028380855918
Validation loss = 0.0015299362130463123
Validation loss = 0.001547337626107037
Validation loss = 0.0017429729923605919
Validation loss = 0.0017666253261268139
Validation loss = 0.0017180975992232561
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002062499523162842
Validation loss = 0.0014756354503333569
Validation loss = 0.0015492313541471958
Validation loss = 0.0016000199830159545
Validation loss = 0.003235591109842062
Validation loss = 0.001297404756769538
Validation loss = 0.001382587244734168
Validation loss = 0.0017249828670173883
Validation loss = 0.0013867783127352595
Validation loss = 0.001446414040401578
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001561747514642775
Validation loss = 0.0015501916641369462
Validation loss = 0.001833924325183034
Validation loss = 0.0014974522637203336
Validation loss = 0.001684004906564951
Validation loss = 0.00205095112323761
Validation loss = 0.0013289712369441986
Validation loss = 0.0014752474380657077
Validation loss = 0.001319859642535448
Validation loss = 0.001399180036969483
Validation loss = 0.0011316925520077348
Validation loss = 0.0013477446045726538
Validation loss = 0.0011116235982626677
Validation loss = 0.0013339516008272767
Validation loss = 0.0013754963874816895
Validation loss = 0.001288082916289568
Validation loss = 0.0020908256992697716
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018790471367537975
Validation loss = 0.0017946473089978099
Validation loss = 0.0015351210022345185
Validation loss = 0.0017521950649097562
Validation loss = 0.0014333167346194386
Validation loss = 0.0017746371449902654
Validation loss = 0.002352519892156124
Validation loss = 0.0018658878980204463
Validation loss = 0.0019834123086184263
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00447  |
| Iteration     | 59        |
| MaximumReturn | -0.000508 |
| MinimumReturn | -0.0335   |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013767423806712031
Validation loss = 0.001723162247799337
Validation loss = 0.0016313630621880293
Validation loss = 0.0012004099553450942
Validation loss = 0.0016413310077041388
Validation loss = 0.001193673931993544
Validation loss = 0.0018844427540898323
Validation loss = 0.0013472288846969604
Validation loss = 0.0010660614352673292
Validation loss = 0.002019063802435994
Validation loss = 0.0013599772937595844
Validation loss = 0.0012239136267453432
Validation loss = 0.0012047315249219537
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001366179552860558
Validation loss = 0.0015226393006742
Validation loss = 0.0018722426611930132
Validation loss = 0.0025928502436727285
Validation loss = 0.0016271519707515836
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003126017050817609
Validation loss = 0.0013995341723784804
Validation loss = 0.0013680881820619106
Validation loss = 0.0017174535896629095
Validation loss = 0.0013768263161182404
Validation loss = 0.001268599065952003
Validation loss = 0.0014908273005858064
Validation loss = 0.0017368086846545339
Validation loss = 0.0011550677008926868
Validation loss = 0.0012222297955304384
Validation loss = 0.002181327436119318
Validation loss = 0.0013490748824551702
Validation loss = 0.0014883182011544704
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001542376121506095
Validation loss = 0.0014856061898171902
Validation loss = 0.0013621142134070396
Validation loss = 0.0016027683159336448
Validation loss = 0.0013171210885047913
Validation loss = 0.001846516621299088
Validation loss = 0.001258253469131887
Validation loss = 0.0013417324516922235
Validation loss = 0.0013942670775577426
Validation loss = 0.0012434690725058317
Validation loss = 0.0014632603852078319
Validation loss = 0.001271712826564908
Validation loss = 0.0013465519296005368
Validation loss = 0.0014486451400443912
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018981508910655975
Validation loss = 0.0017020775703713298
Validation loss = 0.0015622456558048725
Validation loss = 0.0014673880068585277
Validation loss = 0.0015072593232616782
Validation loss = 0.001619527000002563
Validation loss = 0.001389659009873867
Validation loss = 0.0015394920483231544
Validation loss = 0.0016806584317237139
Validation loss = 0.0019220237154513597
Validation loss = 0.0028003775514662266
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00328 |
| Iteration     | 60       |
| MaximumReturn | -0.0007  |
| MinimumReturn | -0.0304  |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001497618155553937
Validation loss = 0.0011821647640317678
Validation loss = 0.0011054802453145385
Validation loss = 0.0010070910211652517
Validation loss = 0.0011410594452172518
Validation loss = 0.0012887537013739347
Validation loss = 0.0016644266434013844
Validation loss = 0.001397712156176567
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014830483123660088
Validation loss = 0.0016462383791804314
Validation loss = 0.0016434458084404469
Validation loss = 0.001578936935402453
Validation loss = 0.001466971356421709
Validation loss = 0.001779220881871879
Validation loss = 0.0021254790481179953
Validation loss = 0.0015791206387802958
Validation loss = 0.0018020233837887645
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001554986578412354
Validation loss = 0.0012861277209594846
Validation loss = 0.0012993660056963563
Validation loss = 0.0012956256978213787
Validation loss = 0.0019070464186370373
Validation loss = 0.001454245182685554
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013407451333478093
Validation loss = 0.0011951838387176394
Validation loss = 0.0014959200052544475
Validation loss = 0.001520282356068492
Validation loss = 0.0011107660830020905
Validation loss = 0.0013357611605897546
Validation loss = 0.0017492277547717094
Validation loss = 0.001157810678705573
Validation loss = 0.0017867791466414928
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019989623688161373
Validation loss = 0.0016802068566903472
Validation loss = 0.0014620536239817739
Validation loss = 0.0018428107723593712
Validation loss = 0.0013827888760715723
Validation loss = 0.0013738047564402223
Validation loss = 0.001384715549647808
Validation loss = 0.0016458379104733467
Validation loss = 0.0014680714812129736
Validation loss = 0.0019222490955144167
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00206  |
| Iteration     | 61        |
| MaximumReturn | -0.000564 |
| MinimumReturn | -0.0325   |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015215522143989801
Validation loss = 0.0010354301193729043
Validation loss = 0.0010518396738916636
Validation loss = 0.0013025040971115232
Validation loss = 0.0013400891330093145
Validation loss = 0.0012450434733182192
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001874161302112043
Validation loss = 0.0019063910003751516
Validation loss = 0.0012411561328917742
Validation loss = 0.0012913303216919303
Validation loss = 0.001321546034887433
Validation loss = 0.0016478938050568104
Validation loss = 0.0012687054695561528
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013142478419467807
Validation loss = 0.0013664638390764594
Validation loss = 0.001400662469677627
Validation loss = 0.0012220890494063497
Validation loss = 0.0013524213572964072
Validation loss = 0.0015549859963357449
Validation loss = 0.0017568229231983423
Validation loss = 0.0011504014255478978
Validation loss = 0.0013904203660786152
Validation loss = 0.0013879402540624142
Validation loss = 0.0014753862051293254
Validation loss = 0.0011857377830892801
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00220156810246408
Validation loss = 0.0027064941823482513
Validation loss = 0.0012560503091663122
Validation loss = 0.0012049415381625295
Validation loss = 0.001141004147939384
Validation loss = 0.0012110783718526363
Validation loss = 0.001178748207166791
Validation loss = 0.0019966205582022667
Validation loss = 0.0011100104311481118
Validation loss = 0.001196108991280198
Validation loss = 0.0013015724252909422
Validation loss = 0.0010590996826067567
Validation loss = 0.001297149807214737
Validation loss = 0.0015606752131134272
Validation loss = 0.0017941691912710667
Validation loss = 0.001251828740350902
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0028791206423193216
Validation loss = 0.001414602156728506
Validation loss = 0.002406469313427806
Validation loss = 0.0017531687626615167
Validation loss = 0.0013502096990123391
Validation loss = 0.0013270748313516378
Validation loss = 0.0017083409475162625
Validation loss = 0.0013844928471371531
Validation loss = 0.0016284764278680086
Validation loss = 0.0014205517945811152
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.8    |
| Iteration     | 62       |
| MaximumReturn | -0.00327 |
| MinimumReturn | -148     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005156605504453182
Validation loss = 0.0008297297172248363
Validation loss = 0.0011000336380675435
Validation loss = 0.0007561797392554581
Validation loss = 0.0008174065151251853
Validation loss = 0.0007268069311976433
Validation loss = 0.0009755416540428996
Validation loss = 0.0015294611221179366
Validation loss = 0.0007658581016585231
Validation loss = 0.0005915937945246696
Validation loss = 0.0006269974401220679
Validation loss = 0.0009241070365533233
Validation loss = 0.0009529232047498226
Validation loss = 0.0009203731315210462
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028739566914737225
Validation loss = 0.0014558859402313828
Validation loss = 0.0009120868635363877
Validation loss = 0.0010276495013386011
Validation loss = 0.0009112770785577595
Validation loss = 0.0011606380576267838
Validation loss = 0.000807775417342782
Validation loss = 0.0007404503412544727
Validation loss = 0.001689111697487533
Validation loss = 0.0008176222327165306
Validation loss = 0.0012096394784748554
Validation loss = 0.0007033218280412257
Validation loss = 0.0011572646908462048
Validation loss = 0.0009606003877706826
Validation loss = 0.0007119381334632635
Validation loss = 0.002276684157550335
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005444228183478117
Validation loss = 0.0008530152263119817
Validation loss = 0.0011473862687125802
Validation loss = 0.0008549382910132408
Validation loss = 0.0009463229216635227
Validation loss = 0.001011669752188027
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0023708748631179333
Validation loss = 0.0012966877548024058
Validation loss = 0.0011356694158166647
Validation loss = 0.000813260383438319
Validation loss = 0.0013170388992875814
Validation loss = 0.0007517104386352003
Validation loss = 0.0026937462389469147
Validation loss = 0.0008086772286333144
Validation loss = 0.0008914933423511684
Validation loss = 0.0008621919550932944
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0035813478752970695
Validation loss = 0.0009715835913084447
Validation loss = 0.0010605072602629662
Validation loss = 0.0007968805730342865
Validation loss = 0.000874946010299027
Validation loss = 0.0009717990760691464
Validation loss = 0.001107349875383079
Validation loss = 0.0011322658974677324
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27.1    |
| Iteration     | 63       |
| MaximumReturn | -0.0271  |
| MinimumReturn | -144     |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00243688584305346
Validation loss = 0.0025654088240116835
Validation loss = 0.002323221182450652
Validation loss = 0.0021336532663553953
Validation loss = 0.0030825454741716385
Validation loss = 0.003081101458519697
Validation loss = 0.0024799464736133814
Validation loss = 0.002845002571120858
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002106127329170704
Validation loss = 0.0026931283064186573
Validation loss = 0.0016298037953674793
Validation loss = 0.0016576382331550121
Validation loss = 0.0017102843848988414
Validation loss = 0.0018012934597209096
Validation loss = 0.001327073317952454
Validation loss = 0.0013682940043509007
Validation loss = 0.0020964033901691437
Validation loss = 0.001868477906100452
Validation loss = 0.0013778427382931113
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00232680537737906
Validation loss = 0.00197887746617198
Validation loss = 0.0018244028324261308
Validation loss = 0.0022592362947762012
Validation loss = 0.0021473250817507505
Validation loss = 0.0020557211246341467
Validation loss = 0.0023441805969923735
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002536441432312131
Validation loss = 0.002330835210159421
Validation loss = 0.0023527746088802814
Validation loss = 0.00247445167042315
Validation loss = 0.002320252126082778
Validation loss = 0.0025321070570498705
Validation loss = 0.0020393994636833668
Validation loss = 0.0026525799185037613
Validation loss = 0.001956871012225747
Validation loss = 0.002138259820640087
Validation loss = 0.003135840641334653
Validation loss = 0.0022296712268143892
Validation loss = 0.0022596758790314198
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025603631511330605
Validation loss = 0.002858332823961973
Validation loss = 0.0030643208883702755
Validation loss = 0.002825862728059292
Validation loss = 0.0023054098710417747
Validation loss = 0.002722487086430192
Validation loss = 0.0021548049990087748
Validation loss = 0.002998508745804429
Validation loss = 0.002168788108974695
Validation loss = 0.0020922538824379444
Validation loss = 0.001981811597943306
Validation loss = 0.0021228445693850517
Validation loss = 0.002888185903429985
Validation loss = 0.0024195711594074965
Validation loss = 0.0018992038676515222
Validation loss = 0.0018853801302611828
Validation loss = 0.002405555685982108
Validation loss = 0.0023387877736240625
Validation loss = 0.0020063777919858694
Validation loss = 0.0018230873392894864
Validation loss = 0.001879765884950757
Validation loss = 0.0019565760158002377
Validation loss = 0.002583837602287531
Validation loss = 0.0023504081182181835
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -144     |
| Iteration     | 64       |
| MaximumReturn | -0.149   |
| MinimumReturn | -209     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004147359635680914
Validation loss = 0.002584871370345354
Validation loss = 0.002726750448346138
Validation loss = 0.0038196498062461615
Validation loss = 0.0020541346166282892
Validation loss = 0.0019851948600262403
Validation loss = 0.0022976696491241455
Validation loss = 0.0020801140926778316
Validation loss = 0.001970413140952587
Validation loss = 0.0035411976277828217
Validation loss = 0.0026750131510198116
Validation loss = 0.0031656199134886265
Validation loss = 0.0021443390287458897
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0031344702001661062
Validation loss = 0.002211914863437414
Validation loss = 0.002015015808865428
Validation loss = 0.002604305511340499
Validation loss = 0.002181974006816745
Validation loss = 0.0015414332738146186
Validation loss = 0.0036288981791585684
Validation loss = 0.0020618962589651346
Validation loss = 0.0015132167609408498
Validation loss = 0.001967323711141944
Validation loss = 0.001621945295482874
Validation loss = 0.001869141822680831
Validation loss = 0.0023485743440687656
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006060000043362379
Validation loss = 0.002449295949190855
Validation loss = 0.0026397721376270056
Validation loss = 0.0025573493912816048
Validation loss = 0.0029034302569925785
Validation loss = 0.0016782202292233706
Validation loss = 0.0021003298461437225
Validation loss = 0.0017479368252679706
Validation loss = 0.0026283832266926765
Validation loss = 0.001939487992785871
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009880964644253254
Validation loss = 0.003396215382963419
Validation loss = 0.002359225880354643
Validation loss = 0.0036094447132200003
Validation loss = 0.004019929561764002
Validation loss = 0.001841954537667334
Validation loss = 0.002722690347582102
Validation loss = 0.0021103862673044205
Validation loss = 0.0027836866211146116
Validation loss = 0.0035595600493252277
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0039438046514987946
Validation loss = 0.001874790177680552
Validation loss = 0.001775542157702148
Validation loss = 0.0016892952844500542
Validation loss = 0.002355167642235756
Validation loss = 0.0037779940757900476
Validation loss = 0.0025557377375662327
Validation loss = 0.0015616078162565827
Validation loss = 0.004128288011997938
Validation loss = 0.0021358979865908623
Validation loss = 0.002525661839172244
Validation loss = 0.002012452110648155
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -148     |
| Iteration     | 65       |
| MaximumReturn | -23.2    |
| MinimumReturn | -219     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017176289111375809
Validation loss = 0.002050934126600623
Validation loss = 0.0015964083140715957
Validation loss = 0.0013782198075205088
Validation loss = 0.0019155987538397312
Validation loss = 0.0015493306564167142
Validation loss = 0.001691917423158884
Validation loss = 0.0017392081208527088
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0023104643914848566
Validation loss = 0.0013648783788084984
Validation loss = 0.0012722762767225504
Validation loss = 0.001561513519845903
Validation loss = 0.0018751082243397832
Validation loss = 0.0013540232321247458
Validation loss = 0.0013255239464342594
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022454340942204
Validation loss = 0.0018522365717217326
Validation loss = 0.0020202414598315954
Validation loss = 0.001424615504220128
Validation loss = 0.001423406065441668
Validation loss = 0.0017950049368664622
Validation loss = 0.0017699929885566235
Validation loss = 0.0013077339390292764
Validation loss = 0.0018794422503560781
Validation loss = 0.0014925013529136777
Validation loss = 0.001612805761396885
Validation loss = 0.0016541670775040984
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0035359710454940796
Validation loss = 0.0019806288182735443
Validation loss = 0.0016640934627503157
Validation loss = 0.002069530077278614
Validation loss = 0.001527615007944405
Validation loss = 0.0013420663308352232
Validation loss = 0.001966268289834261
Validation loss = 0.0016112609300762415
Validation loss = 0.0015517836436629295
Validation loss = 0.001516382209956646
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0029026404954493046
Validation loss = 0.0014873214531689882
Validation loss = 0.0029691176023334265
Validation loss = 0.0012961416505277157
Validation loss = 0.0014270636020228267
Validation loss = 0.0011142350267618895
Validation loss = 0.001533166621811688
Validation loss = 0.002362117636948824
Validation loss = 0.0016323791351169348
Validation loss = 0.0014666059287264943
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26      |
| Iteration     | 66       |
| MaximumReturn | -0.0135  |
| MinimumReturn | -171     |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015876262914389372
Validation loss = 0.003534413408488035
Validation loss = 0.0016084684757515788
Validation loss = 0.0017844978719949722
Validation loss = 0.0018265139078721404
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001076890155673027
Validation loss = 0.0017937006196007133
Validation loss = 0.0011631365632638335
Validation loss = 0.0013141479576006532
Validation loss = 0.001375282066874206
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014343435177579522
Validation loss = 0.0017907092114910483
Validation loss = 0.0016791961388662457
Validation loss = 0.003498567035421729
Validation loss = 0.0016920153284445405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019281221320852637
Validation loss = 0.0015434686793014407
Validation loss = 0.001362746232189238
Validation loss = 0.0014185371110215783
Validation loss = 0.0030921832658350468
Validation loss = 0.0018306063720956445
Validation loss = 0.0013386935461312532
Validation loss = 0.0017460713861510158
Validation loss = 0.001618902082554996
Validation loss = 0.0012334603816270828
Validation loss = 0.0015353800263255835
Validation loss = 0.0015871376963332295
Validation loss = 0.0019279764965176582
Validation loss = 0.0017263682093471289
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012843693839386106
Validation loss = 0.0013201822293922305
Validation loss = 0.0017724159406498075
Validation loss = 0.001512192189693451
Validation loss = 0.0018217507749795914
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -209     |
| Iteration     | 67       |
| MaximumReturn | -166     |
| MinimumReturn | -228     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006146029569208622
Validation loss = 0.0011256205616518855
Validation loss = 0.0011016650823876262
Validation loss = 0.0009625933016650379
Validation loss = 0.0008174865506589413
Validation loss = 0.0012121315812692046
Validation loss = 0.0010271009523421526
Validation loss = 0.004404072184115648
Validation loss = 0.0008500777184963226
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003539919387549162
Validation loss = 0.002846134826540947
Validation loss = 0.0009058841387741268
Validation loss = 0.001364072086289525
Validation loss = 0.0015576863661408424
Validation loss = 0.0011928362073376775
Validation loss = 0.0007980747614055872
Validation loss = 0.0010382352629676461
Validation loss = 0.0010785002959892154
Validation loss = 0.0009544713539071381
Validation loss = 0.0012666282709687948
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004175723996013403
Validation loss = 0.002011214615777135
Validation loss = 0.0010117662604898214
Validation loss = 0.000978365889750421
Validation loss = 0.0015951084205880761
Validation loss = 0.0012119936291128397
Validation loss = 0.0015317639335989952
Validation loss = 0.0009886955376714468
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00430707773193717
Validation loss = 0.0013339839642867446
Validation loss = 0.0012934698024764657
Validation loss = 0.0014492793707177043
Validation loss = 0.00153072748798877
Validation loss = 0.0010558231733739376
Validation loss = 0.0011991598876193166
Validation loss = 0.0010028898250311613
Validation loss = 0.0015722166281193495
Validation loss = 0.0015765584539622068
Validation loss = 0.0013856118312105536
Validation loss = 0.0015689620049670339
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002955124480649829
Validation loss = 0.0013120013754814863
Validation loss = 0.001788431778550148
Validation loss = 0.0012356649385765195
Validation loss = 0.0010847270023077726
Validation loss = 0.0011175817344337702
Validation loss = 0.0010208715684711933
Validation loss = 0.0011396909831091762
Validation loss = 0.0017692395485937595
Validation loss = 0.00139935954939574
Validation loss = 0.0020591113716363907
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -142     |
| Iteration     | 68       |
| MaximumReturn | -0.212   |
| MinimumReturn | -209     |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0022146892733871937
Validation loss = 0.001747549045830965
Validation loss = 0.001253515831194818
Validation loss = 0.001443014945834875
Validation loss = 0.0015236359322443604
Validation loss = 0.0013621404068544507
Validation loss = 0.0011964380973950028
Validation loss = 0.0015736522618681192
Validation loss = 0.0012662477092817426
Validation loss = 0.0011732277926057577
Validation loss = 0.0010230623884126544
Validation loss = 0.0011180747533217072
Validation loss = 0.001087635406292975
Validation loss = 0.0016817840514704585
Validation loss = 0.0017642523162066936
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0031008741352707148
Validation loss = 0.001765986206009984
Validation loss = 0.00258143269456923
Validation loss = 0.0016192100010812283
Validation loss = 0.0027408557943999767
Validation loss = 0.0016012073028832674
Validation loss = 0.0013092466397210956
Validation loss = 0.0012720116646960378
Validation loss = 0.0012845031451433897
Validation loss = 0.0007968053687363863
Validation loss = 0.0016649540048092604
Validation loss = 0.001234795548953116
Validation loss = 0.0013653734931722283
Validation loss = 0.0010534683242440224
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0028437285218387842
Validation loss = 0.0015839665429666638
Validation loss = 0.0025618819054216146
Validation loss = 0.001802846323698759
Validation loss = 0.0014638890279456973
Validation loss = 0.001181632513180375
Validation loss = 0.0019151153974235058
Validation loss = 0.0010473615257069468
Validation loss = 0.001738174119964242
Validation loss = 0.0013129775179550052
Validation loss = 0.0014855124754831195
Validation loss = 0.0015540652675554156
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015698488568887115
Validation loss = 0.0010331922676414251
Validation loss = 0.0015907189808785915
Validation loss = 0.0014821043005213141
Validation loss = 0.0016897066961973906
Validation loss = 0.0013392166001722217
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0021927973721176386
Validation loss = 0.001141540938988328
Validation loss = 0.0011827884009107947
Validation loss = 0.001317823538556695
Validation loss = 0.00201620371080935
Validation loss = 0.0013693516375496984
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.114   |
| Iteration     | 69       |
| MaximumReturn | -0.0779  |
| MinimumReturn | -0.186   |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001493140123784542
Validation loss = 0.0011950863990932703
Validation loss = 0.0028782542794942856
Validation loss = 0.001237576245330274
Validation loss = 0.0014483286067843437
Validation loss = 0.0016506541287526488
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018255593022331595
Validation loss = 0.0016028302488848567
Validation loss = 0.00146347819827497
Validation loss = 0.0017291795229539275
Validation loss = 0.0010028684046119452
Validation loss = 0.0011237275321036577
Validation loss = 0.0025258827954530716
Validation loss = 0.001414126600138843
Validation loss = 0.001067129080183804
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024604247882962227
Validation loss = 0.0015352326445281506
Validation loss = 0.003854907350614667
Validation loss = 0.0021598420571535826
Validation loss = 0.0024439601693302393
Validation loss = 0.0017842735396698117
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010908356634899974
Validation loss = 0.003084164811298251
Validation loss = 0.001718878629617393
Validation loss = 0.0017936262302100658
Validation loss = 0.001597058610059321
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012260046787559986
Validation loss = 0.002171142725273967
Validation loss = 0.0011636909330263734
Validation loss = 0.001543269376270473
Validation loss = 0.0022369204089045525
Validation loss = 0.001896275207400322
Validation loss = 0.0020821376238018274
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00828 |
| Iteration     | 70       |
| MaximumReturn | -0.00106 |
| MinimumReturn | -0.0357  |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0024796137586236
Validation loss = 0.0010847263038158417
Validation loss = 0.0015243071829900146
Validation loss = 0.0016872176202014089
Validation loss = 0.0013574679614976048
Validation loss = 0.0020090739708393812
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010543771786615252
Validation loss = 0.0009266805136576295
Validation loss = 0.0009333228808827698
Validation loss = 0.002109288703650236
Validation loss = 0.0012364578433334827
Validation loss = 0.0012209380511194468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014475429197773337
Validation loss = 0.0020258682779967785
Validation loss = 0.0011633080430328846
Validation loss = 0.0011340634664520621
Validation loss = 0.0011642557801678777
Validation loss = 0.0011026016436517239
Validation loss = 0.0009117203881032765
Validation loss = 0.0015518092550337315
Validation loss = 0.0016219379613175988
Validation loss = 0.0026675029657781124
Validation loss = 0.0017544851871207356
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002181149320676923
Validation loss = 0.0014576002722606063
Validation loss = 0.0019156578928232193
Validation loss = 0.0016717460239306092
Validation loss = 0.0023380150087177753
Validation loss = 0.0014491063775494695
Validation loss = 0.0021839451510459185
Validation loss = 0.0011581066064536572
Validation loss = 0.0015636623138561845
Validation loss = 0.0013546429108828306
Validation loss = 0.0011446246644482017
Validation loss = 0.0018567501101642847
Validation loss = 0.0027033451478928328
Validation loss = 0.0015538819134235382
Validation loss = 0.001625921344384551
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020958648528903723
Validation loss = 0.0010249004699289799
Validation loss = 0.00128226971719414
Validation loss = 0.0018272664165124297
Validation loss = 0.001974900485947728
Validation loss = 0.0025969771668314934
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -190     |
| Iteration     | 71       |
| MaximumReturn | -135     |
| MinimumReturn | -223     |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009788169991225004
Validation loss = 0.0011737323366105556
Validation loss = 0.0019981481600552797
Validation loss = 0.0015528404619544744
Validation loss = 0.0010213376954197884
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013206785079091787
Validation loss = 0.0010774674592539668
Validation loss = 0.0017801329959183931
Validation loss = 0.0008149835630320013
Validation loss = 0.001177187659777701
Validation loss = 0.0015740719391033053
Validation loss = 0.0007677729590795934
Validation loss = 0.0011024216655641794
Validation loss = 0.00168360595125705
Validation loss = 0.0013730422360822558
Validation loss = 0.0009019209537655115
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012706987326964736
Validation loss = 0.00080712023191154
Validation loss = 0.0013657850213348866
Validation loss = 0.0015118158189579844
Validation loss = 0.0008413818432018161
Validation loss = 0.0010710826609283686
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001292099361307919
Validation loss = 0.0017806414980441332
Validation loss = 0.001037709298543632
Validation loss = 0.001240511192008853
Validation loss = 0.0007696755928918719
Validation loss = 0.001205348176881671
Validation loss = 0.0016256498638540506
Validation loss = 0.0011066847946494818
Validation loss = 0.001573817222379148
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012982930056750774
Validation loss = 0.0021852462086826563
Validation loss = 0.0009718685178086162
Validation loss = 0.0015141827752813697
Validation loss = 0.0010600547539070249
Validation loss = 0.001183894113637507
Validation loss = 0.0015716467751190066
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -182     |
| Iteration     | 72       |
| MaximumReturn | -104     |
| MinimumReturn | -222     |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001271139713935554
Validation loss = 0.0010881880298256874
Validation loss = 0.0011683362536132336
Validation loss = 0.0016178175574168563
Validation loss = 0.0013042752398177981
Validation loss = 0.0015394294168800116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00158779532648623
Validation loss = 0.0016857689479365945
Validation loss = 0.0014261500909924507
Validation loss = 0.0017125630984082818
Validation loss = 0.0013094644527882338
Validation loss = 0.001198259531520307
Validation loss = 0.0016749108908697963
Validation loss = 0.0011571069480851293
Validation loss = 0.0019189134472981095
Validation loss = 0.001241576042957604
Validation loss = 0.0014238993171602488
Validation loss = 0.0009628146071918309
Validation loss = 0.001203779480420053
Validation loss = 0.0012589917751029134
Validation loss = 0.0015953572001308203
Validation loss = 0.0011861948296427727
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0028011994436383247
Validation loss = 0.0013661178527399898
Validation loss = 0.0015898665878921747
Validation loss = 0.0019868435338139534
Validation loss = 0.0011611495865508914
Validation loss = 0.0018098135478794575
Validation loss = 0.0012756232172250748
Validation loss = 0.001620817114599049
Validation loss = 0.0019249168690294027
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002039640210568905
Validation loss = 0.001659637433476746
Validation loss = 0.0013511114520952106
Validation loss = 0.0016092066653072834
Validation loss = 0.0017147638136520982
Validation loss = 0.0017860062653198838
Validation loss = 0.0018600476905703545
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013008129317313433
Validation loss = 0.0013521978398784995
Validation loss = 0.0013098334893584251
Validation loss = 0.0012230660067871213
Validation loss = 0.0013254474615678191
Validation loss = 0.001864253543317318
Validation loss = 0.0012825734447687864
Validation loss = 0.002281943103298545
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -181     |
| Iteration     | 73       |
| MaximumReturn | -114     |
| MinimumReturn | -215     |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016021436313167214
Validation loss = 0.0014251931570470333
Validation loss = 0.0016573317116126418
Validation loss = 0.0013645462458953261
Validation loss = 0.001894248416647315
Validation loss = 0.0014413141179829836
Validation loss = 0.0014280801406130195
Validation loss = 0.002180937211960554
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018471985822543502
Validation loss = 0.0011536068050190806
Validation loss = 0.0020318259485065937
Validation loss = 0.0016812478424981236
Validation loss = 0.0012806798331439495
Validation loss = 0.0011668375227600336
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017231820384040475
Validation loss = 0.0015750153688713908
Validation loss = 0.001442963839508593
Validation loss = 0.0017614574171602726
Validation loss = 0.0017569790361449122
Validation loss = 0.001776099088601768
Validation loss = 0.002753410954028368
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0028442745096981525
Validation loss = 0.0015052458038553596
Validation loss = 0.0019206893630325794
Validation loss = 0.0015720396768301725
Validation loss = 0.001958814449608326
Validation loss = 0.0016025373479351401
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002252559643238783
Validation loss = 0.0015673673478886485
Validation loss = 0.0012959829764440656
Validation loss = 0.0013928344706073403
Validation loss = 0.001700707944110036
Validation loss = 0.0016953053418546915
Validation loss = 0.0012767070438712835
Validation loss = 0.003484453307464719
Validation loss = 0.001061574905179441
Validation loss = 0.00107144087087363
Validation loss = 0.0015449188649654388
Validation loss = 0.0018062355229631066
Validation loss = 0.0012042960152029991
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -85.4    |
| Iteration     | 74       |
| MaximumReturn | -0.311   |
| MinimumReturn | -197     |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014696011785417795
Validation loss = 0.0013515452155843377
Validation loss = 0.002103774342685938
Validation loss = 0.001516841584816575
Validation loss = 0.0017292554257437587
Validation loss = 0.0016653885832056403
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014948489842936397
Validation loss = 0.001080213114619255
Validation loss = 0.001218609744682908
Validation loss = 0.001288590719923377
Validation loss = 0.00190926983486861
Validation loss = 0.0012377267703413963
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016495559830218554
Validation loss = 0.001497425138950348
Validation loss = 0.0013029116671532393
Validation loss = 0.001308038947172463
Validation loss = 0.0017405619146302342
Validation loss = 0.0014025535201653838
Validation loss = 0.0013982985401526093
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017769471742212772
Validation loss = 0.0014564385637640953
Validation loss = 0.0012234477326273918
Validation loss = 0.001067560282535851
Validation loss = 0.001948041608557105
Validation loss = 0.0022473286371678114
Validation loss = 0.001356415799818933
Validation loss = 0.0016192065086215734
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012068410869687796
Validation loss = 0.0011227949289605021
Validation loss = 0.0016479839105159044
Validation loss = 0.0015005215536803007
Validation loss = 0.0010581229580566287
Validation loss = 0.001597520662471652
Validation loss = 0.0010629018070176244
Validation loss = 0.0019898326136171818
Validation loss = 0.0010222296696156263
Validation loss = 0.0011469655437394977
Validation loss = 0.0014519286341965199
Validation loss = 0.0012098174775019288
Validation loss = 0.0017533747013658285
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -114     |
| Iteration     | 75       |
| MaximumReturn | -0.742   |
| MinimumReturn | -185     |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017188536003232002
Validation loss = 0.001261823927052319
Validation loss = 0.0010704807937145233
Validation loss = 0.0009881369769573212
Validation loss = 0.0011408748105168343
Validation loss = 0.0014089308679103851
Validation loss = 0.0011396283516660333
Validation loss = 0.0011887361761182547
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010999642545357347
Validation loss = 0.0014228972140699625
Validation loss = 0.0011791051365435123
Validation loss = 0.0011187407653778791
Validation loss = 0.001700199325568974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010510757565498352
Validation loss = 0.0012352229095995426
Validation loss = 0.0011765207163989544
Validation loss = 0.0010083331726491451
Validation loss = 0.0012953697005286813
Validation loss = 0.001129455165937543
Validation loss = 0.0014850443694740534
Validation loss = 0.002124645747244358
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013304685708135366
Validation loss = 0.001842127414420247
Validation loss = 0.0021682351361960173
Validation loss = 0.0013132838066667318
Validation loss = 0.0014572631334885955
Validation loss = 0.001200852682814002
Validation loss = 0.0010981801897287369
Validation loss = 0.0010330796940252185
Validation loss = 0.001437523402273655
Validation loss = 0.0011562092695385218
Validation loss = 0.0013952283188700676
Validation loss = 0.001329813851043582
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001218190649524331
Validation loss = 0.0011853516334667802
Validation loss = 0.0016453324351459742
Validation loss = 0.000986196449957788
Validation loss = 0.0011977780377492309
Validation loss = 0.0017275161808356643
Validation loss = 0.0011484140995889902
Validation loss = 0.0010801228927448392
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.4    |
| Iteration     | 76       |
| MaximumReturn | -0.00161 |
| MinimumReturn | -99.8    |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011667482322081923
Validation loss = 0.0012600342743098736
Validation loss = 0.0008976992103271186
Validation loss = 0.003027817467227578
Validation loss = 0.0011647518258541822
Validation loss = 0.0011804394889622927
Validation loss = 0.0019085555104538798
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016930419951677322
Validation loss = 0.001165318302810192
Validation loss = 0.001490314258262515
Validation loss = 0.0011506646405905485
Validation loss = 0.000949620152823627
Validation loss = 0.001551894354633987
Validation loss = 0.0012821998680010438
Validation loss = 0.0012468304485082626
Validation loss = 0.0011040075914934278
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014655315317213535
Validation loss = 0.0010969082359224558
Validation loss = 0.0012486367486417294
Validation loss = 0.0012264996767044067
Validation loss = 0.000935994612518698
Validation loss = 0.002187418518587947
Validation loss = 0.0013775147963315248
Validation loss = 0.0010800431482493877
Validation loss = 0.0012222201330587268
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017735010478645563
Validation loss = 0.0011126063764095306
Validation loss = 0.0011489694006741047
Validation loss = 0.001083499751985073
Validation loss = 0.0012261162046343088
Validation loss = 0.001079011824913323
Validation loss = 0.0015674413880333304
Validation loss = 0.0014688849914819002
Validation loss = 0.0014010388404130936
Validation loss = 0.0011125174351036549
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009218389168381691
Validation loss = 0.0016303758602589369
Validation loss = 0.001396111911162734
Validation loss = 0.0012937954161316156
Validation loss = 0.0011464112903922796
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -71.3    |
| Iteration     | 77       |
| MaximumReturn | -0.157   |
| MinimumReturn | -185     |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001444453140720725
Validation loss = 0.0009621148928999901
Validation loss = 0.0010328034404665232
Validation loss = 0.0011833717580884695
Validation loss = 0.001823077560402453
Validation loss = 0.0009645858663134277
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017537599196657538
Validation loss = 0.0009814647492021322
Validation loss = 0.001174899865873158
Validation loss = 0.001098092645406723
Validation loss = 0.0012337665539234877
Validation loss = 0.000869413313921541
Validation loss = 0.001346285454928875
Validation loss = 0.0010764559265226126
Validation loss = 0.0011261642212048173
Validation loss = 0.0016166545683518052
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010348138166591525
Validation loss = 0.001133818761445582
Validation loss = 0.001031178398989141
Validation loss = 0.0011180279543623328
Validation loss = 0.0012527197832241654
Validation loss = 0.0009533862466923892
Validation loss = 0.000957914802711457
Validation loss = 0.0014135255478322506
Validation loss = 0.001506291562691331
Validation loss = 0.0020327952224761248
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001103323302231729
Validation loss = 0.0009875728283077478
Validation loss = 0.0013545306865125895
Validation loss = 0.0011976235546171665
Validation loss = 0.0012689890572801232
Validation loss = 0.00103271403349936
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011195815168321133
Validation loss = 0.0010621970286592841
Validation loss = 0.0013585597043856978
Validation loss = 0.001247954205609858
Validation loss = 0.00121052167378366
Validation loss = 0.0014389683492481709
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -196     |
| Iteration     | 78       |
| MaximumReturn | -156     |
| MinimumReturn | -227     |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001115973456762731
Validation loss = 0.0014198924181982875
Validation loss = 0.0010235124500468373
Validation loss = 0.001095551997423172
Validation loss = 0.0014358528424054384
Validation loss = 0.0012047431664541364
Validation loss = 0.000821187742985785
Validation loss = 0.0010566560085862875
Validation loss = 0.0009663075907155871
Validation loss = 0.0012488003121688962
Validation loss = 0.000928541470784694
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011680027237161994
Validation loss = 0.00169388169888407
Validation loss = 0.0011801761575043201
Validation loss = 0.0008825307595543563
Validation loss = 0.0013719669077545404
Validation loss = 0.0009508354123681784
Validation loss = 0.001203791587613523
Validation loss = 0.0012334069469943643
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013016689335927367
Validation loss = 0.0011054609203711152
Validation loss = 0.0010455630254000425
Validation loss = 0.0009852431248873472
Validation loss = 0.001064967829734087
Validation loss = 0.0010934622259810567
Validation loss = 0.0009061098680831492
Validation loss = 0.0011803308734670281
Validation loss = 0.0016176834469661117
Validation loss = 0.0009844439337030053
Validation loss = 0.001016237074509263
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001654820516705513
Validation loss = 0.0011227675713598728
Validation loss = 0.0016368527431041002
Validation loss = 0.0018202041974291205
Validation loss = 0.0011090096086263657
Validation loss = 0.0015454484382644296
Validation loss = 0.000909207621589303
Validation loss = 0.0011479959357529879
Validation loss = 0.001125680049881339
Validation loss = 0.001504427520558238
Validation loss = 0.0008976523531600833
Validation loss = 0.000775095249991864
Validation loss = 0.0009114076965488493
Validation loss = 0.0012654113816097379
Validation loss = 0.0008280670153908432
Validation loss = 0.0010987940477207303
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018072868697345257
Validation loss = 0.0010724314488470554
Validation loss = 0.001094162929803133
Validation loss = 0.0011606719344854355
Validation loss = 0.000814057479146868
Validation loss = 0.0009587464737705886
Validation loss = 0.0012985877692699432
Validation loss = 0.0012670215219259262
Validation loss = 0.0013361283345147967
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -229     |
| Iteration     | 79       |
| MaximumReturn | -211     |
| MinimumReturn | -238     |
| TotalSamples  | 134946   |
----------------------------
