Logging to experiments/invertedPendulum/IPA01/Tue-01-Nov-2022-07-59-07-PM-CDT_invertedPendulum_trpo_iteration_20_seed2431
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7920148968696594
Validation loss = 0.4050298035144806
Validation loss = 0.35680145025253296
Validation loss = 0.32741492986679077
Validation loss = 0.32204028964042664
Validation loss = 0.32248279452323914
Validation loss = 0.2856895327568054
Validation loss = 0.2743818759918213
Validation loss = 0.26341667771339417
Validation loss = 0.2651556134223938
Validation loss = 0.2558998167514801
Validation loss = 0.24333487451076508
Validation loss = 0.24207013845443726
Validation loss = 0.23377683758735657
Validation loss = 0.22120139002799988
Validation loss = 0.22479820251464844
Validation loss = 0.21702462434768677
Validation loss = 0.20481809973716736
Validation loss = 0.22522751986980438
Validation loss = 0.21575035154819489
Validation loss = 0.20145748555660248
Validation loss = 0.18928824365139008
Validation loss = 0.186934694647789
Validation loss = 0.18408836424350739
Validation loss = 0.18750588595867157
Validation loss = 0.1704879105091095
Validation loss = 0.1677630990743637
Validation loss = 0.14852607250213623
Validation loss = 0.1483049988746643
Validation loss = 0.1397794485092163
Validation loss = 0.14144004881381989
Validation loss = 0.1322307139635086
Validation loss = 0.1360214501619339
Validation loss = 0.13159999251365662
Validation loss = 0.1408873051404953
Validation loss = 0.1347307711839676
Validation loss = 0.12253593653440475
Validation loss = 0.13636551797389984
Validation loss = 0.12579865753650665
Validation loss = 0.1280098706483841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7309831380844116
Validation loss = 0.4439558982849121
Validation loss = 0.3695940673351288
Validation loss = 0.32168081402778625
Validation loss = 0.3232049345970154
Validation loss = 0.30197665095329285
Validation loss = 0.2817947566509247
Validation loss = 0.27188840508461
Validation loss = 0.26464420557022095
Validation loss = 0.2554568946361542
Validation loss = 0.2515261471271515
Validation loss = 0.23999929428100586
Validation loss = 0.2291925698518753
Validation loss = 0.2291000634431839
Validation loss = 0.22725246846675873
Validation loss = 0.2144964039325714
Validation loss = 0.21277320384979248
Validation loss = 0.19715195894241333
Validation loss = 0.2084304243326187
Validation loss = 0.19260606169700623
Validation loss = 0.20034107565879822
Validation loss = 0.18431222438812256
Validation loss = 0.17771349847316742
Validation loss = 0.17063206434249878
Validation loss = 0.16659817099571228
Validation loss = 0.15777167677879333
Validation loss = 0.15439337491989136
Validation loss = 0.15519015491008759
Validation loss = 0.15418179333209991
Validation loss = 0.14101526141166687
Validation loss = 0.14722813665866852
Validation loss = 0.1347113400697708
Validation loss = 0.13758213818073273
Validation loss = 0.12925560772418976
Validation loss = 0.16699346899986267
Validation loss = 0.15310530364513397
Validation loss = 0.1405678391456604
Validation loss = 0.13122229278087616
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7639633417129517
Validation loss = 0.40138673782348633
Validation loss = 0.3650328516960144
Validation loss = 0.3359317481517792
Validation loss = 0.33331719040870667
Validation loss = 0.31310325860977173
Validation loss = 0.29203569889068604
Validation loss = 0.2757734954357147
Validation loss = 0.27366694808006287
Validation loss = 0.2528837323188782
Validation loss = 0.25367021560668945
Validation loss = 0.2441408634185791
Validation loss = 0.23733507096767426
Validation loss = 0.22248825430870056
Validation loss = 0.24089767038822174
Validation loss = 0.2376725971698761
Validation loss = 0.2165120244026184
Validation loss = 0.2018379122018814
Validation loss = 0.19726230204105377
Validation loss = 0.2045736163854599
Validation loss = 0.18845750391483307
Validation loss = 0.20393489301204681
Validation loss = 0.1720566749572754
Validation loss = 0.17129634320735931
Validation loss = 0.16480393707752228
Validation loss = 0.1624319851398468
Validation loss = 0.15409240126609802
Validation loss = 0.15035127103328705
Validation loss = 0.14827486872673035
Validation loss = 0.14635373651981354
Validation loss = 0.14775314927101135
Validation loss = 0.13839009404182434
Validation loss = 0.1502181738615036
Validation loss = 0.143074169754982
Validation loss = 0.14786456525325775
Validation loss = 0.14369969069957733
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7677818536758423
Validation loss = 0.4075513780117035
Validation loss = 0.3504062592983246
Validation loss = 0.3271452784538269
Validation loss = 0.32395222783088684
Validation loss = 0.2962923049926758
Validation loss = 0.2821381092071533
Validation loss = 0.25954586267471313
Validation loss = 0.2567422688007355
Validation loss = 0.24229328334331512
Validation loss = 0.2405904084444046
Validation loss = 0.23748554289340973
Validation loss = 0.23373958468437195
Validation loss = 0.226677805185318
Validation loss = 0.2422294020652771
Validation loss = 0.211960569024086
Validation loss = 0.20552320778369904
Validation loss = 0.19833128154277802
Validation loss = 0.19137676060199738
Validation loss = 0.2037639170885086
Validation loss = 0.19139128923416138
Validation loss = 0.1822560578584671
Validation loss = 0.16989128291606903
Validation loss = 0.1729087382555008
Validation loss = 0.15255874395370483
Validation loss = 0.16699394583702087
Validation loss = 0.15151232481002808
Validation loss = 0.1501658856868744
Validation loss = 0.14388765394687653
Validation loss = 0.13572938740253448
Validation loss = 0.14059939980506897
Validation loss = 0.14372067153453827
Validation loss = 0.13898897171020508
Validation loss = 0.12734369933605194
Validation loss = 0.14126184582710266
Validation loss = 0.1466810703277588
Validation loss = 0.13967403769493103
Validation loss = 0.14122328162193298
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7324762940406799
Validation loss = 0.46358224749565125
Validation loss = 0.366184800863266
Validation loss = 0.3404931426048279
Validation loss = 0.32843920588493347
Validation loss = 0.3148851692676544
Validation loss = 0.30958297848701477
Validation loss = 0.29071667790412903
Validation loss = 0.2835072875022888
Validation loss = 0.2563439607620239
Validation loss = 0.24420006573200226
Validation loss = 0.23705366253852844
Validation loss = 0.23465774953365326
Validation loss = 0.23257207870483398
Validation loss = 0.2293565273284912
Validation loss = 0.21463412046432495
Validation loss = 0.2191629409790039
Validation loss = 0.22259029746055603
Validation loss = 0.22845451533794403
Validation loss = 0.23086531460285187
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.1     |
| Iteration     | 0        |
| MaximumReturn | -0.0359  |
| MinimumReturn | -36      |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3777625262737274
Validation loss = 0.18962042033672333
Validation loss = 0.17742782831192017
Validation loss = 0.1609511375427246
Validation loss = 0.16007938981056213
Validation loss = 0.14285698533058167
Validation loss = 0.13686460256576538
Validation loss = 0.1350596845149994
Validation loss = 0.11285724490880966
Validation loss = 0.1138334795832634
Validation loss = 0.1022326648235321
Validation loss = 0.0991169810295105
Validation loss = 0.09943679720163345
Validation loss = 0.09131619334220886
Validation loss = 0.08123224228620529
Validation loss = 0.08645173907279968
Validation loss = 0.08439882844686508
Validation loss = 0.07329943776130676
Validation loss = 0.08086489886045456
Validation loss = 0.07761412113904953
Validation loss = 0.07167676091194153
Validation loss = 0.07099898159503937
Validation loss = 0.08114723861217499
Validation loss = 0.07542600482702255
Validation loss = 0.07121734321117401
Validation loss = 0.06561783701181412
Validation loss = 0.06171468645334244
Validation loss = 0.06902646273374557
Validation loss = 0.07470529526472092
Validation loss = 0.059027545154094696
Validation loss = 0.05812159553170204
Validation loss = 0.06683164089918137
Validation loss = 0.058375876396894455
Validation loss = 0.058421261608600616
Validation loss = 0.06054685637354851
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.34288159012794495
Validation loss = 0.19586192071437836
Validation loss = 0.17136847972869873
Validation loss = 0.15809254348278046
Validation loss = 0.14491693675518036
Validation loss = 0.13845230638980865
Validation loss = 0.1306839883327484
Validation loss = 0.12537279725074768
Validation loss = 0.12621381878852844
Validation loss = 0.10919155925512314
Validation loss = 0.10525045543909073
Validation loss = 0.1061924397945404
Validation loss = 0.09459000080823898
Validation loss = 0.10172179341316223
Validation loss = 0.09069832414388657
Validation loss = 0.08721126616001129
Validation loss = 0.08018815517425537
Validation loss = 0.0844777300953865
Validation loss = 0.0840824618935585
Validation loss = 0.07526987791061401
Validation loss = 0.07770153135061264
Validation loss = 0.08245396614074707
Validation loss = 0.07810752838850021
Validation loss = 0.08243697136640549
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.37399324774742126
Validation loss = 0.19145531952381134
Validation loss = 0.1654452085494995
Validation loss = 0.15469366312026978
Validation loss = 0.14732199907302856
Validation loss = 0.13234321773052216
Validation loss = 0.12030525505542755
Validation loss = 0.11082009226083755
Validation loss = 0.10584786534309387
Validation loss = 0.11440406739711761
Validation loss = 0.09970736503601074
Validation loss = 0.08582167327404022
Validation loss = 0.0821882113814354
Validation loss = 0.08337920159101486
Validation loss = 0.08109802007675171
Validation loss = 0.0845557153224945
Validation loss = 0.07326684892177582
Validation loss = 0.07222282886505127
Validation loss = 0.07947182655334473
Validation loss = 0.08261320739984512
Validation loss = 0.07992632687091827
Validation loss = 0.06883860379457474
Validation loss = 0.06595771759748459
Validation loss = 0.06445635110139847
Validation loss = 0.06411309540271759
Validation loss = 0.08057841658592224
Validation loss = 0.0706762969493866
Validation loss = 0.07719582319259644
Validation loss = 0.06214630976319313
Validation loss = 0.05864756181836128
Validation loss = 0.056610383093357086
Validation loss = 0.05240967497229576
Validation loss = 0.05324361473321915
Validation loss = 0.05928235873579979
Validation loss = 0.05931874364614487
Validation loss = 0.05644605681300163
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3615184724330902
Validation loss = 0.19800405204296112
Validation loss = 0.17433644831180573
Validation loss = 0.15618892014026642
Validation loss = 0.14444254338741302
Validation loss = 0.13432830572128296
Validation loss = 0.12265247851610184
Validation loss = 0.11037160456180573
Validation loss = 0.12011883407831192
Validation loss = 0.0994982123374939
Validation loss = 0.09626778960227966
Validation loss = 0.09838944673538208
Validation loss = 0.09100102633237839
Validation loss = 0.08435767889022827
Validation loss = 0.085904061794281
Validation loss = 0.08120749145746231
Validation loss = 0.07806087285280228
Validation loss = 0.07274661213159561
Validation loss = 0.0766473188996315
Validation loss = 0.0733908861875534
Validation loss = 0.06775659322738647
Validation loss = 0.0706813633441925
Validation loss = 0.07922416180372238
Validation loss = 0.06875265389680862
Validation loss = 0.07362797856330872
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2357262820005417
Validation loss = 0.16356989741325378
Validation loss = 0.14094683527946472
Validation loss = 0.13018327951431274
Validation loss = 0.12194685637950897
Validation loss = 0.11794190853834152
Validation loss = 0.10892195999622345
Validation loss = 0.11820521950721741
Validation loss = 0.11076293885707855
Validation loss = 0.10034554451704025
Validation loss = 0.09902946650981903
Validation loss = 0.11046920716762543
Validation loss = 0.09053371846675873
Validation loss = 0.08832714706659317
Validation loss = 0.09636493772268295
Validation loss = 0.08846940845251083
Validation loss = 0.10560742765665054
Validation loss = 0.0870203748345375
Validation loss = 0.08722764253616333
Validation loss = 0.08282987028360367
Validation loss = 0.07845471054315567
Validation loss = 0.08488690853118896
Validation loss = 0.0825229361653328
Validation loss = 0.08740832656621933
Validation loss = 0.07408085465431213
Validation loss = 0.07447516173124313
Validation loss = 0.07197389006614685
Validation loss = 0.06996020674705505
Validation loss = 0.07094859331846237
Validation loss = 0.07558614015579224
Validation loss = 0.07015905529260635
Validation loss = 0.06541707366704941
Validation loss = 0.07347998023033142
Validation loss = 0.06475654244422913
Validation loss = 0.07788915187120438
Validation loss = 0.0667153149843216
Validation loss = 0.06373587995767593
Validation loss = 0.06705127656459808
Validation loss = 0.0655977725982666
Validation loss = 0.06278583407402039
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0949  |
| Iteration     | 1        |
| MaximumReturn | -0.0453  |
| MinimumReturn | -0.302   |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1229768767952919
Validation loss = 0.07835039496421814
Validation loss = 0.06936332583427429
Validation loss = 0.06294715404510498
Validation loss = 0.05785945802927017
Validation loss = 0.050198979675769806
Validation loss = 0.0526699461042881
Validation loss = 0.05140262469649315
Validation loss = 0.04252452030777931
Validation loss = 0.043699998408555984
Validation loss = 0.046480562537908554
Validation loss = 0.04305489361286163
Validation loss = 0.04856216907501221
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1338006854057312
Validation loss = 0.08766933530569077
Validation loss = 0.07845732569694519
Validation loss = 0.07150155305862427
Validation loss = 0.06319135427474976
Validation loss = 0.06013931706547737
Validation loss = 0.054683953523635864
Validation loss = 0.06385255604982376
Validation loss = 0.055314499884843826
Validation loss = 0.05561322718858719
Validation loss = 0.05988851562142372
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15445728600025177
Validation loss = 0.10427184402942657
Validation loss = 0.08501441031694412
Validation loss = 0.07310262322425842
Validation loss = 0.06829463690519333
Validation loss = 0.06374106556177139
Validation loss = 0.0595770925283432
Validation loss = 0.06092095747590065
Validation loss = 0.052823297679424286
Validation loss = 0.05442056059837341
Validation loss = 0.0525074377655983
Validation loss = 0.04295723885297775
Validation loss = 0.04492542892694473
Validation loss = 0.03971649706363678
Validation loss = 0.04129680246114731
Validation loss = 0.039773061871528625
Validation loss = 0.0378795862197876
Validation loss = 0.033513713628053665
Validation loss = 0.03867081180214882
Validation loss = 0.0384652316570282
Validation loss = 0.036614879965782166
Validation loss = 0.03495511785149574
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.13695217669010162
Validation loss = 0.08485197275876999
Validation loss = 0.06759178638458252
Validation loss = 0.05752900242805481
Validation loss = 0.056787021458148956
Validation loss = 0.053530845791101456
Validation loss = 0.05479324236512184
Validation loss = 0.05246717482805252
Validation loss = 0.04346681386232376
Validation loss = 0.04357072338461876
Validation loss = 0.04732412099838257
Validation loss = 0.04586738720536232
Validation loss = 0.03681166470050812
Validation loss = 0.038210608065128326
Validation loss = 0.04973266273736954
Validation loss = 0.04279864579439163
Validation loss = 0.04382973536849022
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17077411711215973
Validation loss = 0.09551223367452621
Validation loss = 0.0810605138540268
Validation loss = 0.07122582942247391
Validation loss = 0.06540294736623764
Validation loss = 0.05816593021154404
Validation loss = 0.051307618618011475
Validation loss = 0.04653516411781311
Validation loss = 0.045064084231853485
Validation loss = 0.04224224388599396
Validation loss = 0.05028071999549866
Validation loss = 0.04659843072295189
Validation loss = 0.041390806436538696
Validation loss = 0.038570333272218704
Validation loss = 0.0351422093808651
Validation loss = 0.03830058127641678
Validation loss = 0.03300956264138222
Validation loss = 0.037920668721199036
Validation loss = 0.03319147974252701
Validation loss = 0.03743632510304451
Validation loss = 0.032417625188827515
Validation loss = 0.03247617557644844
Validation loss = 0.03156087547540665
Validation loss = 0.02663278952240944
Validation loss = 0.026785938069224358
Validation loss = 0.02613387256860733
Validation loss = 0.026164790615439415
Validation loss = 0.025854289531707764
Validation loss = 0.024646086618304253
Validation loss = 0.02543836645781994
Validation loss = 0.027880609035491943
Validation loss = 0.034826308488845825
Validation loss = 0.032016053795814514
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00901 |
| Iteration     | 2        |
| MaximumReturn | -0.0058  |
| MinimumReturn | -0.0138  |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05031164363026619
Validation loss = 0.03758062794804573
Validation loss = 0.03682897612452507
Validation loss = 0.03675832226872444
Validation loss = 0.03398110345005989
Validation loss = 0.029743678867816925
Validation loss = 0.028286350890994072
Validation loss = 0.03738181293010712
Validation loss = 0.026966065168380737
Validation loss = 0.03405653312802315
Validation loss = 0.02944868616759777
Validation loss = 0.032899439334869385
Validation loss = 0.025514857843518257
Validation loss = 0.02697725035250187
Validation loss = 0.028132006525993347
Validation loss = 0.027885640040040016
Validation loss = 0.022012310102581978
Validation loss = 0.027232393622398376
Validation loss = 0.030415480956435204
Validation loss = 0.023608321323990822
Validation loss = 0.024380220100283623
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.054777294397354126
Validation loss = 0.046820029616355896
Validation loss = 0.046809855848550797
Validation loss = 0.04003384709358215
Validation loss = 0.03746813163161278
Validation loss = 0.043745968490839005
Validation loss = 0.036076124757528305
Validation loss = 0.03575963154435158
Validation loss = 0.038289736956357956
Validation loss = 0.0342760793864727
Validation loss = 0.03661976009607315
Validation loss = 0.03428550064563751
Validation loss = 0.03399534523487091
Validation loss = 0.031121185049414635
Validation loss = 0.03439474478363991
Validation loss = 0.03389566019177437
Validation loss = 0.03727732226252556
Validation loss = 0.03353674337267876
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07027776539325714
Validation loss = 0.05642399564385414
Validation loss = 0.03705526515841484
Validation loss = 0.03306477144360542
Validation loss = 0.037892330437898636
Validation loss = 0.03549029305577278
Validation loss = 0.033160436898469925
Validation loss = 0.026037082076072693
Validation loss = 0.033277686685323715
Validation loss = 0.028362663462758064
Validation loss = 0.032999902963638306
Validation loss = 0.02816019207239151
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05638103559613228
Validation loss = 0.039444517344236374
Validation loss = 0.03274829313158989
Validation loss = 0.03240470588207245
Validation loss = 0.031800318509340286
Validation loss = 0.03065727837383747
Validation loss = 0.039956092834472656
Validation loss = 0.030694618821144104
Validation loss = 0.0333063043653965
Validation loss = 0.03120601736009121
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.053642917424440384
Validation loss = 0.04128202423453331
Validation loss = 0.038123298436403275
Validation loss = 0.03586313873529434
Validation loss = 0.030947251245379448
Validation loss = 0.028386464342474937
Validation loss = 0.02593444287776947
Validation loss = 0.024981850758194923
Validation loss = 0.025247568264603615
Validation loss = 0.02471107244491577
Validation loss = 0.027296015992760658
Validation loss = 0.02240750938653946
Validation loss = 0.023371553048491478
Validation loss = 0.02704182080924511
Validation loss = 0.0267854705452919
Validation loss = 0.02814052440226078
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00124  |
| Iteration     | 3         |
| MaximumReturn | -0.000952 |
| MinimumReturn | -0.00149  |
| TotalSamples  | 8330      |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05448124557733536
Validation loss = 0.042887669056653976
Validation loss = 0.03735923767089844
Validation loss = 0.026208309456706047
Validation loss = 0.024262282997369766
Validation loss = 0.025256110355257988
Validation loss = 0.02814939245581627
Validation loss = 0.02483394369482994
Validation loss = 0.025977876037359238
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04421660676598549
Validation loss = 0.031965211033821106
Validation loss = 0.030290616676211357
Validation loss = 0.02530711703002453
Validation loss = 0.02718639001250267
Validation loss = 0.04043523594737053
Validation loss = 0.026690155267715454
Validation loss = 0.02860938385128975
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.049771882593631744
Validation loss = 0.03219975531101227
Validation loss = 0.023832641541957855
Validation loss = 0.020598577335476875
Validation loss = 0.02445346862077713
Validation loss = 0.022336456924676895
Validation loss = 0.022566210478544235
Validation loss = 0.02064722776412964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03330601006746292
Validation loss = 0.03778846189379692
Validation loss = 0.028448013588786125
Validation loss = 0.026612555608153343
Validation loss = 0.028924548998475075
Validation loss = 0.027839509770274162
Validation loss = 0.025932999327778816
Validation loss = 0.028962234035134315
Validation loss = 0.023433351889252663
Validation loss = 0.022998817265033722
Validation loss = 0.022261634469032288
Validation loss = 0.021182484924793243
Validation loss = 0.02324320748448372
Validation loss = 0.02708277478814125
Validation loss = 0.022325294092297554
Validation loss = 0.024456392973661423
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.041397787630558014
Validation loss = 0.023541707545518875
Validation loss = 0.021397698670625687
Validation loss = 0.02180391177535057
Validation loss = 0.023203982040286064
Validation loss = 0.02065241150557995
Validation loss = 0.02071639709174633
Validation loss = 0.021352985873818398
Validation loss = 0.01953795738518238
Validation loss = 0.030392087996006012
Validation loss = 0.020183442160487175
Validation loss = 0.01785322092473507
Validation loss = 0.025622781366109848
Validation loss = 0.02062043361365795
Validation loss = 0.017496533691883087
Validation loss = 0.016905371099710464
Validation loss = 0.01787910796701908
Validation loss = 0.01837816834449768
Validation loss = 0.021570343524217606
Validation loss = 0.01915736123919487
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00326 |
| Iteration     | 4        |
| MaximumReturn | -0.00245 |
| MinimumReturn | -0.00433 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04055636376142502
Validation loss = 0.019174743443727493
Validation loss = 0.020998263731598854
Validation loss = 0.025738591328263283
Validation loss = 0.021122992038726807
Validation loss = 0.017098860815167427
Validation loss = 0.01782817207276821
Validation loss = 0.018605411052703857
Validation loss = 0.01961476542055607
Validation loss = 0.016437608748674393
Validation loss = 0.01951700821518898
Validation loss = 0.027810171246528625
Validation loss = 0.025704551488161087
Validation loss = 0.02491084486246109
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029286345466971397
Validation loss = 0.023555997759103775
Validation loss = 0.020039748400449753
Validation loss = 0.019317621365189552
Validation loss = 0.017322782427072525
Validation loss = 0.02066594734787941
Validation loss = 0.01947972923517227
Validation loss = 0.019633501768112183
Validation loss = 0.020239045843482018
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.033701274544000626
Validation loss = 0.026288580149412155
Validation loss = 0.0201176218688488
Validation loss = 0.017555702477693558
Validation loss = 0.017336785793304443
Validation loss = 0.022093720734119415
Validation loss = 0.02450995147228241
Validation loss = 0.02126065455377102
Validation loss = 0.015792205929756165
Validation loss = 0.015520073473453522
Validation loss = 0.017082642763853073
Validation loss = 0.01782533898949623
Validation loss = 0.01897500269114971
Validation loss = 0.016982661560177803
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026779508218169212
Validation loss = 0.018021846190094948
Validation loss = 0.025026272982358932
Validation loss = 0.021268947049975395
Validation loss = 0.02111211232841015
Validation loss = 0.02140229195356369
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04218355193734169
Validation loss = 0.028125857934355736
Validation loss = 0.02631603740155697
Validation loss = 0.018968326970934868
Validation loss = 0.023064302280545235
Validation loss = 0.015197010710835457
Validation loss = 0.021488603204488754
Validation loss = 0.017430968582630157
Validation loss = 0.016677197068929672
Validation loss = 0.014407822862267494
Validation loss = 0.014590514823794365
Validation loss = 0.013594242744147778
Validation loss = 0.014870263636112213
Validation loss = 0.013981240801513195
Validation loss = 0.015477396547794342
Validation loss = 0.015473559498786926
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0101  |
| Iteration     | 5        |
| MaximumReturn | -0.00666 |
| MinimumReturn | -0.0131  |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02864505723118782
Validation loss = 0.01699449121952057
Validation loss = 0.013940001837909222
Validation loss = 0.014395194128155708
Validation loss = 0.015082207508385181
Validation loss = 0.01329895295202732
Validation loss = 0.013881072402000427
Validation loss = 0.016697268933057785
Validation loss = 0.014758706092834473
Validation loss = 0.013965671882033348
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.033782705664634705
Validation loss = 0.0201034527271986
Validation loss = 0.019662750884890556
Validation loss = 0.018448593094944954
Validation loss = 0.018336523324251175
Validation loss = 0.02038491889834404
Validation loss = 0.020459558814764023
Validation loss = 0.018248330801725388
Validation loss = 0.017293376848101616
Validation loss = 0.01589532196521759
Validation loss = 0.017479559406638145
Validation loss = 0.018582817167043686
Validation loss = 0.014398780651390553
Validation loss = 0.016145270317792892
Validation loss = 0.014038617722690105
Validation loss = 0.020234769210219383
Validation loss = 0.01715993881225586
Validation loss = 0.014794113114476204
Validation loss = 0.014661567285656929
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02373051829636097
Validation loss = 0.017865248024463654
Validation loss = 0.015226788818836212
Validation loss = 0.01967940293252468
Validation loss = 0.01730300858616829
Validation loss = 0.013671095483005047
Validation loss = 0.015676582232117653
Validation loss = 0.014783789403736591
Validation loss = 0.0145515576004982
Validation loss = 0.016469011083245277
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.029398027807474136
Validation loss = 0.018557079136371613
Validation loss = 0.014109134674072266
Validation loss = 0.015956301242113113
Validation loss = 0.015465164557099342
Validation loss = 0.017890462651848793
Validation loss = 0.015737462788820267
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.036995816975831985
Validation loss = 0.018072539940476418
Validation loss = 0.01495195459574461
Validation loss = 0.019689880311489105
Validation loss = 0.014616522006690502
Validation loss = 0.01587648317217827
Validation loss = 0.013846213929355145
Validation loss = 0.013310415670275688
Validation loss = 0.01421522069722414
Validation loss = 0.014349473640322685
Validation loss = 0.014792313799262047
Validation loss = 0.019815288484096527
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00111  |
| Iteration     | 6         |
| MaximumReturn | -0.000816 |
| MinimumReturn | -0.00144  |
| TotalSamples  | 13328     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020128848031163216
Validation loss = 0.014849436469376087
Validation loss = 0.015883879736065865
Validation loss = 0.012087531387805939
Validation loss = 0.017325757071375847
Validation loss = 0.016560299322009087
Validation loss = 0.016787897795438766
Validation loss = 0.014899869449436665
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02822364680469036
Validation loss = 0.014407902024686337
Validation loss = 0.01769755594432354
Validation loss = 0.018597440794110298
Validation loss = 0.017311392351984978
Validation loss = 0.020114120095968246
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0192080270498991
Validation loss = 0.01835491508245468
Validation loss = 0.017151622101664543
Validation loss = 0.013976060785353184
Validation loss = 0.014130961149930954
Validation loss = 0.014557256363332272
Validation loss = 0.019158123061060905
Validation loss = 0.013414965011179447
Validation loss = 0.012519539333879948
Validation loss = 0.016724763438105583
Validation loss = 0.015103652141988277
Validation loss = 0.013737756758928299
Validation loss = 0.015277785249054432
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02607632987201214
Validation loss = 0.014131028205156326
Validation loss = 0.014000649563968182
Validation loss = 0.01700558327138424
Validation loss = 0.015050016343593597
Validation loss = 0.01625446230173111
Validation loss = 0.01371715497225523
Validation loss = 0.018495047464966774
Validation loss = 0.012995353899896145
Validation loss = 0.01573331467807293
Validation loss = 0.011989601887762547
Validation loss = 0.016595792025327682
Validation loss = 0.01807386241853237
Validation loss = 0.015999967232346535
Validation loss = 0.012407954782247543
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01619373820722103
Validation loss = 0.013850155286490917
Validation loss = 0.01591484062373638
Validation loss = 0.012327596545219421
Validation loss = 0.01274123415350914
Validation loss = 0.013364573009312153
Validation loss = 0.013317923061549664
Validation loss = 0.015250411815941334
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.117   |
| Iteration     | 7        |
| MaximumReturn | -0.0181  |
| MinimumReturn | -1.18    |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020371755585074425
Validation loss = 0.014674106612801552
Validation loss = 0.025961855426430702
Validation loss = 0.018475404009222984
Validation loss = 0.013807048089802265
Validation loss = 0.012617265805602074
Validation loss = 0.014132840558886528
Validation loss = 0.01126041729003191
Validation loss = 0.015112130902707577
Validation loss = 0.012453392148017883
Validation loss = 0.012606249190866947
Validation loss = 0.015189777128398418
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016852719709277153
Validation loss = 0.01842978596687317
Validation loss = 0.014953358098864555
Validation loss = 0.01680835150182247
Validation loss = 0.015135310590267181
Validation loss = 0.01248760987073183
Validation loss = 0.013327651657164097
Validation loss = 0.015268092043697834
Validation loss = 0.014914224855601788
Validation loss = 0.012851370498538017
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014079773798584938
Validation loss = 0.03302692994475365
Validation loss = 0.02562551200389862
Validation loss = 0.019651126116514206
Validation loss = 0.015530617907643318
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02293509431183338
Validation loss = 0.01689782552421093
Validation loss = 0.014979591593146324
Validation loss = 0.0136799905449152
Validation loss = 0.017369667068123817
Validation loss = 0.016990043222904205
Validation loss = 0.014188212342560291
Validation loss = 0.014543334022164345
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020376166328787804
Validation loss = 0.01170345488935709
Validation loss = 0.01238259021192789
Validation loss = 0.012687621638178825
Validation loss = 0.014174158684909344
Validation loss = 0.01136584673076868
Validation loss = 0.012813339941203594
Validation loss = 0.014417772181332111
Validation loss = 0.017491791397333145
Validation loss = 0.01579659804701805
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.018   |
| Iteration     | 8        |
| MaximumReturn | -0.0123  |
| MinimumReturn | -0.0288  |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013577093370258808
Validation loss = 0.014934023842215538
Validation loss = 0.011279664933681488
Validation loss = 0.009243708103895187
Validation loss = 0.013049264438450336
Validation loss = 0.012834650464355946
Validation loss = 0.009254861623048782
Validation loss = 0.008615405298769474
Validation loss = 0.00918117817491293
Validation loss = 0.009717591106891632
Validation loss = 0.009844042360782623
Validation loss = 0.010568272322416306
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013089442625641823
Validation loss = 0.0103080403059721
Validation loss = 0.010064488276839256
Validation loss = 0.00879276916384697
Validation loss = 0.01075354591012001
Validation loss = 0.014354890212416649
Validation loss = 0.00950432475656271
Validation loss = 0.010036535561084747
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017743702977895737
Validation loss = 0.011079695075750351
Validation loss = 0.010906226933002472
Validation loss = 0.013553100638091564
Validation loss = 0.012123731896281242
Validation loss = 0.011086249724030495
Validation loss = 0.010592303238809109
Validation loss = 0.01115952618420124
Validation loss = 0.011434445157647133
Validation loss = 0.009851831011474133
Validation loss = 0.011653497815132141
Validation loss = 0.00974469818174839
Validation loss = 0.009006223641335964
Validation loss = 0.009636114351451397
Validation loss = 0.010765819810330868
Validation loss = 0.009052140638232231
Validation loss = 0.008236996829509735
Validation loss = 0.009528330527245998
Validation loss = 0.00915525108575821
Validation loss = 0.011113571003079414
Validation loss = 0.010971690528094769
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013432690873742104
Validation loss = 0.010243421420454979
Validation loss = 0.010621931403875351
Validation loss = 0.01006879098713398
Validation loss = 0.015049065463244915
Validation loss = 0.01115959882736206
Validation loss = 0.009511973708868027
Validation loss = 0.020563114434480667
Validation loss = 0.01137414202094078
Validation loss = 0.00903739221394062
Validation loss = 0.011281237006187439
Validation loss = 0.009334592148661613
Validation loss = 0.010163910686969757
Validation loss = 0.01147482730448246
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012696053832769394
Validation loss = 0.01224574539810419
Validation loss = 0.009921278804540634
Validation loss = 0.008287763223052025
Validation loss = 0.007840732112526894
Validation loss = 0.008794511668384075
Validation loss = 0.008444472216069698
Validation loss = 0.010754586197435856
Validation loss = 0.009328142739832401
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0196  |
| Iteration     | 9        |
| MaximumReturn | -0.0125  |
| MinimumReturn | -0.0295  |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011201697401702404
Validation loss = 0.00777070876210928
Validation loss = 0.006456306204199791
Validation loss = 0.009458940476179123
Validation loss = 0.008335348218679428
Validation loss = 0.007083427626639605
Validation loss = 0.008246871642768383
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01102357730269432
Validation loss = 0.009767469018697739
Validation loss = 0.008050376549363136
Validation loss = 0.008579134941101074
Validation loss = 0.011287407949566841
Validation loss = 0.007662637159228325
Validation loss = 0.009386299178004265
Validation loss = 0.008155649527907372
Validation loss = 0.008919759653508663
Validation loss = 0.01094529777765274
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012841169722378254
Validation loss = 0.006816073786467314
Validation loss = 0.007620964664965868
Validation loss = 0.009300027042627335
Validation loss = 0.008780788630247116
Validation loss = 0.009545440785586834
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008046429604291916
Validation loss = 0.006776047870516777
Validation loss = 0.007660665549337864
Validation loss = 0.0080728093162179
Validation loss = 0.0076589095406234264
Validation loss = 0.006761196535080671
Validation loss = 0.007604539394378662
Validation loss = 0.00625956803560257
Validation loss = 0.006279677618294954
Validation loss = 0.008526093326508999
Validation loss = 0.011715559288859367
Validation loss = 0.011207984760403633
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008586656302213669
Validation loss = 0.006619509309530258
Validation loss = 0.008100408129394054
Validation loss = 0.006506042089313269
Validation loss = 0.006571728736162186
Validation loss = 0.006399830803275108
Validation loss = 0.007943687960505486
Validation loss = 0.009953636676073074
Validation loss = 0.009346901439130306
Validation loss = 0.006702220533043146
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.21    |
| Iteration     | 10       |
| MaximumReturn | -0.0012  |
| MinimumReturn | -22.9    |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01934691146016121
Validation loss = 0.006876328028738499
Validation loss = 0.0053117284551262856
Validation loss = 0.005520103499293327
Validation loss = 0.006227404810488224
Validation loss = 0.006242621224373579
Validation loss = 0.0065349191427230835
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0201321579515934
Validation loss = 0.007234533317387104
Validation loss = 0.006536389235407114
Validation loss = 0.005823354236781597
Validation loss = 0.006648059003055096
Validation loss = 0.005920705385506153
Validation loss = 0.004956056363880634
Validation loss = 0.005036152433604002
Validation loss = 0.006271059159189463
Validation loss = 0.007001371122896671
Validation loss = 0.006822098046541214
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01830809935927391
Validation loss = 0.00723536079749465
Validation loss = 0.00755652692168951
Validation loss = 0.006687236484140158
Validation loss = 0.006005010101944208
Validation loss = 0.005550925619900227
Validation loss = 0.005860643927007914
Validation loss = 0.0057335468009114265
Validation loss = 0.007798689417541027
Validation loss = 0.009769510477781296
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020242173224687576
Validation loss = 0.00949997827410698
Validation loss = 0.00683352118358016
Validation loss = 0.005786396563053131
Validation loss = 0.005583415739238262
Validation loss = 0.005014626309275627
Validation loss = 0.00528207141906023
Validation loss = 0.006293622311204672
Validation loss = 0.004892890341579914
Validation loss = 0.0054054344072937965
Validation loss = 0.005351250059902668
Validation loss = 0.00561542296782136
Validation loss = 0.005320873111486435
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013557478785514832
Validation loss = 0.007940483279526234
Validation loss = 0.009746878407895565
Validation loss = 0.005856575910001993
Validation loss = 0.004846097901463509
Validation loss = 0.005403476767241955
Validation loss = 0.007302281446754932
Validation loss = 0.004780655261129141
Validation loss = 0.0062591517344117165
Validation loss = 0.005572839640080929
Validation loss = 0.006301282439380884
Validation loss = 0.005172292701900005
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.1    |
| Iteration     | 11       |
| MaximumReturn | -0.057   |
| MinimumReturn | -51      |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009361112490296364
Validation loss = 0.004579432774335146
Validation loss = 0.0035579558461904526
Validation loss = 0.003755571786314249
Validation loss = 0.0032949477899819613
Validation loss = 0.004765239078551531
Validation loss = 0.004851739387959242
Validation loss = 0.003286790568381548
Validation loss = 0.003243927378207445
Validation loss = 0.0037951450794935226
Validation loss = 0.004186960868537426
Validation loss = 0.003858385141938925
Validation loss = 0.0044719562865793705
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012553605251014233
Validation loss = 0.0037867992650717497
Validation loss = 0.003985225688666105
Validation loss = 0.005104796029627323
Validation loss = 0.003613597247749567
Validation loss = 0.004629922565072775
Validation loss = 0.00470236549153924
Validation loss = 0.003963532391935587
Validation loss = 0.003994398284703493
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011962642893195152
Validation loss = 0.0049985432997345924
Validation loss = 0.003487552050501108
Validation loss = 0.004981447942554951
Validation loss = 0.0037241920363157988
Validation loss = 0.0038781017065048218
Validation loss = 0.0036274283193051815
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007940210402011871
Validation loss = 0.005136897787451744
Validation loss = 0.0037185843102633953
Validation loss = 0.003835336770862341
Validation loss = 0.004816312808543444
Validation loss = 0.003731099423021078
Validation loss = 0.0038525029085576534
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009047443978488445
Validation loss = 0.004719061776995659
Validation loss = 0.003812116105109453
Validation loss = 0.005690684076398611
Validation loss = 0.0031085670925676823
Validation loss = 0.003229971509426832
Validation loss = 0.0037593343295156956
Validation loss = 0.0028683338314294815
Validation loss = 0.002982799429446459
Validation loss = 0.0033173435367643833
Validation loss = 0.003576096147298813
Validation loss = 0.003928365185856819
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.7    |
| Iteration     | 12       |
| MaximumReturn | -0.00125 |
| MinimumReturn | -59.7    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004503756761550903
Validation loss = 0.004018679726868868
Validation loss = 0.00410407455638051
Validation loss = 0.0029543989803642035
Validation loss = 0.004668432287871838
Validation loss = 0.0035485841799527407
Validation loss = 0.003981475718319416
Validation loss = 0.00426365714520216
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005288750398904085
Validation loss = 0.004862931091338396
Validation loss = 0.004319095518440008
Validation loss = 0.0041698310524225235
Validation loss = 0.003876717993989587
Validation loss = 0.004689270630478859
Validation loss = 0.0049102515913546085
Validation loss = 0.005209405440837145
Validation loss = 0.004156660754233599
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006766017992049456
Validation loss = 0.0046472251415252686
Validation loss = 0.005720482673496008
Validation loss = 0.0038698180578649044
Validation loss = 0.004733396228402853
Validation loss = 0.0037814860697835684
Validation loss = 0.004134991206228733
Validation loss = 0.00494223041459918
Validation loss = 0.004943361505866051
Validation loss = 0.004279755521565676
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007346585392951965
Validation loss = 0.003947345539927483
Validation loss = 0.0030182322952896357
Validation loss = 0.0037727428134530783
Validation loss = 0.00332601903937757
Validation loss = 0.0052244956605136395
Validation loss = 0.004305472131818533
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0041832164861261845
Validation loss = 0.004199428483843803
Validation loss = 0.0034058622550219297
Validation loss = 0.0035179026890546083
Validation loss = 0.0038222097791731358
Validation loss = 0.003942007664591074
Validation loss = 0.0037304481957107782
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -5.14     |
| Iteration     | 13        |
| MaximumReturn | -0.000707 |
| MinimumReturn | -56.2     |
| TotalSamples  | 24990     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004843610338866711
Validation loss = 0.004351520910859108
Validation loss = 0.004094614181667566
Validation loss = 0.0038464923854917288
Validation loss = 0.0035037687048316
Validation loss = 0.004662510938942432
Validation loss = 0.005216494668275118
Validation loss = 0.004181525204330683
Validation loss = 0.0049211918376386166
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004662096966058016
Validation loss = 0.006367439404129982
Validation loss = 0.005329294595867395
Validation loss = 0.004659419413655996
Validation loss = 0.0045034559443593025
Validation loss = 0.0038386571686714888
Validation loss = 0.00603053905069828
Validation loss = 0.004353534895926714
Validation loss = 0.006246138829737902
Validation loss = 0.005035973619669676
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006922882050275803
Validation loss = 0.0037282651755958796
Validation loss = 0.004837718326598406
Validation loss = 0.003797548823058605
Validation loss = 0.004310511983931065
Validation loss = 0.004698296543210745
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00382430013269186
Validation loss = 0.005199845414608717
Validation loss = 0.004252741113305092
Validation loss = 0.004668032750487328
Validation loss = 0.004217609763145447
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003884875914081931
Validation loss = 0.003762962995097041
Validation loss = 0.0039273882284760475
Validation loss = 0.003620385890826583
Validation loss = 0.006433745380491018
Validation loss = 0.006668627727776766
Validation loss = 0.002976907417178154
Validation loss = 0.0033566656056791544
Validation loss = 0.0030836444348096848
Validation loss = 0.0038953323382884264
Validation loss = 0.007861542515456676
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -104     |
| Iteration     | 14       |
| MaximumReturn | -15.9    |
| MinimumReturn | -141     |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015574615448713303
Validation loss = 0.003362721297889948
Validation loss = 0.0037257990334182978
Validation loss = 0.0027192237321287394
Validation loss = 0.0029088130686432123
Validation loss = 0.0026940314564853907
Validation loss = 0.0022973341401666403
Validation loss = 0.0030155538115650415
Validation loss = 0.0026151526253670454
Validation loss = 0.0031770875211805105
Validation loss = 0.0019205384887754917
Validation loss = 0.0035321239847689867
Validation loss = 0.0025934921577572823
Validation loss = 0.002828921191394329
Validation loss = 0.0032335049472749233
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021723797544836998
Validation loss = 0.005950805265456438
Validation loss = 0.004257118329405785
Validation loss = 0.002764481818303466
Validation loss = 0.003570107975974679
Validation loss = 0.002860430860891938
Validation loss = 0.002769303973764181
Validation loss = 0.0024690856225788593
Validation loss = 0.004779020324349403
Validation loss = 0.003105050651356578
Validation loss = 0.0023601672146469355
Validation loss = 0.004741402808576822
Validation loss = 0.003081814618781209
Validation loss = 0.0029214497189968824
Validation loss = 0.0038580582477152348
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0216564629226923
Validation loss = 0.009831145405769348
Validation loss = 0.002784301992505789
Validation loss = 0.003017091890797019
Validation loss = 0.003413487458601594
Validation loss = 0.0026757766027003527
Validation loss = 0.002656683325767517
Validation loss = 0.0022746126633137465
Validation loss = 0.00240937527269125
Validation loss = 0.0029863049276173115
Validation loss = 0.0023676047567278147
Validation loss = 0.002243990544229746
Validation loss = 0.0033706389367580414
Validation loss = 0.00423127505928278
Validation loss = 0.004847485106438398
Validation loss = 0.004260944202542305
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015129719860851765
Validation loss = 0.004203100688755512
Validation loss = 0.003236190415918827
Validation loss = 0.004010154400020838
Validation loss = 0.0029089050367474556
Validation loss = 0.0027692364528775215
Validation loss = 0.005298791453242302
Validation loss = 0.0035647894255816936
Validation loss = 0.003150837030261755
Validation loss = 0.002428404986858368
Validation loss = 0.002304688561707735
Validation loss = 0.0028288657777011395
Validation loss = 0.0023741619661450386
Validation loss = 0.0019970745779573917
Validation loss = 0.003768655238673091
Validation loss = 0.003031000727787614
Validation loss = 0.003186221932992339
Validation loss = 0.002284297486767173
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017885228618979454
Validation loss = 0.00415025744587183
Validation loss = 0.004346220754086971
Validation loss = 0.002383317332714796
Validation loss = 0.002370750531554222
Validation loss = 0.0026467617135494947
Validation loss = 0.0023476197384297848
Validation loss = 0.0027192654088139534
Validation loss = 0.004190905950963497
Validation loss = 0.002125523053109646
Validation loss = 0.0029392424039542675
Validation loss = 0.003607607213780284
Validation loss = 0.0025301678106188774
Validation loss = 0.002161114476621151
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -116     |
| Iteration     | 15       |
| MaximumReturn | -48.3    |
| MinimumReturn | -139     |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01946122571825981
Validation loss = 0.00544318463653326
Validation loss = 0.002524197567254305
Validation loss = 0.0027146637439727783
Validation loss = 0.002811591839417815
Validation loss = 0.002762316958978772
Validation loss = 0.0030187976080924273
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015604068525135517
Validation loss = 0.0026299930177628994
Validation loss = 0.002865353599190712
Validation loss = 0.002426596125587821
Validation loss = 0.002354035619646311
Validation loss = 0.002295250305905938
Validation loss = 0.002432821551337838
Validation loss = 0.0027338259387761354
Validation loss = 0.002788264537230134
Validation loss = 0.003177039558067918
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014219095930457115
Validation loss = 0.0028418605215847492
Validation loss = 0.00194319908041507
Validation loss = 0.002215283690020442
Validation loss = 0.0020873306784778833
Validation loss = 0.0026115751825273037
Validation loss = 0.0037123411893844604
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01696387678384781
Validation loss = 0.004387342371046543
Validation loss = 0.0035714861005544662
Validation loss = 0.0027306117117404938
Validation loss = 0.0023957830853760242
Validation loss = 0.002229388104751706
Validation loss = 0.0029534155037254095
Validation loss = 0.0027273655869066715
Validation loss = 0.002732818713411689
Validation loss = 0.0022512890864163637
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010717612691223621
Validation loss = 0.00285349995829165
Validation loss = 0.00270374515093863
Validation loss = 0.0034059465397149324
Validation loss = 0.0020800374913960695
Validation loss = 0.003209539456292987
Validation loss = 0.0018633397994562984
Validation loss = 0.0020677419379353523
Validation loss = 0.0025119073688983917
Validation loss = 0.0026175680104643106
Validation loss = 0.0023340864572674036
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -84.9    |
| Iteration     | 16       |
| MaximumReturn | -0.129   |
| MinimumReturn | -147     |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031672115437686443
Validation loss = 0.002008797600865364
Validation loss = 0.0016873267013579607
Validation loss = 0.0017886161804199219
Validation loss = 0.0018407984171062708
Validation loss = 0.004529320169240236
Validation loss = 0.0014746342785656452
Validation loss = 0.0018059912836179137
Validation loss = 0.00171043595764786
Validation loss = 0.0016897836467251182
Validation loss = 0.0017780049238353968
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0026513917837291956
Validation loss = 0.0024332732427865267
Validation loss = 0.0019229892641305923
Validation loss = 0.00238016783259809
Validation loss = 0.003009714186191559
Validation loss = 0.0027231252752244473
Validation loss = 0.0018064524047076702
Validation loss = 0.0016294352244585752
Validation loss = 0.0020406008698046207
Validation loss = 0.0025731734931468964
Validation loss = 0.0025340383872389793
Validation loss = 0.0015954463742673397
Validation loss = 0.002088742796331644
Validation loss = 0.0018858042312785983
Validation loss = 0.0017020221566781402
Validation loss = 0.0020172742661088705
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00623492943122983
Validation loss = 0.0021325130946934223
Validation loss = 0.0016083315713331103
Validation loss = 0.0034234249033033848
Validation loss = 0.0026157614775002003
Validation loss = 0.002000903245061636
Validation loss = 0.0020255744457244873
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0040648807771503925
Validation loss = 0.004812175873667002
Validation loss = 0.0016163198743015528
Validation loss = 0.0032160268165171146
Validation loss = 0.00233479798771441
Validation loss = 0.0015653384616598487
Validation loss = 0.002424870152026415
Validation loss = 0.0019899241160601377
Validation loss = 0.0018396422965452075
Validation loss = 0.0030513950623571873
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002914086217060685
Validation loss = 0.0016427424270659685
Validation loss = 0.002055390039458871
Validation loss = 0.001725128386169672
Validation loss = 0.0017922037513926625
Validation loss = 0.0019542064983397722
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.6    |
| Iteration     | 17       |
| MaximumReturn | -0.00135 |
| MinimumReturn | -150     |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0023708909284323454
Validation loss = 0.0017158450791612267
Validation loss = 0.0023238526191562414
Validation loss = 0.0023922608233988285
Validation loss = 0.0020292699337005615
Validation loss = 0.002907200949266553
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027182043995708227
Validation loss = 0.0028224887792021036
Validation loss = 0.0029164417646825314
Validation loss = 0.002441966673359275
Validation loss = 0.0018690363503992558
Validation loss = 0.001996804727241397
Validation loss = 0.002215663203969598
Validation loss = 0.002137943170964718
Validation loss = 0.0020760935731232166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019835939165204763
Validation loss = 0.0046364860609173775
Validation loss = 0.0023337865713983774
Validation loss = 0.002418421907350421
Validation loss = 0.002080474980175495
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0023709149099886417
Validation loss = 0.0020138160325586796
Validation loss = 0.0019734587986022234
Validation loss = 0.001993106910958886
Validation loss = 0.0022706659510731697
Validation loss = 0.0019287921022623777
Validation loss = 0.0020567162428051233
Validation loss = 0.0016290268395096064
Validation loss = 0.0016424639616161585
Validation loss = 0.00172647915314883
Validation loss = 0.0022960545029491186
Validation loss = 0.0019484407966956496
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019585629925131798
Validation loss = 0.0016269629122689366
Validation loss = 0.0019140839576721191
Validation loss = 0.0023721829056739807
Validation loss = 0.0018090468365699053
Validation loss = 0.0022337029222398996
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -128     |
| Iteration     | 18       |
| MaximumReturn | -81.1    |
| MinimumReturn | -159     |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006291994359344244
Validation loss = 0.002004295354709029
Validation loss = 0.0021261684596538544
Validation loss = 0.001564327976666391
Validation loss = 0.0014714684803038836
Validation loss = 0.0016335303662344813
Validation loss = 0.0021187840029597282
Validation loss = 0.001584740704856813
Validation loss = 0.0015910593792796135
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011906900443136692
Validation loss = 0.002734092529863119
Validation loss = 0.0016353134997189045
Validation loss = 0.001549221808090806
Validation loss = 0.0017422474920749664
Validation loss = 0.002305604750290513
Validation loss = 0.0015813157660886645
Validation loss = 0.0016880983021110296
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012633384205400944
Validation loss = 0.002075975062325597
Validation loss = 0.0022240187972784042
Validation loss = 0.002038715174421668
Validation loss = 0.001857731956988573
Validation loss = 0.0018077383283525705
Validation loss = 0.0018381392583251
Validation loss = 0.0015861119609326124
Validation loss = 0.0024592431727796793
Validation loss = 0.0025613061152398586
Validation loss = 0.0017124427249655128
Validation loss = 0.0015464778989553452
Validation loss = 0.001957323867827654
Validation loss = 0.0014602197334170341
Validation loss = 0.0014613686362281442
Validation loss = 0.001809029607102275
Validation loss = 0.0021526371128857136
Validation loss = 0.0018840092234313488
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007303269580006599
Validation loss = 0.0022430249955505133
Validation loss = 0.002018624683842063
Validation loss = 0.0016573232132941484
Validation loss = 0.001665960531681776
Validation loss = 0.001358355162665248
Validation loss = 0.0016028766985982656
Validation loss = 0.0018327308353036642
Validation loss = 0.00192737253382802
Validation loss = 0.0027572819963097572
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008473087102174759
Validation loss = 0.001620647730305791
Validation loss = 0.0019013724522665143
Validation loss = 0.0016000901814550161
Validation loss = 0.0022132876329123974
Validation loss = 0.0018496724078431726
Validation loss = 0.0015122746117413044
Validation loss = 0.0016492812428623438
Validation loss = 0.00170460797380656
Validation loss = 0.0014422808308154345
Validation loss = 0.0015244497917592525
Validation loss = 0.00147822848521173
Validation loss = 0.0018827005987986922
Validation loss = 0.001489346264861524
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -73.7    |
| Iteration     | 19       |
| MaximumReturn | -29.5    |
| MinimumReturn | -111     |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025419092271476984
Validation loss = 0.0025115052703768015
Validation loss = 0.0014509588945657015
Validation loss = 0.001316069276072085
Validation loss = 0.001814184826798737
Validation loss = 0.0016325295437127352
Validation loss = 0.0028148849960416555
Validation loss = 0.001504797488451004
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027563711628317833
Validation loss = 0.0013954497408121824
Validation loss = 0.0013896813616156578
Validation loss = 0.0015623238869011402
Validation loss = 0.0014109358889982104
Validation loss = 0.0013573484029620886
Validation loss = 0.0016914821462705731
Validation loss = 0.0015660494100302458
Validation loss = 0.001983674941584468
Validation loss = 0.0013978517381474376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003598627867177129
Validation loss = 0.0023691842798143625
Validation loss = 0.0014075699727982283
Validation loss = 0.0018107789801433682
Validation loss = 0.001442592591047287
Validation loss = 0.002062347484752536
Validation loss = 0.002010495401918888
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004284899681806564
Validation loss = 0.0016615231288596988
Validation loss = 0.0015834937803447247
Validation loss = 0.001387674594298005
Validation loss = 0.001833416405133903
Validation loss = 0.0015741264214739203
Validation loss = 0.001991415163502097
Validation loss = 0.0016568397404626012
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025657934602349997
Validation loss = 0.0019465649966150522
Validation loss = 0.002220714930444956
Validation loss = 0.0024439552798867226
Validation loss = 0.0025037049781531096
Validation loss = 0.001660729176364839
Validation loss = 0.0013270198833197355
Validation loss = 0.0018217506585642695
Validation loss = 0.0012490323279052973
Validation loss = 0.002168449340388179
Validation loss = 0.001975193852558732
Validation loss = 0.0012889632489532232
Validation loss = 0.001155548612587154
Validation loss = 0.0014628816861659288
Validation loss = 0.0017722827615216374
Validation loss = 0.0015506347408518195
Validation loss = 0.0013836079742759466
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -21.4    |
| Iteration     | 20       |
| MaximumReturn | -0.00119 |
| MinimumReturn | -128     |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005487557966262102
Validation loss = 0.002167421393096447
Validation loss = 0.0021976102143526077
Validation loss = 0.0014829098945483565
Validation loss = 0.0017873141914606094
Validation loss = 0.0014817966148257256
Validation loss = 0.0017965477891266346
Validation loss = 0.0015409187180921435
Validation loss = 0.0015601656632497907
Validation loss = 0.0021004306618124247
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0031943856738507748
Validation loss = 0.0021163583733141422
Validation loss = 0.00204762676730752
Validation loss = 0.001291262567974627
Validation loss = 0.003129885531961918
Validation loss = 0.002201510826125741
Validation loss = 0.0020611691288650036
Validation loss = 0.002281957771629095
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0054588341154158115
Validation loss = 0.0021750363521277905
Validation loss = 0.002026669215410948
Validation loss = 0.002414046786725521
Validation loss = 0.001830970635637641
Validation loss = 0.0016684005968272686
Validation loss = 0.0018676606705412269
Validation loss = 0.0017233932157978415
Validation loss = 0.0016402341425418854
Validation loss = 0.0017945027211681008
Validation loss = 0.001847565989010036
Validation loss = 0.0015125446952879429
Validation loss = 0.0014027927536517382
Validation loss = 0.0020692781545221806
Validation loss = 0.0022870656102895737
Validation loss = 0.0016201459802687168
Validation loss = 0.001284679863601923
Validation loss = 0.0015180018963292241
Validation loss = 0.0016181960236281157
Validation loss = 0.0015129816019907594
Validation loss = 0.001758161699399352
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003519180230796337
Validation loss = 0.0026324528735131025
Validation loss = 0.0021190601401031017
Validation loss = 0.001956840045750141
Validation loss = 0.0017432129243388772
Validation loss = 0.0014786936808377504
Validation loss = 0.0023428630083799362
Validation loss = 0.0014918884262442589
Validation loss = 0.0018385429866611958
Validation loss = 0.0016998464707285166
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003478695871308446
Validation loss = 0.0019612805917859077
Validation loss = 0.001807425171136856
Validation loss = 0.001757133286446333
Validation loss = 0.0014108617324382067
Validation loss = 0.0017550582997500896
Validation loss = 0.0020211716182529926
Validation loss = 0.0017717917216941714
Validation loss = 0.0017980469856411219
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -118     |
| Iteration     | 21       |
| MaximumReturn | -84      |
| MinimumReturn | -152     |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008140324614942074
Validation loss = 0.0024137471336871386
Validation loss = 0.0020396073814481497
Validation loss = 0.0014547863975167274
Validation loss = 0.0014874122571200132
Validation loss = 0.002542023081332445
Validation loss = 0.001366908079944551
Validation loss = 0.0018054759129881859
Validation loss = 0.0020484793931245804
Validation loss = 0.0025714405346661806
Validation loss = 0.0018314605113118887
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007405528798699379
Validation loss = 0.001785237924195826
Validation loss = 0.0016041497001424432
Validation loss = 0.0017143364530056715
Validation loss = 0.0016466014785692096
Validation loss = 0.001856564311310649
Validation loss = 0.0021020786371082067
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00469717662781477
Validation loss = 0.0014747329987585545
Validation loss = 0.002120601013302803
Validation loss = 0.0012735238997265697
Validation loss = 0.0010707257315516472
Validation loss = 0.0013977690832689404
Validation loss = 0.0012921039015054703
Validation loss = 0.0014525592559948564
Validation loss = 0.0016395265702158213
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005065321922302246
Validation loss = 0.00213036616332829
Validation loss = 0.0017166308825835586
Validation loss = 0.0018005829770117998
Validation loss = 0.0014880482340231538
Validation loss = 0.0016219487879425287
Validation loss = 0.00186790875159204
Validation loss = 0.0015405926387757063
Validation loss = 0.002153029665350914
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004799903370440006
Validation loss = 0.0023091549519449472
Validation loss = 0.0023049572482705116
Validation loss = 0.0017482765251770616
Validation loss = 0.0014216159470379353
Validation loss = 0.0018103703623637557
Validation loss = 0.0019059327896684408
Validation loss = 0.002185281366109848
Validation loss = 0.0013611603062599897
Validation loss = 0.002160395961254835
Validation loss = 0.001656783395446837
Validation loss = 0.0017945365980267525
Validation loss = 0.0013809045776724815
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -77.8    |
| Iteration     | 22       |
| MaximumReturn | -0.254   |
| MinimumReturn | -135     |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004005906637758017
Validation loss = 0.0032412756700068712
Validation loss = 0.0027394588105380535
Validation loss = 0.002034418750554323
Validation loss = 0.0017563520232215524
Validation loss = 0.0018911845982074738
Validation loss = 0.0020671493839472532
Validation loss = 0.0014927837764844298
Validation loss = 0.0014498068485409021
Validation loss = 0.001735094701871276
Validation loss = 0.0029499069787561893
Validation loss = 0.0018588711973279715
Validation loss = 0.001891357242129743
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004221566021442413
Validation loss = 0.003627057420089841
Validation loss = 0.002159394323825836
Validation loss = 0.0027862521819770336
Validation loss = 0.002028773305937648
Validation loss = 0.0018415948143228889
Validation loss = 0.0017513815546408296
Validation loss = 0.0040945028886199
Validation loss = 0.0019398352596908808
Validation loss = 0.00248321914114058
Validation loss = 0.0018063883762806654
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002699813339859247
Validation loss = 0.0018198613543063402
Validation loss = 0.0014805771643295884
Validation loss = 0.0022780275903642178
Validation loss = 0.002284593880176544
Validation loss = 0.0017429474974051118
Validation loss = 0.002282478613778949
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002890854375436902
Validation loss = 0.0016602303367108107
Validation loss = 0.002552854595705867
Validation loss = 0.0021651580464094877
Validation loss = 0.0018305400153622031
Validation loss = 0.0024152637924999
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003689927514642477
Validation loss = 0.0020239283330738544
Validation loss = 0.0016340326983481646
Validation loss = 0.001725636189803481
Validation loss = 0.0020212021190673113
Validation loss = 0.001492532785050571
Validation loss = 0.0016843354096636176
Validation loss = 0.0015620006015524268
Validation loss = 0.0017605455359444022
Validation loss = 0.0032426815014332533
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -45.2    |
| Iteration     | 23       |
| MaximumReturn | -0.155   |
| MinimumReturn | -93.3    |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018144299974665046
Validation loss = 0.0014525711303576827
Validation loss = 0.0014501394471153617
Validation loss = 0.0014282517367973924
Validation loss = 0.0018957431893795729
Validation loss = 0.001821983838453889
Validation loss = 0.0015793755883350968
Validation loss = 0.0014397158520296216
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0045390683226287365
Validation loss = 0.001820045174099505
Validation loss = 0.002239885972812772
Validation loss = 0.001526176230981946
Validation loss = 0.0014499938115477562
Validation loss = 0.002095528645440936
Validation loss = 0.0021036267280578613
Validation loss = 0.0016780148725956678
Validation loss = 0.0016764340689405799
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021627626847475767
Validation loss = 0.00211910717189312
Validation loss = 0.001284671714529395
Validation loss = 0.001332008047029376
Validation loss = 0.0013971456792205572
Validation loss = 0.0011726000811904669
Validation loss = 0.0029278243891894817
Validation loss = 0.0018727503484115005
Validation loss = 0.0014715964207425714
Validation loss = 0.0012253110762685537
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002668657572939992
Validation loss = 0.0014811470173299313
Validation loss = 0.0015442334115505219
Validation loss = 0.0013706153258681297
Validation loss = 0.0013641172554343939
Validation loss = 0.001381139736622572
Validation loss = 0.002653679344803095
Validation loss = 0.002364686457440257
Validation loss = 0.0016497534234076738
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003642365336418152
Validation loss = 0.0013928257394582033
Validation loss = 0.0015879893908277154
Validation loss = 0.0014841979136690497
Validation loss = 0.00242730719037354
Validation loss = 0.0013750286307185888
Validation loss = 0.0014357756590470672
Validation loss = 0.0013782402966171503
Validation loss = 0.0017102297861129045
Validation loss = 0.0022025094367563725
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -52.3    |
| Iteration     | 24       |
| MaximumReturn | -0.0975  |
| MinimumReturn | -87.9    |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003239719197154045
Validation loss = 0.0013936754548922181
Validation loss = 0.0029188666958361864
Validation loss = 0.0013460061745718122
Validation loss = 0.00167707703076303
Validation loss = 0.0013157823123037815
Validation loss = 0.0016416446305811405
Validation loss = 0.0014741325285285711
Validation loss = 0.0012248136335983872
Validation loss = 0.0019450607942417264
Validation loss = 0.0012363508576527238
Validation loss = 0.0011601605219766498
Validation loss = 0.0012342469999566674
Validation loss = 0.0013978724600747228
Validation loss = 0.0014181479346007109
Validation loss = 0.001257646013982594
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0024654632434248924
Validation loss = 0.001456419238820672
Validation loss = 0.0015636973548680544
Validation loss = 0.0015478188870474696
Validation loss = 0.003488276619464159
Validation loss = 0.001982594607397914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0032065557315945625
Validation loss = 0.0029450312722474337
Validation loss = 0.0015328648732975125
Validation loss = 0.00136549212038517
Validation loss = 0.0011164216557517648
Validation loss = 0.0018436803948134184
Validation loss = 0.0026868279092013836
Validation loss = 0.0016725185560062528
Validation loss = 0.0013046860694885254
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005407105199992657
Validation loss = 0.0013724336167797446
Validation loss = 0.0019979854114353657
Validation loss = 0.0014290738618001342
Validation loss = 0.001588529092259705
Validation loss = 0.0015099485171958804
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0036119017750024796
Validation loss = 0.002176347654312849
Validation loss = 0.0016249577747657895
Validation loss = 0.0013129012659192085
Validation loss = 0.0015138451708480716
Validation loss = 0.0012603487120941281
Validation loss = 0.003509041853249073
Validation loss = 0.0012048592325299978
Validation loss = 0.0013392131077125669
Validation loss = 0.001134031917899847
Validation loss = 0.0014004521071910858
Validation loss = 0.002800100715830922
Validation loss = 0.001290339627303183
Validation loss = 0.0017678987933322787
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -55.4    |
| Iteration     | 25       |
| MaximumReturn | -0.971   |
| MinimumReturn | -100     |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004867958836257458
Validation loss = 0.0012252901215106249
Validation loss = 0.0017473100451752543
Validation loss = 0.0013359744334593415
Validation loss = 0.001333990367129445
Validation loss = 0.003559820121154189
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014942371053621173
Validation loss = 0.0018961093155667186
Validation loss = 0.0013605793938040733
Validation loss = 0.0014786281390115619
Validation loss = 0.0017684909980744123
Validation loss = 0.001333532389253378
Validation loss = 0.002054233569651842
Validation loss = 0.0016525078099220991
Validation loss = 0.0013476056046783924
Validation loss = 0.0012521909084171057
Validation loss = 0.001967799151316285
Validation loss = 0.0016026492230594158
Validation loss = 0.0018749277805909514
Validation loss = 0.0010394318960607052
Validation loss = 0.0014751870185136795
Validation loss = 0.001733624143525958
Validation loss = 0.0012544444762170315
Validation loss = 0.001854493748396635
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00223529152572155
Validation loss = 0.0012842328287661076
Validation loss = 0.0011152718216180801
Validation loss = 0.0011518667452037334
Validation loss = 0.001830081338994205
Validation loss = 0.0015959844458848238
Validation loss = 0.0014325212687253952
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017535280203446746
Validation loss = 0.0028774933889508247
Validation loss = 0.0024750689044594765
Validation loss = 0.0013573450269177556
Validation loss = 0.0018944069743156433
Validation loss = 0.0014615230029448867
Validation loss = 0.001503187115304172
Validation loss = 0.0019613560289144516
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018844727892428637
Validation loss = 0.0019561906810849905
Validation loss = 0.0013136181514710188
Validation loss = 0.0012923330068588257
Validation loss = 0.0013092829613015056
Validation loss = 0.0016043796204030514
Validation loss = 0.0011375477770343423
Validation loss = 0.0012324884301051497
Validation loss = 0.001217613578774035
Validation loss = 0.001403488451614976
Validation loss = 0.002064867177978158
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -74.9    |
| Iteration     | 26       |
| MaximumReturn | -1.09    |
| MinimumReturn | -117     |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021491541992872953
Validation loss = 0.001168107963167131
Validation loss = 0.0010344387264922261
Validation loss = 0.0012726769782602787
Validation loss = 0.0013598299119621515
Validation loss = 0.0012162690982222557
Validation loss = 0.0013218291569501162
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013599314261227846
Validation loss = 0.0013886226806789637
Validation loss = 0.0012181251076981425
Validation loss = 0.0011150900973007083
Validation loss = 0.0010662754066288471
Validation loss = 0.0013889165129512548
Validation loss = 0.003088622586801648
Validation loss = 0.0013101883232593536
Validation loss = 0.0015207765391096473
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016301167197525501
Validation loss = 0.0019022634951397777
Validation loss = 0.0014066736912354827
Validation loss = 0.0014891136670485139
Validation loss = 0.0011432587634772062
Validation loss = 0.00100148213095963
Validation loss = 0.0014647491043433547
Validation loss = 0.0026482229586690664
Validation loss = 0.0015181058552116156
Validation loss = 0.0012061053421348333
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003050372935831547
Validation loss = 0.001422058790922165
Validation loss = 0.0012900069123134017
Validation loss = 0.0012982661137357354
Validation loss = 0.0015690758591517806
Validation loss = 0.0012236187467351556
Validation loss = 0.0012800019467249513
Validation loss = 0.0021775751374661922
Validation loss = 0.0012357052182778716
Validation loss = 0.002622938249260187
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017889320151880383
Validation loss = 0.0016469735419377685
Validation loss = 0.001349124708212912
Validation loss = 0.0018667050171643496
Validation loss = 0.0014050689060240984
Validation loss = 0.0017940551042556763
Validation loss = 0.0018111375393345952
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -118     |
| Iteration     | 27       |
| MaximumReturn | -28.2    |
| MinimumReturn | -152     |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004158124327659607
Validation loss = 0.0015884569147601724
Validation loss = 0.0012584415962919593
Validation loss = 0.0015223239315673709
Validation loss = 0.0012822364224120975
Validation loss = 0.0013706941390410066
Validation loss = 0.0016494402661919594
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018837982788681984
Validation loss = 0.0016914330190047622
Validation loss = 0.0016187798464670777
Validation loss = 0.0011761666974052787
Validation loss = 0.0012514051049947739
Validation loss = 0.0018529249355196953
Validation loss = 0.0015339007368311286
Validation loss = 0.00126192148309201
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00553807383403182
Validation loss = 0.0022070149425417185
Validation loss = 0.0014179073041304946
Validation loss = 0.0010768481297418475
Validation loss = 0.0010535247856751084
Validation loss = 0.0012743042316287756
Validation loss = 0.0009971385588869452
Validation loss = 0.0014023897238075733
Validation loss = 0.0015675077447667718
Validation loss = 0.0012089632218703628
Validation loss = 0.0010505031095817685
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004876692779362202
Validation loss = 0.001649730489589274
Validation loss = 0.0019679320976138115
Validation loss = 0.0014732819981873035
Validation loss = 0.002184194279834628
Validation loss = 0.0027172823902219534
Validation loss = 0.0011708294041454792
Validation loss = 0.0011643680045381188
Validation loss = 0.00129127805121243
Validation loss = 0.0013969829306006432
Validation loss = 0.001087491400539875
Validation loss = 0.0013876993907615542
Validation loss = 0.0013619782403111458
Validation loss = 0.002041320316493511
Validation loss = 0.0010238460963591933
Validation loss = 0.0013613775372505188
Validation loss = 0.0011756886960938573
Validation loss = 0.0011379559291526675
Validation loss = 0.0015081505989655852
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0022176275961101055
Validation loss = 0.0014349869452416897
Validation loss = 0.0012602003989741206
Validation loss = 0.0012232853332534432
Validation loss = 0.0011564594460651278
Validation loss = 0.0013337022392079234
Validation loss = 0.001260568737052381
Validation loss = 0.0011244226479902864
Validation loss = 0.0012779413955286145
Validation loss = 0.0017252983525395393
Validation loss = 0.001363016664981842
Validation loss = 0.0014595744432881474
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -95.6    |
| Iteration     | 28       |
| MaximumReturn | -6.26    |
| MinimumReturn | -141     |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017832469893619418
Validation loss = 0.001103498158045113
Validation loss = 0.0012392373755574226
Validation loss = 0.001160438172519207
Validation loss = 0.0015833873767405748
Validation loss = 0.0014629954239353538
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004623125307261944
Validation loss = 0.001251325709745288
Validation loss = 0.0012025258038192987
Validation loss = 0.001194777199998498
Validation loss = 0.001588299754075706
Validation loss = 0.001330047845840454
Validation loss = 0.001056182780303061
Validation loss = 0.0014896609354764223
Validation loss = 0.001021511503495276
Validation loss = 0.001264649792574346
Validation loss = 0.0012418144615367055
Validation loss = 0.0012198378099128604
Validation loss = 0.0014552964130416512
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017799180932343006
Validation loss = 0.0017010222654789686
Validation loss = 0.002248982898890972
Validation loss = 0.0011112024076282978
Validation loss = 0.0013800978194922209
Validation loss = 0.001670190249569714
Validation loss = 0.00206900667399168
Validation loss = 0.0011467277072370052
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002238430315628648
Validation loss = 0.0011871441965922713
Validation loss = 0.00114104722160846
Validation loss = 0.0012288845609873533
Validation loss = 0.0012360014952719212
Validation loss = 0.0018180913757532835
Validation loss = 0.0014474696945399046
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002903762273490429
Validation loss = 0.0011207244824618101
Validation loss = 0.0009917939314618707
Validation loss = 0.0017541222041472793
Validation loss = 0.0011311309644952416
Validation loss = 0.0012239062925800681
Validation loss = 0.0010858334135264158
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -75.5    |
| Iteration     | 29       |
| MaximumReturn | -0.121   |
| MinimumReturn | -150     |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015339222736656666
Validation loss = 0.001566029153764248
Validation loss = 0.001128682168200612
Validation loss = 0.0014121498679742217
Validation loss = 0.0016898538451641798
Validation loss = 0.0012426801258698106
Validation loss = 0.0011478938395157456
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002234772779047489
Validation loss = 0.0010682963766157627
Validation loss = 0.0011823528911918402
Validation loss = 0.0010028157848864794
Validation loss = 0.001366248121485114
Validation loss = 0.0012491077650338411
Validation loss = 0.0016045093070715666
Validation loss = 0.0010905968956649303
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001559857977554202
Validation loss = 0.0014120804844424129
Validation loss = 0.001163356937468052
Validation loss = 0.0010585180716589093
Validation loss = 0.0012334218481555581
Validation loss = 0.0034800944849848747
Validation loss = 0.0011119539849460125
Validation loss = 0.0011386987753212452
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024990420788526535
Validation loss = 0.0013841640902683139
Validation loss = 0.001376218511722982
Validation loss = 0.00120828568469733
Validation loss = 0.001155366888269782
Validation loss = 0.0011070369509980083
Validation loss = 0.0014668292133137584
Validation loss = 0.0014965326990932226
Validation loss = 0.0013620218960568309
Validation loss = 0.0011252844706177711
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0024556785356253386
Validation loss = 0.0010805854108184576
Validation loss = 0.0010789926163852215
Validation loss = 0.0010205624857917428
Validation loss = 0.0011940973345190287
Validation loss = 0.0012938528088852763
Validation loss = 0.001464245142415166
Validation loss = 0.0011530283372849226
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -134     |
| Iteration     | 30       |
| MaximumReturn | -93.5    |
| MinimumReturn | -156     |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00265033938921988
Validation loss = 0.0011578949633985758
Validation loss = 0.0011805671965703368
Validation loss = 0.0018695289036259055
Validation loss = 0.0010419739410281181
Validation loss = 0.0010928953997790813
Validation loss = 0.0015327294822782278
Validation loss = 0.0012821231503039598
Validation loss = 0.0013029525289312005
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018226123647764325
Validation loss = 0.001679212087765336
Validation loss = 0.0012376244412735105
Validation loss = 0.0009987959638237953
Validation loss = 0.0012054966064170003
Validation loss = 0.0012373034842312336
Validation loss = 0.0013517426559701562
Validation loss = 0.0011280245380476117
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019542996305972338
Validation loss = 0.0016168225556612015
Validation loss = 0.0010501088108867407
Validation loss = 0.001117204432375729
Validation loss = 0.0014454516349360347
Validation loss = 0.0010945219546556473
Validation loss = 0.0009160516201518476
Validation loss = 0.00155743770301342
Validation loss = 0.0015454955864697695
Validation loss = 0.0017590952338650823
Validation loss = 0.0011137642432004213
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002686270046979189
Validation loss = 0.001268557971343398
Validation loss = 0.001419382868334651
Validation loss = 0.002346067689359188
Validation loss = 0.0010645987931638956
Validation loss = 0.0011645358754321933
Validation loss = 0.0012224811362102628
Validation loss = 0.0011588947381824255
Validation loss = 0.001141675398685038
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017784207593649626
Validation loss = 0.0010668730828911066
Validation loss = 0.0012888832716271281
Validation loss = 0.0009715588530525565
Validation loss = 0.0011512907221913338
Validation loss = 0.000929736124817282
Validation loss = 0.0011584287276491523
Validation loss = 0.0012270648730918765
Validation loss = 0.0011550721246749163
Validation loss = 0.0021485318429768085
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -113     |
| Iteration     | 31       |
| MaximumReturn | -1.48    |
| MinimumReturn | -149     |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012512451503425837
Validation loss = 0.0011206928174942732
Validation loss = 0.001071796054020524
Validation loss = 0.0009471307275816798
Validation loss = 0.0009709219448268414
Validation loss = 0.0015418874099850655
Validation loss = 0.0014314218424260616
Validation loss = 0.0015852635260671377
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001529983594082296
Validation loss = 0.0015278697246685624
Validation loss = 0.0011846391716971993
Validation loss = 0.0011118086986243725
Validation loss = 0.0015468201600015163
Validation loss = 0.0018810522742569447
Validation loss = 0.0017408886924386024
Validation loss = 0.0009282436221837997
Validation loss = 0.0011152924271300435
Validation loss = 0.0010611654724925756
Validation loss = 0.0017036646604537964
Validation loss = 0.0013740577269345522
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015950987581163645
Validation loss = 0.0009030356304720044
Validation loss = 0.0015965636121109128
Validation loss = 0.001241380232386291
Validation loss = 0.0011574310483410954
Validation loss = 0.0013693139189854264
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011153023224323988
Validation loss = 0.0010053509613499045
Validation loss = 0.0009758792002685368
Validation loss = 0.002418003510683775
Validation loss = 0.0008845571428537369
Validation loss = 0.0012874797685071826
Validation loss = 0.001047317055054009
Validation loss = 0.001329187536612153
Validation loss = 0.0014709525275975466
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013160192174836993
Validation loss = 0.0011650694068521261
Validation loss = 0.000949727778788656
Validation loss = 0.0013058765325695276
Validation loss = 0.0017086634179577231
Validation loss = 0.0015200679190456867
Validation loss = 0.0011523165740072727
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -120     |
| Iteration     | 32       |
| MaximumReturn | -3.83    |
| MinimumReturn | -152     |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011208652285858989
Validation loss = 0.001116612576879561
Validation loss = 0.0013537724735215306
Validation loss = 0.0010753674432635307
Validation loss = 0.0008879891829565167
Validation loss = 0.0015451029175892472
Validation loss = 0.0015224780654534698
Validation loss = 0.0009240826475434005
Validation loss = 0.0010805086931213737
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014320207992568612
Validation loss = 0.0009823821019381285
Validation loss = 0.0008389391005039215
Validation loss = 0.0010388074442744255
Validation loss = 0.0013506077229976654
Validation loss = 0.0022604300174862146
Validation loss = 0.0018225816311314702
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016626629512757063
Validation loss = 0.0010886328527703881
Validation loss = 0.001201448729261756
Validation loss = 0.002236223081126809
Validation loss = 0.0011236012214794755
Validation loss = 0.0011745552765205503
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002373775467276573
Validation loss = 0.0011049637105315924
Validation loss = 0.00111938058398664
Validation loss = 0.0011018028017133474
Validation loss = 0.0011465514544397593
Validation loss = 0.0010958313941955566
Validation loss = 0.0011517767561599612
Validation loss = 0.0010963634122163057
Validation loss = 0.001145755173638463
Validation loss = 0.0011048170272260904
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010992062743753195
Validation loss = 0.0013544572284445167
Validation loss = 0.0010082194348797202
Validation loss = 0.0011743392096832395
Validation loss = 0.0010161221725866199
Validation loss = 0.0009828881593421102
Validation loss = 0.0008477190276607871
Validation loss = 0.0009458080749027431
Validation loss = 0.0013989594299346209
Validation loss = 0.0011361561482772231
Validation loss = 0.001222303370013833
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -79.9    |
| Iteration     | 33       |
| MaximumReturn | -0.322   |
| MinimumReturn | -128     |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012073967372998595
Validation loss = 0.001070533413439989
Validation loss = 0.0013470061821863055
Validation loss = 0.001026631216518581
Validation loss = 0.0011584582971408963
Validation loss = 0.0011795542668551207
Validation loss = 0.0009895104449242353
Validation loss = 0.0011349209817126393
Validation loss = 0.0014008618891239166
Validation loss = 0.0016904519870877266
Validation loss = 0.0011159023270010948
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000980260199867189
Validation loss = 0.0011832783930003643
Validation loss = 0.0012484666658565402
Validation loss = 0.0012865167809650302
Validation loss = 0.0011320821940898895
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010392166441306472
Validation loss = 0.001582133467309177
Validation loss = 0.001386590301990509
Validation loss = 0.0008444127161055803
Validation loss = 0.0010467274114489555
Validation loss = 0.0009618574986234307
Validation loss = 0.0009150987607426941
Validation loss = 0.0009559646132402122
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017989141633734107
Validation loss = 0.000983847538009286
Validation loss = 0.0008777420734986663
Validation loss = 0.0010163988918066025
Validation loss = 0.0010739740682765841
Validation loss = 0.0010090629803016782
Validation loss = 0.0011031979229301214
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012433871161192656
Validation loss = 0.0010260873241350055
Validation loss = 0.0011969484621658921
Validation loss = 0.0013539883075281978
Validation loss = 0.0010924224043264985
Validation loss = 0.0012412585783749819
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -60.9    |
| Iteration     | 34       |
| MaximumReturn | -4.8     |
| MinimumReturn | -90.8    |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010176673531532288
Validation loss = 0.001307139522396028
Validation loss = 0.0010339895961806178
Validation loss = 0.0014831428416073322
Validation loss = 0.0015610585687682033
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010989177972078323
Validation loss = 0.0010414764983579516
Validation loss = 0.0015375675866380334
Validation loss = 0.0008692473056726158
Validation loss = 0.0012941035674884915
Validation loss = 0.000990934087894857
Validation loss = 0.001070933067239821
Validation loss = 0.0009279868681915104
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001580582931637764
Validation loss = 0.0012635422172024846
Validation loss = 0.0008332978468388319
Validation loss = 0.0012411616044119
Validation loss = 0.0009433859377168119
Validation loss = 0.0009127629455178976
Validation loss = 0.0012435876997187734
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017389114946126938
Validation loss = 0.0011111259227618575
Validation loss = 0.001240008044987917
Validation loss = 0.0010971719166263938
Validation loss = 0.0013914118753746152
Validation loss = 0.0012851657811552286
Validation loss = 0.0012292582541704178
Validation loss = 0.0008970604976639152
Validation loss = 0.0010058406041935086
Validation loss = 0.0011961436830461025
Validation loss = 0.0010584017727524042
Validation loss = 0.0012773991329595447
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008757439209148288
Validation loss = 0.0011235305573791265
Validation loss = 0.001469305600039661
Validation loss = 0.001184131484478712
Validation loss = 0.0012694868491962552
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.28    |
| Iteration     | 35       |
| MaximumReturn | -0.11    |
| MinimumReturn | -20.6    |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016876483568921685
Validation loss = 0.0011933924397453666
Validation loss = 0.0016253297217190266
Validation loss = 0.0008884829003363848
Validation loss = 0.0008628902724012733
Validation loss = 0.001051716972142458
Validation loss = 0.0008585824980400503
Validation loss = 0.0012635983293876052
Validation loss = 0.001076798653230071
Validation loss = 0.0015793461352586746
Validation loss = 0.0009721374954096973
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001432691584341228
Validation loss = 0.000967855507042259
Validation loss = 0.0011183333117514849
Validation loss = 0.0011272180126979947
Validation loss = 0.000923511921428144
Validation loss = 0.0010038017062470317
Validation loss = 0.0008509968174621463
Validation loss = 0.0010826198849827051
Validation loss = 0.0015965924831107259
Validation loss = 0.0014130870113149285
Validation loss = 0.0012077534338459373
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010598314693197608
Validation loss = 0.0009221610380336642
Validation loss = 0.001237081247381866
Validation loss = 0.0010195873910561204
Validation loss = 0.0009765361901372671
Validation loss = 0.001051529892720282
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011498942039906979
Validation loss = 0.0010823082411661744
Validation loss = 0.001106301904655993
Validation loss = 0.0010647396557033062
Validation loss = 0.0012255795300006866
Validation loss = 0.0010848143137991428
Validation loss = 0.0012205608654767275
Validation loss = 0.000976693001575768
Validation loss = 0.00138187559787184
Validation loss = 0.001548144151456654
Validation loss = 0.0016440685139968991
Validation loss = 0.001004309975542128
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009700153605081141
Validation loss = 0.001130337710492313
Validation loss = 0.0009607982938177884
Validation loss = 0.00124637212138623
Validation loss = 0.000977850635536015
Validation loss = 0.0009918025461956859
Validation loss = 0.004209707025438547
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.354   |
| Iteration     | 36       |
| MaximumReturn | -0.0971  |
| MinimumReturn | -3.06    |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000920481514185667
Validation loss = 0.0011885266285389662
Validation loss = 0.0011389938881620765
Validation loss = 0.0009536512079648674
Validation loss = 0.0009143055649474263
Validation loss = 0.0008977426332421601
Validation loss = 0.0009709332371130586
Validation loss = 0.0008269422105513513
Validation loss = 0.0019303567241877317
Validation loss = 0.0008935200166888535
Validation loss = 0.0010686949826776981
Validation loss = 0.001478733029216528
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011347961844876409
Validation loss = 0.0008505438454449177
Validation loss = 0.0013144611148163676
Validation loss = 0.0009350092150270939
Validation loss = 0.0012692427262663841
Validation loss = 0.0011621026787906885
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016495619202032685
Validation loss = 0.0010731322690844536
Validation loss = 0.0013189399614930153
Validation loss = 0.001118451007641852
Validation loss = 0.0009312425390817225
Validation loss = 0.004382381681352854
Validation loss = 0.0022521500941365957
Validation loss = 0.0009996698936447501
Validation loss = 0.001212932518683374
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000870420248247683
Validation loss = 0.0008872520993463695
Validation loss = 0.0010380513267591596
Validation loss = 0.002039482584223151
Validation loss = 0.0011074364883825183
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016866809455677867
Validation loss = 0.0010265727760270238
Validation loss = 0.0010142708197236061
Validation loss = 0.0014921913389116526
Validation loss = 0.0018610412953421474
Validation loss = 0.0011107918107882142
Validation loss = 0.0008290273253805935
Validation loss = 0.0016918941400945187
Validation loss = 0.0009253344614990056
Validation loss = 0.001021350733935833
Validation loss = 0.001104699564166367
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.6    |
| Iteration     | 37       |
| MaximumReturn | -0.173   |
| MinimumReturn | -48.9    |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000803134054876864
Validation loss = 0.00097826833371073
Validation loss = 0.0010713104857131839
Validation loss = 0.0009497760329395533
Validation loss = 0.0008151507936418056
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008942341664806008
Validation loss = 0.0010279263369739056
Validation loss = 0.0009435344254598022
Validation loss = 0.0010752810630947351
Validation loss = 0.0016851008404046297
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001059944275766611
Validation loss = 0.0014686043141409755
Validation loss = 0.0014814106980338693
Validation loss = 0.001156580401584506
Validation loss = 0.0007368304068222642
Validation loss = 0.0010858643800020218
Validation loss = 0.001250952365808189
Validation loss = 0.0013326962944120169
Validation loss = 0.0011763647198677063
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009964294731616974
Validation loss = 0.0023080818355083466
Validation loss = 0.0010768494103103876
Validation loss = 0.001257346710190177
Validation loss = 0.0009540743194520473
Validation loss = 0.0008780205389484763
Validation loss = 0.0013411157997325063
Validation loss = 0.0013799556763842702
Validation loss = 0.001007328275591135
Validation loss = 0.0013022735947743058
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001087269396521151
Validation loss = 0.00096034916350618
Validation loss = 0.0008846809505484998
Validation loss = 0.0009228423587046564
Validation loss = 0.0014890751335769892
Validation loss = 0.0009460656438022852
Validation loss = 0.0009766018483787775
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.93    |
| Iteration     | 38       |
| MaximumReturn | -0.191   |
| MinimumReturn | -56      |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008396143093705177
Validation loss = 0.0010005071526393294
Validation loss = 0.0008601183071732521
Validation loss = 0.000999606680124998
Validation loss = 0.000899743172340095
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010817497968673706
Validation loss = 0.0010386318899691105
Validation loss = 0.00106269889511168
Validation loss = 0.0012713909382000566
Validation loss = 0.0017789398552849889
Validation loss = 0.001037420122884214
Validation loss = 0.0009944508783519268
Validation loss = 0.0013144661206752062
Validation loss = 0.0008230343228206038
Validation loss = 0.0010718044359236956
Validation loss = 0.001049761543981731
Validation loss = 0.001176110585220158
Validation loss = 0.0010046508396044374
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010289332130923867
Validation loss = 0.0020542193669825792
Validation loss = 0.0008441057289019227
Validation loss = 0.0014506606385111809
Validation loss = 0.0010286099277436733
Validation loss = 0.0011490751057863235
Validation loss = 0.001121381064876914
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011313757859170437
Validation loss = 0.0011176415719091892
Validation loss = 0.001023336430080235
Validation loss = 0.0008961104322224855
Validation loss = 0.0012028535129502416
Validation loss = 0.0011757458560168743
Validation loss = 0.001124848611652851
Validation loss = 0.0009890833171084523
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012813432840630412
Validation loss = 0.0012032972881570458
Validation loss = 0.0014505403814837337
Validation loss = 0.0012768891174346209
Validation loss = 0.0010293704690411687
Validation loss = 0.0009321164106950164
Validation loss = 0.0009008870110847056
Validation loss = 0.0011881059035658836
Validation loss = 0.0016280872514471412
Validation loss = 0.0010761498706415296
Validation loss = 0.000940732890740037
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -50.8    |
| Iteration     | 39       |
| MaximumReturn | -0.0364  |
| MinimumReturn | -100     |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000905633089132607
Validation loss = 0.0009136698790825903
Validation loss = 0.002070454880595207
Validation loss = 0.0010599002707749605
Validation loss = 0.0011113202199339867
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001118514221161604
Validation loss = 0.0010535483015701175
Validation loss = 0.0009510575328022242
Validation loss = 0.0007402982446365058
Validation loss = 0.0009776767110452056
Validation loss = 0.0010670776246115565
Validation loss = 0.001035634079016745
Validation loss = 0.000874666147865355
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010360473534092307
Validation loss = 0.0008751236600801349
Validation loss = 0.0008936615195125341
Validation loss = 0.0006914783152751625
Validation loss = 0.0010247832396999002
Validation loss = 0.0008592371013946831
Validation loss = 0.0009662998490966856
Validation loss = 0.000816898129414767
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002506198361515999
Validation loss = 0.0008404357940889895
Validation loss = 0.0007985967095009983
Validation loss = 0.0008038093219511211
Validation loss = 0.0015206093667075038
Validation loss = 0.00137858756352216
Validation loss = 0.0008865552954375744
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009140359470620751
Validation loss = 0.0009299186640419066
Validation loss = 0.0008725957595743239
Validation loss = 0.0010952414013445377
Validation loss = 0.0011820888612419367
Validation loss = 0.0012452692026272416
Validation loss = 0.0007985885022208095
Validation loss = 0.0010277214460074902
Validation loss = 0.0011856018099933863
Validation loss = 0.0009408024488948286
Validation loss = 0.0008127347100526094
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.1    |
| Iteration     | 40       |
| MaximumReturn | -0.29    |
| MinimumReturn | -63.5    |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009582997299730778
Validation loss = 0.0012146960943937302
Validation loss = 0.0008776370086707175
Validation loss = 0.0008351893629878759
Validation loss = 0.0010750859510153532
Validation loss = 0.0008520500268787146
Validation loss = 0.0010163614060729742
Validation loss = 0.0008015710627660155
Validation loss = 0.0018947720527648926
Validation loss = 0.0007710378849878907
Validation loss = 0.0010361893801018596
Validation loss = 0.0011983347358182073
Validation loss = 0.0009400923154316843
Validation loss = 0.0008800638024695218
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010306283365935087
Validation loss = 0.000889720453415066
Validation loss = 0.0008912432240322232
Validation loss = 0.0012213935842737556
Validation loss = 0.001207858556881547
Validation loss = 0.0008907181327231228
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008560694404877722
Validation loss = 0.0008525973535142839
Validation loss = 0.0010209311731159687
Validation loss = 0.0008489419706165791
Validation loss = 0.001076407148502767
Validation loss = 0.0014610690996050835
Validation loss = 0.0014284936478361487
Validation loss = 0.0010173163609579206
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012643873924389482
Validation loss = 0.000792392180301249
Validation loss = 0.0008438023505732417
Validation loss = 0.0011432904284447432
Validation loss = 0.0011459350353106856
Validation loss = 0.0013523990055546165
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010517046321183443
Validation loss = 0.0010912558063864708
Validation loss = 0.0011393867898732424
Validation loss = 0.0010436788434162736
Validation loss = 0.0008888369775377214
Validation loss = 0.0011034340132027864
Validation loss = 0.0016229903558269143
Validation loss = 0.0010263912845402956
Validation loss = 0.0015173200517892838
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -10.4    |
| Iteration     | 41       |
| MaximumReturn | -0.334   |
| MinimumReturn | -37.3    |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010262446012347937
Validation loss = 0.0011506584705784917
Validation loss = 0.0008834193577058613
Validation loss = 0.0017251279205083847
Validation loss = 0.0008463353733532131
Validation loss = 0.0011640270240604877
Validation loss = 0.001093273051083088
Validation loss = 0.0007581626996397972
Validation loss = 0.0008174089598469436
Validation loss = 0.0013532694429159164
Validation loss = 0.0012440599966794252
Validation loss = 0.0009208947303704917
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008646859205327928
Validation loss = 0.0008248398080468178
Validation loss = 0.0013292081421241164
Validation loss = 0.0008527099271304905
Validation loss = 0.0007686269818805158
Validation loss = 0.0009752684272825718
Validation loss = 0.0009821051498875022
Validation loss = 0.0010633912170305848
Validation loss = 0.0010008696699514985
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008123465813696384
Validation loss = 0.0008672176045365632
Validation loss = 0.0010106547269970179
Validation loss = 0.0009889707434922457
Validation loss = 0.0009972547413781285
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001261451281607151
Validation loss = 0.001189486589282751
Validation loss = 0.0011749942786991596
Validation loss = 0.0008882939000613987
Validation loss = 0.0008462329860776663
Validation loss = 0.0010072308359667659
Validation loss = 0.0008203762117773294
Validation loss = 0.0007957301568239927
Validation loss = 0.0011618237476795912
Validation loss = 0.0009387657046318054
Validation loss = 0.002638751408085227
Validation loss = 0.0012045137118548155
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008493555942550302
Validation loss = 0.0010583179537206888
Validation loss = 0.0010490510612726212
Validation loss = 0.0016532622976228595
Validation loss = 0.0009614524315111339
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -17.7    |
| Iteration     | 42       |
| MaximumReturn | -0.849   |
| MinimumReturn | -40.6    |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009166004601866007
Validation loss = 0.0008076943922787905
Validation loss = 0.0009653887245804071
Validation loss = 0.0008395530749112368
Validation loss = 0.000931842252612114
Validation loss = 0.0015368950553238392
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011895191855728626
Validation loss = 0.0008273225394077599
Validation loss = 0.0010255371453240514
Validation loss = 0.0007385804201476276
Validation loss = 0.0010878766188398004
Validation loss = 0.0010949154384434223
Validation loss = 0.0010520578362047672
Validation loss = 0.0009833197109401226
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013417062582448125
Validation loss = 0.0008375633624382317
Validation loss = 0.0009121257462538779
Validation loss = 0.0007183710695244372
Validation loss = 0.001208704779855907
Validation loss = 0.0007005661609582603
Validation loss = 0.0009497174178250134
Validation loss = 0.0008909571915864944
Validation loss = 0.0011989593040198088
Validation loss = 0.0008494408102706075
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009383656433783472
Validation loss = 0.0009744528215378523
Validation loss = 0.0009435982210561633
Validation loss = 0.0009684616234153509
Validation loss = 0.0007915952010080218
Validation loss = 0.0011174407554790378
Validation loss = 0.0012819342082366347
Validation loss = 0.0007875636802054942
Validation loss = 0.000874267250765115
Validation loss = 0.0007288458873517811
Validation loss = 0.0008670545648783445
Validation loss = 0.0011593257077038288
Validation loss = 0.0009583844221197069
Validation loss = 0.0010446161031723022
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008134359959512949
Validation loss = 0.0007939485949464142
Validation loss = 0.00088708900148049
Validation loss = 0.0007722368463873863
Validation loss = 0.0008108762558549643
Validation loss = 0.0023249515797942877
Validation loss = 0.0011794022284448147
Validation loss = 0.0012695033801719546
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -91.3    |
| Iteration     | 43       |
| MaximumReturn | -49.7    |
| MinimumReturn | -117     |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012694275937974453
Validation loss = 0.0009527135989628732
Validation loss = 0.0014541923301294446
Validation loss = 0.0008414633339270949
Validation loss = 0.0008165478939190507
Validation loss = 0.0008378047496080399
Validation loss = 0.0013244946021586657
Validation loss = 0.0008096034871414304
Validation loss = 0.0013120131334289908
Validation loss = 0.0009755609789863229
Validation loss = 0.001175273791886866
Validation loss = 0.0009787498274818063
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008660329622216523
Validation loss = 0.001066977740265429
Validation loss = 0.001573050394654274
Validation loss = 0.000859099964145571
Validation loss = 0.0010829688981175423
Validation loss = 0.0009644830133765936
Validation loss = 0.0013099941425025463
Validation loss = 0.0009730333695188165
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009590631234459579
Validation loss = 0.0007465735543519258
Validation loss = 0.0007892215508036315
Validation loss = 0.0009704602998681366
Validation loss = 0.0010062840301543474
Validation loss = 0.0008934398065321147
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008721022168174386
Validation loss = 0.0008081404957920313
Validation loss = 0.0011083822464570403
Validation loss = 0.0011388604762032628
Validation loss = 0.0011019536759704351
Validation loss = 0.0009057180723175406
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009956305148079991
Validation loss = 0.0009304510313086212
Validation loss = 0.0011180179426446557
Validation loss = 0.001059626811183989
Validation loss = 0.0010329369688406587
Validation loss = 0.0009347926243208349
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -51.5    |
| Iteration     | 44       |
| MaximumReturn | -0.108   |
| MinimumReturn | -94.2    |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014276010915637016
Validation loss = 0.0010019851615652442
Validation loss = 0.0007953319582156837
Validation loss = 0.0007387481746263802
Validation loss = 0.0008004492265172303
Validation loss = 0.0010132783791050315
Validation loss = 0.0010835831053555012
Validation loss = 0.000922586303204298
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010601802496239543
Validation loss = 0.000871635100338608
Validation loss = 0.0008479419630020857
Validation loss = 0.0009674549219198525
Validation loss = 0.0009747561998665333
Validation loss = 0.0007589586894027889
Validation loss = 0.001856907387264073
Validation loss = 0.0010101135121658444
Validation loss = 0.0007738727726973593
Validation loss = 0.0009862667648121715
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008879557717591524
Validation loss = 0.0013880084734410048
Validation loss = 0.0008604188333265483
Validation loss = 0.0011966961901634932
Validation loss = 0.0020490430761128664
Validation loss = 0.0008734352886676788
Validation loss = 0.0011809425195679069
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001207514782436192
Validation loss = 0.001015195855870843
Validation loss = 0.001044886652380228
Validation loss = 0.00129970523994416
Validation loss = 0.0013192876940593123
Validation loss = 0.001011579530313611
Validation loss = 0.0010240987176075578
Validation loss = 0.0007108841673471034
Validation loss = 0.0009469683282077312
Validation loss = 0.0007626949809491634
Validation loss = 0.0011415396584197879
Validation loss = 0.0008767637191340327
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008644877234473825
Validation loss = 0.0011057080700993538
Validation loss = 0.0012290922459214926
Validation loss = 0.0016771601513028145
Validation loss = 0.0008982134750112891
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -141     |
| Iteration     | 45       |
| MaximumReturn | -126     |
| MinimumReturn | -151     |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011676033027470112
Validation loss = 0.0009346483275294304
Validation loss = 0.000993104069493711
Validation loss = 0.0012327897129580379
Validation loss = 0.001466516638174653
Validation loss = 0.0008828412392176688
Validation loss = 0.0010095125762745738
Validation loss = 0.0007879610056988895
Validation loss = 0.0008866837597452104
Validation loss = 0.001026078825816512
Validation loss = 0.0015150421531870961
Validation loss = 0.0007041156059131026
Validation loss = 0.0011067723389714956
Validation loss = 0.0008139035198837519
Validation loss = 0.0009876699186861515
Validation loss = 0.0010366617934778333
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0019345283508300781
Validation loss = 0.0009235104662366211
Validation loss = 0.0008018791559152305
Validation loss = 0.0008845693082548678
Validation loss = 0.0007336655398830771
Validation loss = 0.0008527776226401329
Validation loss = 0.0006961430772207677
Validation loss = 0.0008114731754176319
Validation loss = 0.0010829638922587037
Validation loss = 0.0008749637054279447
Validation loss = 0.0008478078525513411
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009564941865392029
Validation loss = 0.0009013974922709167
Validation loss = 0.0011418609647080302
Validation loss = 0.0006650009308941662
Validation loss = 0.0014617226552218199
Validation loss = 0.0012595298467203975
Validation loss = 0.0008714303257875144
Validation loss = 0.0025741527788341045
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009445357718504965
Validation loss = 0.000977155170403421
Validation loss = 0.000718082650564611
Validation loss = 0.0008693902054801583
Validation loss = 0.000847474904730916
Validation loss = 0.0008299849578179419
Validation loss = 0.0012624915689229965
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010178657248616219
Validation loss = 0.0009891665540635586
Validation loss = 0.0010122244711965322
Validation loss = 0.0007114182808436453
Validation loss = 0.0010092899901792407
Validation loss = 0.001014343579299748
Validation loss = 0.0009102813783101737
Validation loss = 0.0017091786721721292
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -130     |
| Iteration     | 46       |
| MaximumReturn | -79.2    |
| MinimumReturn | -150     |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009523698827251792
Validation loss = 0.0008868951117619872
Validation loss = 0.0007425073417834938
Validation loss = 0.0007943326490931213
Validation loss = 0.0008466027793474495
Validation loss = 0.0008324604714289308
Validation loss = 0.001056141685694456
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000961409998126328
Validation loss = 0.001026467070914805
Validation loss = 0.0009992789709940553
Validation loss = 0.0008063644054345787
Validation loss = 0.0007383990450762212
Validation loss = 0.0008098024991340935
Validation loss = 0.001123168389312923
Validation loss = 0.0007982755196280777
Validation loss = 0.0006856029503978789
Validation loss = 0.0009110331302508712
Validation loss = 0.001503055333159864
Validation loss = 0.0007978453068062663
Validation loss = 0.0007079506176523864
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00141874176915735
Validation loss = 0.0008840180817060173
Validation loss = 0.0010394200216978788
Validation loss = 0.0009853580268099904
Validation loss = 0.0011665893252938986
Validation loss = 0.0007780885207466781
Validation loss = 0.0006989199900999665
Validation loss = 0.0008660154417157173
Validation loss = 0.0007848821696825325
Validation loss = 0.0012559680035337806
Validation loss = 0.0012175205629318953
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011309083783999085
Validation loss = 0.0007802665932103992
Validation loss = 0.0009525801287963986
Validation loss = 0.0011661528842523694
Validation loss = 0.000682927027810365
Validation loss = 0.0009348873281851411
Validation loss = 0.0008181073935702443
Validation loss = 0.0007890870911069214
Validation loss = 0.001058307709172368
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013628408778458834
Validation loss = 0.0009903408354148269
Validation loss = 0.000849151867441833
Validation loss = 0.0010190167231485248
Validation loss = 0.0012046567862853408
Validation loss = 0.0007667213794775307
Validation loss = 0.0008339077467098832
Validation loss = 0.001009140396490693
Validation loss = 0.0009003883460536599
Validation loss = 0.0009364022989757359
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -128     |
| Iteration     | 47       |
| MaximumReturn | -103     |
| MinimumReturn | -159     |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009440750000067055
Validation loss = 0.0007469911361113191
Validation loss = 0.0008603011374361813
Validation loss = 0.0012077197898179293
Validation loss = 0.0007417597807943821
Validation loss = 0.0007276886608451605
Validation loss = 0.0007815365679562092
Validation loss = 0.0007302290759980679
Validation loss = 0.001106505049392581
Validation loss = 0.0010438569588586688
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007687227916903794
Validation loss = 0.0009917790303006768
Validation loss = 0.001088379300199449
Validation loss = 0.0010580421658232808
Validation loss = 0.0007512539741583169
Validation loss = 0.0009881519945338368
Validation loss = 0.0007072505541145802
Validation loss = 0.0009831578936427832
Validation loss = 0.0010756779229268432
Validation loss = 0.0008820712682791054
Validation loss = 0.0007623616838827729
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007099588983692229
Validation loss = 0.0008275123545899987
Validation loss = 0.0006637888145633042
Validation loss = 0.0008973620715551078
Validation loss = 0.0009881159057840705
Validation loss = 0.0006712462636642158
Validation loss = 0.0007171957404352725
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009000718710012734
Validation loss = 0.0009612110443413258
Validation loss = 0.0012079214211553335
Validation loss = 0.0007600948447361588
Validation loss = 0.0009457093547098339
Validation loss = 0.0010648773750290275
Validation loss = 0.0008539233240298927
Validation loss = 0.0008080335101112723
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010320417350158095
Validation loss = 0.0008134946110658348
Validation loss = 0.0012681286316365004
Validation loss = 0.0008476466173306108
Validation loss = 0.001431764685548842
Validation loss = 0.0007715055253356695
Validation loss = 0.0010015054140239954
Validation loss = 0.0008128376794047654
Validation loss = 0.0008907881565392017
Validation loss = 0.0013424432836472988
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -78.6    |
| Iteration     | 48       |
| MaximumReturn | -0.0818  |
| MinimumReturn | -135     |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009877311531454325
Validation loss = 0.00064184790244326
Validation loss = 0.0007144041010178626
Validation loss = 0.0006677557248622179
Validation loss = 0.0007573725306428969
Validation loss = 0.0010592668550089002
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007337491260841489
Validation loss = 0.0009218868799507618
Validation loss = 0.0008655015844851732
Validation loss = 0.0006920049781911075
Validation loss = 0.0008325130911543965
Validation loss = 0.0007093327003531158
Validation loss = 0.0010871179401874542
Validation loss = 0.0008510306943207979
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000850070093292743
Validation loss = 0.000949679990299046
Validation loss = 0.0008683197665959597
Validation loss = 0.0009452016674913466
Validation loss = 0.001251476351171732
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007364639895968139
Validation loss = 0.001001065014861524
Validation loss = 0.0009704293916001916
Validation loss = 0.0009991145925596356
Validation loss = 0.0006377348327077925
Validation loss = 0.0008885893039405346
Validation loss = 0.0008570751524530351
Validation loss = 0.0010019789915531874
Validation loss = 0.0016816802090033889
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008150658686645329
Validation loss = 0.0008059192332439125
Validation loss = 0.0007501948275603354
Validation loss = 0.0008076150552369654
Validation loss = 0.0007788597140461206
Validation loss = 0.001150712720118463
Validation loss = 0.0008889405871741474
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -117     |
| Iteration     | 49       |
| MaximumReturn | -84.3    |
| MinimumReturn | -134     |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00102148181758821
Validation loss = 0.0007452433346770704
Validation loss = 0.0010713690426200628
Validation loss = 0.0006119194440543652
Validation loss = 0.0010516116162762046
Validation loss = 0.0008237621514126658
Validation loss = 0.000802856229711324
Validation loss = 0.0010375736746937037
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008537056855857372
Validation loss = 0.0010552991880103946
Validation loss = 0.0011275397846475244
Validation loss = 0.0007164929411374032
Validation loss = 0.0008500545518472791
Validation loss = 0.0007700745481997728
Validation loss = 0.001036350498907268
Validation loss = 0.0008802025695331395
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011329271364957094
Validation loss = 0.0007177061634138227
Validation loss = 0.0011519590625539422
Validation loss = 0.0006737353396601975
Validation loss = 0.0006795526714995503
Validation loss = 0.0008496730588376522
Validation loss = 0.0007357069407589734
Validation loss = 0.0007902049692347646
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000857007980812341
Validation loss = 0.0007910839049145579
Validation loss = 0.0009234611643478274
Validation loss = 0.0009231375879608095
Validation loss = 0.00078528345329687
Validation loss = 0.0011978051625192165
Validation loss = 0.0009573885472491384
Validation loss = 0.0008252030820585787
Validation loss = 0.0007349718944169581
Validation loss = 0.0007563093095086515
Validation loss = 0.0010294795501977205
Validation loss = 0.000987705891020596
Validation loss = 0.000728490820620209
Validation loss = 0.0007578872609883547
Validation loss = 0.000784634321462363
Validation loss = 0.0007976157939992845
Validation loss = 0.0009452698868699372
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008435266790911555
Validation loss = 0.0008566995966248214
Validation loss = 0.0007500528590753675
Validation loss = 0.0008044314454309642
Validation loss = 0.001073823543265462
Validation loss = 0.0010444401996210217
Validation loss = 0.0007624057470820844
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -67.2    |
| Iteration     | 50       |
| MaximumReturn | -0.106   |
| MinimumReturn | -92.4    |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007704350864514709
Validation loss = 0.0007726197363808751
Validation loss = 0.0007684968295507133
Validation loss = 0.0007516874466091394
Validation loss = 0.0008007536525838077
Validation loss = 0.0009020935976877809
Validation loss = 0.0007876651361584663
Validation loss = 0.000772928586229682
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007280283607542515
Validation loss = 0.0010788771323859692
Validation loss = 0.0008133225492201746
Validation loss = 0.0008648195653222501
Validation loss = 0.0007506959955208004
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010522797238081694
Validation loss = 0.0010950530413538218
Validation loss = 0.001136645209044218
Validation loss = 0.0009630294516682625
Validation loss = 0.0007219607359729707
Validation loss = 0.0008473935886286199
Validation loss = 0.00097514851950109
Validation loss = 0.0008351049036718905
Validation loss = 0.001300449250265956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007257729303091764
Validation loss = 0.0007978118374012411
Validation loss = 0.0007389189559035003
Validation loss = 0.0007512180600315332
Validation loss = 0.0007623209967277944
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010593924671411514
Validation loss = 0.0008072461350820959
Validation loss = 0.0009193982696160674
Validation loss = 0.0007753687677904963
Validation loss = 0.001056244713254273
Validation loss = 0.0021389417815953493
Validation loss = 0.0007902284269221127
Validation loss = 0.0007555676274932921
Validation loss = 0.0009183117072097957
Validation loss = 0.0009671085863374174
Validation loss = 0.0009183179936371744
Validation loss = 0.0011169173521921039
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41.3    |
| Iteration     | 51       |
| MaximumReturn | -0.00253 |
| MinimumReturn | -79.1    |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006635208847001195
Validation loss = 0.0006911773816682398
Validation loss = 0.0007262142607942224
Validation loss = 0.0008449386805295944
Validation loss = 0.0008293578866869211
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007652357453480363
Validation loss = 0.0007569089066237211
Validation loss = 0.0012989210663363338
Validation loss = 0.0007511215517297387
Validation loss = 0.0009061008458957076
Validation loss = 0.0006640919018536806
Validation loss = 0.0007600252865813673
Validation loss = 0.0015109794912859797
Validation loss = 0.0011122898431494832
Validation loss = 0.0007160985842347145
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007672181236557662
Validation loss = 0.0007578111835755408
Validation loss = 0.0011291649425402284
Validation loss = 0.0008787617553025484
Validation loss = 0.0006988912355154753
Validation loss = 0.0011417290661484003
Validation loss = 0.0008104356238618493
Validation loss = 0.0008676426368765533
Validation loss = 0.001115944585762918
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008146180189214647
Validation loss = 0.0011282155755907297
Validation loss = 0.0007436403539031744
Validation loss = 0.0008919413667172194
Validation loss = 0.0007756963605061173
Validation loss = 0.0006277303909882903
Validation loss = 0.0006931261741556227
Validation loss = 0.0007148373406380415
Validation loss = 0.0006643125670962036
Validation loss = 0.0007011925335973501
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006440875586122274
Validation loss = 0.0008171554654836655
Validation loss = 0.0007870534318499267
Validation loss = 0.0007376216817647219
Validation loss = 0.0008521266281604767
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -4.49     |
| Iteration     | 52        |
| MaximumReturn | -0.000736 |
| MinimumReturn | -64.5     |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006907443748787045
Validation loss = 0.0008016704814508557
Validation loss = 0.0009245820110663772
Validation loss = 0.0009880205616354942
Validation loss = 0.0008596627158112824
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007498182239942253
Validation loss = 0.0009394304361194372
Validation loss = 0.0006953793345019221
Validation loss = 0.0007172954501584172
Validation loss = 0.0008425958221778274
Validation loss = 0.0009603520156815648
Validation loss = 0.0007136140484362841
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007207075832411647
Validation loss = 0.0007944940007291734
Validation loss = 0.0008232936961576343
Validation loss = 0.000913804629817605
Validation loss = 0.0007714313687756658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007690265192650259
Validation loss = 0.0008005517302080989
Validation loss = 0.0008052060729824007
Validation loss = 0.0009330662433058023
Validation loss = 0.001235509873367846
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013091904111206532
Validation loss = 0.0008262788178399205
Validation loss = 0.0007111441809684038
Validation loss = 0.0007031993591226637
Validation loss = 0.0008410260197706521
Validation loss = 0.000846050912514329
Validation loss = 0.0009112922707572579
Validation loss = 0.0007932771695777774
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.3      |
| Iteration     | 53        |
| MaximumReturn | -0.000727 |
| MinimumReturn | -82       |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009326176368631423
Validation loss = 0.0007899599149823189
Validation loss = 0.0008213489199988544
Validation loss = 0.0012401124695315957
Validation loss = 0.0011684921337291598
Validation loss = 0.0010592566104605794
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010032500140368938
Validation loss = 0.0006355845252983272
Validation loss = 0.0008469480671919882
Validation loss = 0.0011949469335377216
Validation loss = 0.0006800606497563422
Validation loss = 0.000949594599660486
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007649874314665794
Validation loss = 0.000822083733510226
Validation loss = 0.0007146092830225825
Validation loss = 0.0007667969330213964
Validation loss = 0.0006303346599452198
Validation loss = 0.0007734809769317508
Validation loss = 0.0007623571436852217
Validation loss = 0.001065399730578065
Validation loss = 0.000905748805962503
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009432169026695192
Validation loss = 0.0008834125474095345
Validation loss = 0.0008201596792787313
Validation loss = 0.0010345487389713526
Validation loss = 0.0007119226502254605
Validation loss = 0.0007009547553025186
Validation loss = 0.0007024625083431602
Validation loss = 0.0009336312650702894
Validation loss = 0.0008403306128457189
Validation loss = 0.0008162410813383758
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007209400064311922
Validation loss = 0.0009286858257837594
Validation loss = 0.0010919752530753613
Validation loss = 0.0009350155014544725
Validation loss = 0.0010093426099047065
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.1    |
| Iteration     | 54       |
| MaximumReturn | -0.259   |
| MinimumReturn | -71.2    |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000763511226978153
Validation loss = 0.0008028071606531739
Validation loss = 0.0006053880206309259
Validation loss = 0.000832603604067117
Validation loss = 0.0007303279126062989
Validation loss = 0.000742838135920465
Validation loss = 0.001989901764318347
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007782300817780197
Validation loss = 0.0009122515330091119
Validation loss = 0.0008074809447862208
Validation loss = 0.0010800352320075035
Validation loss = 0.0008680919418111444
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008521779673174024
Validation loss = 0.0007423452916555107
Validation loss = 0.0006476608687080443
Validation loss = 0.0007345355697907507
Validation loss = 0.0009665459510870278
Validation loss = 0.0008042997214943171
Validation loss = 0.0007895181770436466
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009462694288231432
Validation loss = 0.0007241920684464276
Validation loss = 0.000706543680280447
Validation loss = 0.0007474840967915952
Validation loss = 0.0006595764425583184
Validation loss = 0.0008538178517483175
Validation loss = 0.0013991801533848047
Validation loss = 0.0008047219598665833
Validation loss = 0.0009323050617240369
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018802027916535735
Validation loss = 0.0008084489963948727
Validation loss = 0.0006986876833252609
Validation loss = 0.0006413433002308011
Validation loss = 0.0007390080136246979
Validation loss = 0.000831673271022737
Validation loss = 0.0017106069717556238
Validation loss = 0.0010080750798806548
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.5    |
| Iteration     | 55       |
| MaximumReturn | -0.174   |
| MinimumReturn | -43.8    |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009714838815853
Validation loss = 0.0007738794665783644
Validation loss = 0.0007618904346600175
Validation loss = 0.000960326346103102
Validation loss = 0.0007182383560575545
Validation loss = 0.0007839940953999758
Validation loss = 0.0007726955809630454
Validation loss = 0.000790222838986665
Validation loss = 0.0007040470372885466
Validation loss = 0.0005917293019592762
Validation loss = 0.0009439983987249434
Validation loss = 0.002565091010183096
Validation loss = 0.0006523064221255481
Validation loss = 0.0010732188820838928
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008655850542709231
Validation loss = 0.000712051521986723
Validation loss = 0.0007662232383154333
Validation loss = 0.0013994969194754958
Validation loss = 0.0007839215686544776
Validation loss = 0.0012309972662478685
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007638750830665231
Validation loss = 0.0006299770902842283
Validation loss = 0.0007168222800828516
Validation loss = 0.0006645233370363712
Validation loss = 0.0008122173021547496
Validation loss = 0.0013108145212754607
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000841792905703187
Validation loss = 0.0008084054570645094
Validation loss = 0.0008998193079605699
Validation loss = 0.0008884583367034793
Validation loss = 0.0006857379921711981
Validation loss = 0.0009009514469653368
Validation loss = 0.0007742209709249437
Validation loss = 0.0009681150550022721
Validation loss = 0.0009099930175580084
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00084498937940225
Validation loss = 0.0008111440693028271
Validation loss = 0.0006779624382033944
Validation loss = 0.0007781804888509214
Validation loss = 0.0009504155023023486
Validation loss = 0.0006667588604614139
Validation loss = 0.000702403427567333
Validation loss = 0.0007386748329736292
Validation loss = 0.0007456626044586301
Validation loss = 0.000719781790394336
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.235   |
| Iteration     | 56       |
| MaximumReturn | -0.147   |
| MinimumReturn | -0.343   |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015338450903072953
Validation loss = 0.0007598531083203852
Validation loss = 0.0006089082453399897
Validation loss = 0.0006747627630829811
Validation loss = 0.0008462465484626591
Validation loss = 0.0008680852479301393
Validation loss = 0.0007804672350175679
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007049398845992982
Validation loss = 0.0007945022662170231
Validation loss = 0.000644583604298532
Validation loss = 0.0009325318969786167
Validation loss = 0.0007519656210206449
Validation loss = 0.0008393312455154955
Validation loss = 0.0008358980412594974
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000850168988108635
Validation loss = 0.0009567642118781805
Validation loss = 0.000645346415694803
Validation loss = 0.0009242209489457309
Validation loss = 0.0009032064699567854
Validation loss = 0.0007206390146166086
Validation loss = 0.0009348895400762558
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007353909313678741
Validation loss = 0.0007348915678448975
Validation loss = 0.0006666381959803402
Validation loss = 0.0006674398318864405
Validation loss = 0.000627978122793138
Validation loss = 0.0007523836684413254
Validation loss = 0.0010098089696839452
Validation loss = 0.0005912161432206631
Validation loss = 0.0007183009292930365
Validation loss = 0.0006422043079510331
Validation loss = 0.0010196600342169404
Validation loss = 0.0006237602210603654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008876190986484289
Validation loss = 0.0008930361364036798
Validation loss = 0.0008123531588353217
Validation loss = 0.0007450581179000437
Validation loss = 0.0008097078534774482
Validation loss = 0.0008696892764419317
Validation loss = 0.0007609057356603444
Validation loss = 0.0008198306313715875
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.8    |
| Iteration     | 57       |
| MaximumReturn | -0.198   |
| MinimumReturn | -193     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001877509173937142
Validation loss = 0.0006943362532183528
Validation loss = 0.0006221607909537852
Validation loss = 0.000854161218740046
Validation loss = 0.0010790418600663543
Validation loss = 0.0006209088605828583
Validation loss = 0.0010537643684074283
Validation loss = 0.000705677317455411
Validation loss = 0.0009290581219829619
Validation loss = 0.00105008773971349
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002833176404237747
Validation loss = 0.0009351844200864434
Validation loss = 0.0007164874114096165
Validation loss = 0.0005923430435359478
Validation loss = 0.0006786034209653735
Validation loss = 0.0009827816393226385
Validation loss = 0.000728902465198189
Validation loss = 0.0011902684345841408
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027973076794296503
Validation loss = 0.0008043731213547289
Validation loss = 0.0006923064356669784
Validation loss = 0.0008736902964301407
Validation loss = 0.0008504293509759009
Validation loss = 0.0008339282940141857
Validation loss = 0.0006259385263547301
Validation loss = 0.0005814656615257263
Validation loss = 0.0006036072154529393
Validation loss = 0.0005928705213591456
Validation loss = 0.0006587646203115582
Validation loss = 0.0007462171488441527
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016292548971250653
Validation loss = 0.0007605797727592289
Validation loss = 0.0010911553399637341
Validation loss = 0.0006059645675122738
Validation loss = 0.0007128231809474528
Validation loss = 0.0008382435189560056
Validation loss = 0.000788651464972645
Validation loss = 0.0008998530684038997
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0037728629540652037
Validation loss = 0.0007109701982699335
Validation loss = 0.000713219924364239
Validation loss = 0.00095884915208444
Validation loss = 0.0009697097702883184
Validation loss = 0.0006854946259409189
Validation loss = 0.0006699570221826434
Validation loss = 0.000740974850486964
Validation loss = 0.0006960812606848776
Validation loss = 0.0006269997102208436
Validation loss = 0.000681183475535363
Validation loss = 0.0005655264249071479
Validation loss = 0.0008415887132287025
Validation loss = 0.0005737795145250857
Validation loss = 0.0008158926502801478
Validation loss = 0.0006081312894821167
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.9    |
| Iteration     | 58       |
| MaximumReturn | -0.227   |
| MinimumReturn | -172     |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002025292254984379
Validation loss = 0.0007076995680108666
Validation loss = 0.0009468277567066252
Validation loss = 0.0006161000928841531
Validation loss = 0.000694028742145747
Validation loss = 0.0006149291875772178
Validation loss = 0.0011744125513359904
Validation loss = 0.0007124373805709183
Validation loss = 0.0018866207683458924
Validation loss = 0.0006790452171117067
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00101111875846982
Validation loss = 0.0009365403093397617
Validation loss = 0.0007166513241827488
Validation loss = 0.0007209146278910339
Validation loss = 0.000552899669855833
Validation loss = 0.0007444309303537011
Validation loss = 0.0007564513944089413
Validation loss = 0.0007051581051200628
Validation loss = 0.0008372655720449984
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013515593018382788
Validation loss = 0.001020194380544126
Validation loss = 0.000680778524838388
Validation loss = 0.0007021244382485747
Validation loss = 0.000981113058514893
Validation loss = 0.0006234122556634247
Validation loss = 0.0006533573614433408
Validation loss = 0.0007911238353699446
Validation loss = 0.0007805089117027819
Validation loss = 0.0009629149571992457
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010237614624202251
Validation loss = 0.0008269340032711625
Validation loss = 0.0005738661275245249
Validation loss = 0.0009016948752105236
Validation loss = 0.000917635508812964
Validation loss = 0.0006209918647073209
Validation loss = 0.0005549107445403934
Validation loss = 0.0005675243446603417
Validation loss = 0.0006335488869808614
Validation loss = 0.0006012811209075153
Validation loss = 0.0011137676192447543
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0035786584485322237
Validation loss = 0.0006325245485641062
Validation loss = 0.0007801590254530311
Validation loss = 0.0006458070711232722
Validation loss = 0.0005938697722740471
Validation loss = 0.0006890983204357326
Validation loss = 0.001699332264252007
Validation loss = 0.0007191361510194838
Validation loss = 0.0007385028875432909
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.8    |
| Iteration     | 59       |
| MaximumReturn | -0.905   |
| MinimumReturn | -108     |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0029541864059865475
Validation loss = 0.0005254079005680978
Validation loss = 0.0006139465258456767
Validation loss = 0.0005262958584353328
Validation loss = 0.0006425928440876305
Validation loss = 0.0006718843942508101
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006154391448944807
Validation loss = 0.0005492497584782541
Validation loss = 0.0004858602478634566
Validation loss = 0.00045457694795913994
Validation loss = 0.0005763441440649331
Validation loss = 0.0005874885828234255
Validation loss = 0.0006188407423906028
Validation loss = 0.0005278384196572006
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0035104816779494286
Validation loss = 0.0005421836394816637
Validation loss = 0.0005252576083876193
Validation loss = 0.0005570683861151338
Validation loss = 0.0004626530280802399
Validation loss = 0.0006254268228076398
Validation loss = 0.0005527054890990257
Validation loss = 0.000554264581296593
Validation loss = 0.0005947435856796801
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002984249498695135
Validation loss = 0.0005334522575139999
Validation loss = 0.0005016885115765035
Validation loss = 0.0005769465351477265
Validation loss = 0.0005067086312919855
Validation loss = 0.0005978401168249547
Validation loss = 0.0007245352026075125
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013264178996905684
Validation loss = 0.0006829635822214186
Validation loss = 0.0005734570440836251
Validation loss = 0.00048065942246466875
Validation loss = 0.0005169005016796291
Validation loss = 0.000517352600581944
Validation loss = 0.00048282090574502945
Validation loss = 0.0006764273275621235
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -172     |
| Iteration     | 60       |
| MaximumReturn | -93.6    |
| MinimumReturn | -216     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002307595917955041
Validation loss = 0.001263036159798503
Validation loss = 0.0006256838096305728
Validation loss = 0.0008519974071532488
Validation loss = 0.0008661517640575767
Validation loss = 0.0011630478547886014
Validation loss = 0.0006368171307258308
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0024190812837332487
Validation loss = 0.0006572130369022489
Validation loss = 0.0006458303541876376
Validation loss = 0.000861068838275969
Validation loss = 0.0009033708483912051
Validation loss = 0.0006994602736085653
Validation loss = 0.002214650856330991
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026986689772456884
Validation loss = 0.0008858075016178191
Validation loss = 0.0006126423832029104
Validation loss = 0.0015132566913962364
Validation loss = 0.0007746091578155756
Validation loss = 0.0008322052890434861
Validation loss = 0.0006333955680020154
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0041135866194963455
Validation loss = 0.0010368932271376252
Validation loss = 0.001122624147683382
Validation loss = 0.0007779294974170625
Validation loss = 0.0005825666594319046
Validation loss = 0.0006118456949479878
Validation loss = 0.0012159934267401695
Validation loss = 0.0012256961781531572
Validation loss = 0.0008838780922815204
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025232734624296427
Validation loss = 0.0006652745651081204
Validation loss = 0.0008149484056048095
Validation loss = 0.0007334561669267714
Validation loss = 0.001387819997034967
Validation loss = 0.0005998086417093873
Validation loss = 0.0017583586741238832
Validation loss = 0.0007087747799232602
Validation loss = 0.0012306211283430457
Validation loss = 0.0019402418984100223
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -78.3    |
| Iteration     | 61       |
| MaximumReturn | -3.67    |
| MinimumReturn | -200     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009491058299317956
Validation loss = 0.0009709220030345023
Validation loss = 0.002018130850046873
Validation loss = 0.0013212612830102444
Validation loss = 0.001994022633880377
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001178445527330041
Validation loss = 0.0009059516014531255
Validation loss = 0.0021700675133615732
Validation loss = 0.0010118660284206271
Validation loss = 0.0008465431164950132
Validation loss = 0.0017812700243666768
Validation loss = 0.0010344303445890546
Validation loss = 0.0015211438294500113
Validation loss = 0.0008638494182378054
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009864058811217546
Validation loss = 0.0007478960324078798
Validation loss = 0.0020272363908588886
Validation loss = 0.0012947296490892768
Validation loss = 0.0009169178665615618
Validation loss = 0.001514818286523223
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017238953150808811
Validation loss = 0.0016416746657341719
Validation loss = 0.0009036408737301826
Validation loss = 0.0012491322122514248
Validation loss = 0.0010728328488767147
Validation loss = 0.0020703638438135386
Validation loss = 0.001257025869563222
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008441426325589418
Validation loss = 0.0008567131590098143
Validation loss = 0.001146980095654726
Validation loss = 0.0011714922729879618
Validation loss = 0.0012496490962803364
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -205     |
| Iteration     | 62       |
| MaximumReturn | -79.1    |
| MinimumReturn | -233     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021740831434726715
Validation loss = 0.0013025817461311817
Validation loss = 0.0012280395021662116
Validation loss = 0.0011871624737977982
Validation loss = 0.003634365741163492
Validation loss = 0.0014663622714579105
Validation loss = 0.001203329535201192
Validation loss = 0.0011706596706062555
Validation loss = 0.0010664216242730618
Validation loss = 0.0017236608546227217
Validation loss = 0.001013015629723668
Validation loss = 0.0011350685963407159
Validation loss = 0.001254538306966424
Validation loss = 0.0011765359668061137
Validation loss = 0.0012243473902344704
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003108369652181864
Validation loss = 0.0019344749161973596
Validation loss = 0.0014379729982465506
Validation loss = 0.0009509336086921394
Validation loss = 0.0009395632660016418
Validation loss = 0.001403561793267727
Validation loss = 0.001268670428544283
Validation loss = 0.0012256543850526214
Validation loss = 0.0012132148258388042
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024422649294137955
Validation loss = 0.001809040899388492
Validation loss = 0.001044749515131116
Validation loss = 0.0018248043488711119
Validation loss = 0.0016444013454020023
Validation loss = 0.0017694301204755902
Validation loss = 0.0009314900962635875
Validation loss = 0.0013741153525188565
Validation loss = 0.0009252470917999744
Validation loss = 0.0008207127102650702
Validation loss = 0.0009460804867558181
Validation loss = 0.0012471969239413738
Validation loss = 0.0013005774235352874
Validation loss = 0.001154222059994936
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001485126675106585
Validation loss = 0.001138345804065466
Validation loss = 0.0020318110473454
Validation loss = 0.0009169342811219394
Validation loss = 0.0015561175532639027
Validation loss = 0.0011626124614849687
Validation loss = 0.0008333448204211891
Validation loss = 0.0019175319466739893
Validation loss = 0.001017407514154911
Validation loss = 0.0013363753678277135
Validation loss = 0.0010563198011368513
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019832372199743986
Validation loss = 0.0016264764126390219
Validation loss = 0.0011201179586350918
Validation loss = 0.002153012901544571
Validation loss = 0.0009385793819092214
Validation loss = 0.0009816536912694573
Validation loss = 0.001067324192263186
Validation loss = 0.0014319663168862462
Validation loss = 0.0013384934281930327
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -184     |
| Iteration     | 63       |
| MaximumReturn | -85.3    |
| MinimumReturn | -220     |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0022193831391632557
Validation loss = 0.0014414259931072593
Validation loss = 0.0014091107295826077
Validation loss = 0.0012233811430633068
Validation loss = 0.0017672133399173617
Validation loss = 0.0016915937885642052
Validation loss = 0.001318299095146358
Validation loss = 0.0011222721077501774
Validation loss = 0.001666453666985035
Validation loss = 0.0013060743222013116
Validation loss = 0.0011986277531832457
Validation loss = 0.0019760304130613804
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001908282982185483
Validation loss = 0.0017217921558767557
Validation loss = 0.001767165376804769
Validation loss = 0.0016065399395301938
Validation loss = 0.001190630136989057
Validation loss = 0.0013052962021902204
Validation loss = 0.001348762190900743
Validation loss = 0.001888183644041419
Validation loss = 0.0013251769123598933
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013448799727484584
Validation loss = 0.001435865298844874
Validation loss = 0.0010883912909775972
Validation loss = 0.0016234852373600006
Validation loss = 0.0013132686726748943
Validation loss = 0.0010690598282963037
Validation loss = 0.0014654570259153843
Validation loss = 0.0012961032334715128
Validation loss = 0.0015708819264546037
Validation loss = 0.0012630214914679527
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016643506241962314
Validation loss = 0.0012856099056079984
Validation loss = 0.0020480407401919365
Validation loss = 0.00133233901578933
Validation loss = 0.0032356237061321735
Validation loss = 0.001464102417230606
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015359526732936502
Validation loss = 0.0012376444647088647
Validation loss = 0.0028382446616888046
Validation loss = 0.0015852008946239948
Validation loss = 0.0011652357643470168
Validation loss = 0.0010010162368416786
Validation loss = 0.000958540360443294
Validation loss = 0.0009631617576815188
Validation loss = 0.0029602469876408577
Validation loss = 0.000881875806953758
Validation loss = 0.0011384557001292706
Validation loss = 0.0015680177602916956
Validation loss = 0.001149686868302524
Validation loss = 0.0020331053528934717
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -71      |
| Iteration     | 64       |
| MaximumReturn | -2.39    |
| MinimumReturn | -205     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012684919638559222
Validation loss = 0.001174682634882629
Validation loss = 0.001552057801745832
Validation loss = 0.0014563112054020166
Validation loss = 0.0014250781387090683
Validation loss = 0.0012778498930856586
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018804912688210607
Validation loss = 0.0020297677256166935
Validation loss = 0.0013513241428881884
Validation loss = 0.0022377243731170893
Validation loss = 0.0016837390139698982
Validation loss = 0.0019414800917729735
Validation loss = 0.0012873888481408358
Validation loss = 0.001306072692386806
Validation loss = 0.001325044664554298
Validation loss = 0.001088135875761509
Validation loss = 0.0013606970896944404
Validation loss = 0.0011525643058121204
Validation loss = 0.0011842807289212942
Validation loss = 0.001260998542420566
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018740430241450667
Validation loss = 0.0018442021682858467
Validation loss = 0.0013172628823667765
Validation loss = 0.001938639092259109
Validation loss = 0.0011272181291133165
Validation loss = 0.001113965641707182
Validation loss = 0.0014922285918146372
Validation loss = 0.0018070057267323136
Validation loss = 0.0011652408866211772
Validation loss = 0.0012508127838373184
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0022621965035796165
Validation loss = 0.0011963862925767899
Validation loss = 0.0011232288088649511
Validation loss = 0.001621042494662106
Validation loss = 0.0011314119910821319
Validation loss = 0.001350928214378655
Validation loss = 0.0014685885980725288
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013438643654808402
Validation loss = 0.0013908346882089972
Validation loss = 0.0025505225639790297
Validation loss = 0.0019615774508565664
Validation loss = 0.0015503348549827933
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -173     |
| Iteration     | 65       |
| MaximumReturn | -34.7    |
| MinimumReturn | -231     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002028954913839698
Validation loss = 0.0014796670293435454
Validation loss = 0.0011581371072679758
Validation loss = 0.0014981774147599936
Validation loss = 0.000984905636869371
Validation loss = 0.0009418526315130293
Validation loss = 0.0010932065779343247
Validation loss = 0.0014792688889428973
Validation loss = 0.0010583411203697324
Validation loss = 0.0011937040835618973
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015942978207021952
Validation loss = 0.0013946451945230365
Validation loss = 0.001201677368953824
Validation loss = 0.0011386299738660455
Validation loss = 0.0013271900825202465
Validation loss = 0.0021403999999165535
Validation loss = 0.0010612958576530218
Validation loss = 0.0013523364905267954
Validation loss = 0.0010389195522293448
Validation loss = 0.0008614351972937584
Validation loss = 0.0009721422102302313
Validation loss = 0.0010356707498431206
Validation loss = 0.001359656103886664
Validation loss = 0.0013009917456656694
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013266849564388394
Validation loss = 0.001190683338791132
Validation loss = 0.0011156195541843772
Validation loss = 0.0011761168716475368
Validation loss = 0.0016273210057988763
Validation loss = 0.0011507496237754822
Validation loss = 0.0010602164547890425
Validation loss = 0.0011288762325420976
Validation loss = 0.0010453048162162304
Validation loss = 0.0013406575890257955
Validation loss = 0.0009755987557582557
Validation loss = 0.0013942498480901122
Validation loss = 0.0010626689763739705
Validation loss = 0.0013250400079414248
Validation loss = 0.0009800728876143694
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018344338750466704
Validation loss = 0.00107020721770823
Validation loss = 0.0013169015292078257
Validation loss = 0.0014344799565151334
Validation loss = 0.0010637996019795537
Validation loss = 0.0012754796771332622
Validation loss = 0.0011836420744657516
Validation loss = 0.0012055456172674894
Validation loss = 0.0010069208219647408
Validation loss = 0.001095929299481213
Validation loss = 0.0009729403536766768
Validation loss = 0.0013752579689025879
Validation loss = 0.0014821930089965463
Validation loss = 0.0011508367024362087
Validation loss = 0.0014992643846198916
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001591098727658391
Validation loss = 0.0011558615369722247
Validation loss = 0.001500314800068736
Validation loss = 0.0031840470619499683
Validation loss = 0.0011895884526893497
Validation loss = 0.001823186525143683
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -28.2    |
| Iteration     | 66       |
| MaximumReturn | -0.111   |
| MinimumReturn | -188     |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011794967576861382
Validation loss = 0.001576535403728485
Validation loss = 0.0012474089162424207
Validation loss = 0.0012129665119573474
Validation loss = 0.0014568434562534094
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016210498288273811
Validation loss = 0.0015386814484372735
Validation loss = 0.0023326610680669546
Validation loss = 0.0014391050208359957
Validation loss = 0.0013965865364298224
Validation loss = 0.0020914822816848755
Validation loss = 0.0013926830142736435
Validation loss = 0.001485452987253666
Validation loss = 0.0011372988810762763
Validation loss = 0.0017458281945437193
Validation loss = 0.0017141008283942938
Validation loss = 0.001512460527010262
Validation loss = 0.001119568943977356
Validation loss = 0.0011023953557014465
Validation loss = 0.0011888999724760652
Validation loss = 0.0011526450980454683
Validation loss = 0.0011091080959886312
Validation loss = 0.0013556207995861769
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013584293192252517
Validation loss = 0.0017052673501893878
Validation loss = 0.0011748486431315541
Validation loss = 0.0014460744569078088
Validation loss = 0.0009792454075068235
Validation loss = 0.0017134466907009482
Validation loss = 0.0011188056087121367
Validation loss = 0.001351359998807311
Validation loss = 0.001121675712056458
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001414090977050364
Validation loss = 0.0012700624065473676
Validation loss = 0.0012095393612980843
Validation loss = 0.001105843810364604
Validation loss = 0.003905119840055704
Validation loss = 0.0014881992246955633
Validation loss = 0.0020559162367135286
Validation loss = 0.0015289679868146777
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00124556221999228
Validation loss = 0.001398399705067277
Validation loss = 0.001771617098711431
Validation loss = 0.001552900648675859
Validation loss = 0.0010248172329738736
Validation loss = 0.001346656703390181
Validation loss = 0.001487529487349093
Validation loss = 0.0012373266508802772
Validation loss = 0.0011583641171455383
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -125     |
| Iteration     | 67       |
| MaximumReturn | -0.68    |
| MinimumReturn | -213     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012988062808290124
Validation loss = 0.0011090292828157544
Validation loss = 0.001445159548893571
Validation loss = 0.0012719262158498168
Validation loss = 0.001173996482975781
Validation loss = 0.001072838087566197
Validation loss = 0.0024291384033858776
Validation loss = 0.001167795853689313
Validation loss = 0.0014257017755880952
Validation loss = 0.0011870539747178555
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018302942626178265
Validation loss = 0.0014820530777797103
Validation loss = 0.0014815254835411906
Validation loss = 0.001302194083109498
Validation loss = 0.0027381153777241707
Validation loss = 0.0010532955639064312
Validation loss = 0.0013434705324470997
Validation loss = 0.0012551869731396437
Validation loss = 0.0009751568431966007
Validation loss = 0.0014260009629651904
Validation loss = 0.001584830111823976
Validation loss = 0.000888300419319421
Validation loss = 0.00101388618350029
Validation loss = 0.001022904645651579
Validation loss = 0.0010583673138171434
Validation loss = 0.0010554589098319411
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014759213663637638
Validation loss = 0.001358877052552998
Validation loss = 0.001158418832346797
Validation loss = 0.001119948923587799
Validation loss = 0.0011967866448685527
Validation loss = 0.0012564873322844505
Validation loss = 0.0011980036506429315
Validation loss = 0.0013266075402498245
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014639090513810515
Validation loss = 0.0012521143071353436
Validation loss = 0.0012519963784143329
Validation loss = 0.0013435771688818932
Validation loss = 0.0011681070318445563
Validation loss = 0.0014581867726519704
Validation loss = 0.0010979716898873448
Validation loss = 0.001415041508153081
Validation loss = 0.0012453090166673064
Validation loss = 0.0019205929711461067
Validation loss = 0.001104635070078075
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001432825461961329
Validation loss = 0.0013432690175250173
Validation loss = 0.0016366912750527263
Validation loss = 0.0013130434090271592
Validation loss = 0.0012305069249123335
Validation loss = 0.0013438083697110415
Validation loss = 0.0013491626596078277
Validation loss = 0.0009225562680512667
Validation loss = 0.0021381834521889687
Validation loss = 0.0009732332546263933
Validation loss = 0.001697286730632186
Validation loss = 0.001394387218169868
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.52    |
| Iteration     | 68       |
| MaximumReturn | -0.309   |
| MinimumReturn | -35.2    |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013764355098828673
Validation loss = 0.0012707639252766967
Validation loss = 0.0013144883560016751
Validation loss = 0.0016007558442652225
Validation loss = 0.0010842864867299795
Validation loss = 0.003800844307988882
Validation loss = 0.0013111878652125597
Validation loss = 0.0011496685910969973
Validation loss = 0.0012525564525276423
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012143149506300688
Validation loss = 0.0013299877755343914
Validation loss = 0.0011295207077637315
Validation loss = 0.0014667559880763292
Validation loss = 0.0012465096078813076
Validation loss = 0.0011862018145620823
Validation loss = 0.0012959882151335478
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000994781730696559
Validation loss = 0.002102572238072753
Validation loss = 0.0016819366719573736
Validation loss = 0.0017870469018816948
Validation loss = 0.0012946822680532932
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002305949106812477
Validation loss = 0.001128093572333455
Validation loss = 0.0013113669119775295
Validation loss = 0.0012015056563541293
Validation loss = 0.0013299736892804503
Validation loss = 0.0011762294452637434
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016941288486123085
Validation loss = 0.0010988785652443767
Validation loss = 0.001180057879537344
Validation loss = 0.0009962603216990829
Validation loss = 0.0011650023516267538
Validation loss = 0.0017691088141873479
Validation loss = 0.0023473992478102446
Validation loss = 0.0011385638499632478
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11      |
| Iteration     | 69       |
| MaximumReturn | -0.144   |
| MinimumReturn | -60.4    |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010702470317482948
Validation loss = 0.0012307314900681376
Validation loss = 0.0010569770820438862
Validation loss = 0.0014181660953909159
Validation loss = 0.0014475594507530332
Validation loss = 0.0011265100911259651
Validation loss = 0.0020120420958846807
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009986994555220008
Validation loss = 0.001633137697353959
Validation loss = 0.0011978090042248368
Validation loss = 0.0012530300300568342
Validation loss = 0.0012173089198768139
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013945106184110045
Validation loss = 0.0010864424984902143
Validation loss = 0.0013462932547554374
Validation loss = 0.001159715000540018
Validation loss = 0.000976482464466244
Validation loss = 0.0009896676056087017
Validation loss = 0.001201904728077352
Validation loss = 0.000980533310212195
Validation loss = 0.001369031029753387
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019009808311238885
Validation loss = 0.0010181214893236756
Validation loss = 0.0012047500349581242
Validation loss = 0.0010470362612977624
Validation loss = 0.0009953302796930075
Validation loss = 0.0009977740701287985
Validation loss = 0.0011891842586919665
Validation loss = 0.0013039913028478622
Validation loss = 0.0018027783371508121
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015331897884607315
Validation loss = 0.001278595649637282
Validation loss = 0.0012725897831842303
Validation loss = 0.0011105526937171817
Validation loss = 0.0016683473950251937
Validation loss = 0.001945653697475791
Validation loss = 0.0011361862998455763
Validation loss = 0.001097743515856564
Validation loss = 0.001199128688313067
Validation loss = 0.0017466164426878095
Validation loss = 0.0011648336658254266
Validation loss = 0.0014152305666357279
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -218     |
| Iteration     | 70       |
| MaximumReturn | -198     |
| MinimumReturn | -231     |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015603989595547318
Validation loss = 0.0016450755065307021
Validation loss = 0.0010971416486427188
Validation loss = 0.0011341673089191318
Validation loss = 0.002050015376880765
Validation loss = 0.0014117192476987839
Validation loss = 0.0013834425481036305
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001372610335238278
Validation loss = 0.001734830206260085
Validation loss = 0.0011333219008520246
Validation loss = 0.0011821432271972299
Validation loss = 0.0011514228535816073
Validation loss = 0.0018134511774405837
Validation loss = 0.00114239112008363
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0032993564382195473
Validation loss = 0.0012540046591311693
Validation loss = 0.002196317072957754
Validation loss = 0.0011918132659047842
Validation loss = 0.0013836461585015059
Validation loss = 0.0012285324046388268
Validation loss = 0.0017429395811632276
Validation loss = 0.00098980194889009
Validation loss = 0.0013468542601913214
Validation loss = 0.00181399320717901
Validation loss = 0.0012984859058633447
Validation loss = 0.0010587668512016535
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012882380979135633
Validation loss = 0.0016481484053656459
Validation loss = 0.0009554714779369533
Validation loss = 0.0012445807224139571
Validation loss = 0.0015999159077182412
Validation loss = 0.0011103429133072495
Validation loss = 0.0009690566803328693
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012901131995022297
Validation loss = 0.00136775360442698
Validation loss = 0.0011250602547079325
Validation loss = 0.0015436417888849974
Validation loss = 0.0013190555619075894
Validation loss = 0.001082622678950429
Validation loss = 0.0014234621776267886
Validation loss = 0.0015739009249955416
Validation loss = 0.001044288044795394
Validation loss = 0.0010863953502848744
Validation loss = 0.001091629033908248
Validation loss = 0.0011211620876565576
Validation loss = 0.0010196770308539271
Validation loss = 0.0010415943106636405
Validation loss = 0.0009868816705420613
Validation loss = 0.001529042492620647
Validation loss = 0.0012197063770145178
Validation loss = 0.0011333361035212874
Validation loss = 0.002373428549617529
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -118     |
| Iteration     | 71       |
| MaximumReturn | -0.378   |
| MinimumReturn | -224     |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013351204106584191
Validation loss = 0.0013979432405903935
Validation loss = 0.0010325516341254115
Validation loss = 0.0014942362904548645
Validation loss = 0.001149760908447206
Validation loss = 0.001957749715074897
Validation loss = 0.0012033858802169561
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009385722223669291
Validation loss = 0.0013009791728109121
Validation loss = 0.001066494733095169
Validation loss = 0.00106222799513489
Validation loss = 0.0011302951024845243
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014539840631186962
Validation loss = 0.0012197232572361827
Validation loss = 0.0010050492128357291
Validation loss = 0.0014757588505744934
Validation loss = 0.0010068301344290376
Validation loss = 0.0009341514087282121
Validation loss = 0.0009550870745442808
Validation loss = 0.0013136971974745393
Validation loss = 0.001187830581329763
Validation loss = 0.0016442860942333937
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010893051512539387
Validation loss = 0.0010584659175947309
Validation loss = 0.0012874804669991136
Validation loss = 0.001142360968515277
Validation loss = 0.000958349322900176
Validation loss = 0.001004015444777906
Validation loss = 0.0014386516995728016
Validation loss = 0.0009698578505776823
Validation loss = 0.001130795804783702
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011464874260127544
Validation loss = 0.001824618666432798
Validation loss = 0.001141217304393649
Validation loss = 0.0011014339979737997
Validation loss = 0.0010900340275838971
Validation loss = 0.0011311728740110993
Validation loss = 0.0011561348801478744
Validation loss = 0.0011120032286271453
Validation loss = 0.0013449679827317595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -81.7    |
| Iteration     | 72       |
| MaximumReturn | -0.396   |
| MinimumReturn | -192     |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009836897952482104
Validation loss = 0.0013116843765601516
Validation loss = 0.0018711189040914178
Validation loss = 0.0013648561434820294
Validation loss = 0.0012327468721196055
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009063031757250428
Validation loss = 0.001031448831781745
Validation loss = 0.0009952057152986526
Validation loss = 0.001172181568108499
Validation loss = 0.0013567093992605805
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009743268601596355
Validation loss = 0.0013825399801135063
Validation loss = 0.0008803009404800832
Validation loss = 0.0011218382278457284
Validation loss = 0.000948461121879518
Validation loss = 0.001112152705900371
Validation loss = 0.0012644074158743024
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015286298003047705
Validation loss = 0.000994070083834231
Validation loss = 0.0012063615722581744
Validation loss = 0.001188569120131433
Validation loss = 0.001324318116530776
Validation loss = 0.0008790146093815565
Validation loss = 0.0014703904744237661
Validation loss = 0.0010532077867537737
Validation loss = 0.0008867973228916526
Validation loss = 0.001039552385918796
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001296833623200655
Validation loss = 0.0009276721393689513
Validation loss = 0.0009085500496439636
Validation loss = 0.000986697617918253
Validation loss = 0.0011046308791264892
Validation loss = 0.001460119616240263
Validation loss = 0.0009847237961366773
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -175     |
| Iteration     | 73       |
| MaximumReturn | -68.4    |
| MinimumReturn | -224     |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011733763385564089
Validation loss = 0.0009984589414671063
Validation loss = 0.0020842629019171
Validation loss = 0.0019426867365837097
Validation loss = 0.0012656155740842223
Validation loss = 0.002026640111580491
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011181801091879606
Validation loss = 0.0013335036346688867
Validation loss = 0.0011331518180668354
Validation loss = 0.0010480366181582212
Validation loss = 0.0013406362850219011
Validation loss = 0.0015153896529227495
Validation loss = 0.0013053022557869554
Validation loss = 0.0010205236030742526
Validation loss = 0.0011012624017894268
Validation loss = 0.001042588148266077
Validation loss = 0.0009878671262413263
Validation loss = 0.001124112750403583
Validation loss = 0.001151597360149026
Validation loss = 0.0012260186485946178
Validation loss = 0.0011859351070597768
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014094652142375708
Validation loss = 0.0011428098659962416
Validation loss = 0.0009997752495110035
Validation loss = 0.0012813590001314878
Validation loss = 0.0008978217374533415
Validation loss = 0.001102225505746901
Validation loss = 0.0011356764007359743
Validation loss = 0.0008691769326105714
Validation loss = 0.0010842386400327086
Validation loss = 0.0009753463673405349
Validation loss = 0.0008504841825924814
Validation loss = 0.0013195732608437538
Validation loss = 0.0012726038694381714
Validation loss = 0.0011946579907089472
Validation loss = 0.0014171696966513991
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010189247550442815
Validation loss = 0.0010210606269538403
Validation loss = 0.0015106217470020056
Validation loss = 0.0011197511339560151
Validation loss = 0.0012638198677450418
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013350443914532661
Validation loss = 0.0010512244189158082
Validation loss = 0.0012102510081604123
Validation loss = 0.0010778703726828098
Validation loss = 0.0016981555381789804
Validation loss = 0.0013046935200691223
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -206     |
| Iteration     | 74       |
| MaximumReturn | -134     |
| MinimumReturn | -231     |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009262707317247987
Validation loss = 0.0013108784332871437
Validation loss = 0.0015031832735985518
Validation loss = 0.0017292487900704145
Validation loss = 0.0009948217775672674
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010788203217089176
Validation loss = 0.001011208863928914
Validation loss = 0.0009896939154714346
Validation loss = 0.0013626243453472853
Validation loss = 0.0010153348557651043
Validation loss = 0.0011356482282280922
Validation loss = 0.0013810972450301051
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013952560257166624
Validation loss = 0.0011895359493792057
Validation loss = 0.001367658842355013
Validation loss = 0.0012492540990933776
Validation loss = 0.0009045230108313262
Validation loss = 0.0016338826389983296
Validation loss = 0.0009292963077314198
Validation loss = 0.0012004036689177155
Validation loss = 0.0015772271435707808
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011729382677003741
Validation loss = 0.0010468934196978807
Validation loss = 0.001250888337381184
Validation loss = 0.0009800039697438478
Validation loss = 0.001038304646499455
Validation loss = 0.0010172503534704447
Validation loss = 0.001139652682468295
Validation loss = 0.0016480869380757213
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001349387806840241
Validation loss = 0.0010057048639282584
Validation loss = 0.001101023401133716
Validation loss = 0.0010264621814712882
Validation loss = 0.002781714079901576
Validation loss = 0.0012172033311799169
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -214     |
| Iteration     | 75       |
| MaximumReturn | -154     |
| MinimumReturn | -231     |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0029589824844151735
Validation loss = 0.0011591273359954357
Validation loss = 0.000858295417856425
Validation loss = 0.001053313142620027
Validation loss = 0.0013071427820250392
Validation loss = 0.0012372254859656096
Validation loss = 0.0014262732584029436
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00285914889536798
Validation loss = 0.001340240240097046
Validation loss = 0.0014438778162002563
Validation loss = 0.0009828479960560799
Validation loss = 0.002467187587171793
Validation loss = 0.0015548624796792865
Validation loss = 0.0010716922115534544
Validation loss = 0.0009011763031594455
Validation loss = 0.001276921248063445
Validation loss = 0.0014154225355014205
Validation loss = 0.001206915476359427
Validation loss = 0.0011793664889410138
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017449771985411644
Validation loss = 0.0011315939482301474
Validation loss = 0.001300843432545662
Validation loss = 0.0014540036208927631
Validation loss = 0.0010851208353415132
Validation loss = 0.0013359906151890755
Validation loss = 0.0020883665420114994
Validation loss = 0.001023041782900691
Validation loss = 0.0012088506482541561
Validation loss = 0.0009747626609168947
Validation loss = 0.0012231124565005302
Validation loss = 0.001252349466085434
Validation loss = 0.001065542222931981
Validation loss = 0.0014273551059886813
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016086609102785587
Validation loss = 0.0010517328046262264
Validation loss = 0.0010538127971813083
Validation loss = 0.0013611814938485622
Validation loss = 0.001714078476652503
Validation loss = 0.001272103050723672
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013405372155830264
Validation loss = 0.0012278775684535503
Validation loss = 0.0012222961522638798
Validation loss = 0.001011827029287815
Validation loss = 0.0011124189477413893
Validation loss = 0.001394408755004406
Validation loss = 0.001595767098478973
Validation loss = 0.0015712692402303219
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -184     |
| Iteration     | 76       |
| MaximumReturn | -107     |
| MinimumReturn | -233     |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001670151948928833
Validation loss = 0.0012097916333004832
Validation loss = 0.0019703467842191458
Validation loss = 0.0010617884108796716
Validation loss = 0.0011648141080513597
Validation loss = 0.0012374796206131577
Validation loss = 0.001095669693313539
Validation loss = 0.0010617173975333571
Validation loss = 0.0010412211995571852
Validation loss = 0.0011314062867313623
Validation loss = 0.0011876606149598956
Validation loss = 0.0014594109961763024
Validation loss = 0.0013823360204696655
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021145716309547424
Validation loss = 0.0009685779805295169
Validation loss = 0.001295634894631803
Validation loss = 0.0010340920416638255
Validation loss = 0.0010960433864966035
Validation loss = 0.0010188482701778412
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009622396319173276
Validation loss = 0.0011934060603380203
Validation loss = 0.0008056372171267867
Validation loss = 0.0011312259593978524
Validation loss = 0.0015596312005072832
Validation loss = 0.001215311698615551
Validation loss = 0.001406316994689405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012534225825220346
Validation loss = 0.0010400081519037485
Validation loss = 0.001046188990585506
Validation loss = 0.00100421323440969
Validation loss = 0.0013214426580816507
Validation loss = 0.001033602049574256
Validation loss = 0.0009969824459403753
Validation loss = 0.0009747209260240197
Validation loss = 0.0009901372250169516
Validation loss = 0.0010291925864294171
Validation loss = 0.0011793840676546097
Validation loss = 0.0010657842503860593
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013405400095507503
Validation loss = 0.0015041206497699022
Validation loss = 0.003284347942098975
Validation loss = 0.0009546049404889345
Validation loss = 0.0009355449001304805
Validation loss = 0.0019476487068459392
Validation loss = 0.0012617635075002909
Validation loss = 0.001108040101826191
Validation loss = 0.00097698625177145
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -200     |
| Iteration     | 77       |
| MaximumReturn | -31.1    |
| MinimumReturn | -231     |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011497511295601726
Validation loss = 0.001265143626369536
Validation loss = 0.0011081717675551772
Validation loss = 0.0012869618367403746
Validation loss = 0.001656683860346675
Validation loss = 0.0012635679449886084
Validation loss = 0.001116681145504117
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015987870283424854
Validation loss = 0.001318673137575388
Validation loss = 0.000946048938203603
Validation loss = 0.0010261870920658112
Validation loss = 0.0013098990311846137
Validation loss = 0.0008651615353301167
Validation loss = 0.001695714658126235
Validation loss = 0.001931485952809453
Validation loss = 0.0010921237990260124
Validation loss = 0.0009134264546446502
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026103821583092213
Validation loss = 0.0011371653527021408
Validation loss = 0.0009300550445914268
Validation loss = 0.0012504837941378355
Validation loss = 0.0009947409853339195
Validation loss = 0.0008746503153815866
Validation loss = 0.0015212575672194362
Validation loss = 0.0010521949734538794
Validation loss = 0.0010276801185682416
Validation loss = 0.0011383261298760772
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00113104993943125
Validation loss = 0.0011079944670200348
Validation loss = 0.001149633782915771
Validation loss = 0.0013511435827240348
Validation loss = 0.0012478337157517672
Validation loss = 0.001018947339616716
Validation loss = 0.001185731147415936
Validation loss = 0.0009155752486549318
Validation loss = 0.0008097608224488795
Validation loss = 0.001112665981054306
Validation loss = 0.001157034421339631
Validation loss = 0.0011338992044329643
Validation loss = 0.001349794678390026
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009744221461005509
Validation loss = 0.0010406277142465115
Validation loss = 0.001245318097062409
Validation loss = 0.0008855608757585287
Validation loss = 0.001012928900308907
Validation loss = 0.0010814194101840258
Validation loss = 0.0014128824695944786
Validation loss = 0.0010319529101252556
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -216     |
| Iteration     | 78       |
| MaximumReturn | -174     |
| MinimumReturn | -233     |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011728808749467134
Validation loss = 0.0012887263437733054
Validation loss = 0.0015015465905889869
Validation loss = 0.0014501819387078285
Validation loss = 0.001036795903928578
Validation loss = 0.0013055988820269704
Validation loss = 0.0010554944165050983
Validation loss = 0.0013151193270459771
Validation loss = 0.0008246807265095413
Validation loss = 0.001051845494657755
Validation loss = 0.001422784524038434
Validation loss = 0.0010178074007853866
Validation loss = 0.0013661757111549377
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012625594390556216
Validation loss = 0.0011174788232892752
Validation loss = 0.0010995924239978194
Validation loss = 0.0009445460163988173
Validation loss = 0.0010277971159666777
Validation loss = 0.0011205794289708138
Validation loss = 0.0010077310726046562
Validation loss = 0.0009049155050888658
Validation loss = 0.0009112714324146509
Validation loss = 0.0010672752978280187
Validation loss = 0.0010344082256779075
Validation loss = 0.0011858884245157242
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012040212750434875
Validation loss = 0.0008750706329010427
Validation loss = 0.0009107673540711403
Validation loss = 0.0012907783966511488
Validation loss = 0.0008950700866989791
Validation loss = 0.001125499838963151
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018471755320206285
Validation loss = 0.0010124266846105456
Validation loss = 0.0010689081391319633
Validation loss = 0.001245279097929597
Validation loss = 0.0010683612199500203
Validation loss = 0.001308617414906621
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013233453501015902
Validation loss = 0.0016131412703543901
Validation loss = 0.0014879333321005106
Validation loss = 0.0009394764201715589
Validation loss = 0.0011786904651671648
Validation loss = 0.0011528157629072666
Validation loss = 0.0016767922788858414
Validation loss = 0.0018245034152641892
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -230     |
| Iteration     | 79       |
| MaximumReturn | -213     |
| MinimumReturn | -236     |
| TotalSamples  | 134946   |
----------------------------
