Logging to experiments/invertedPendulum/IPA01/Tue-01-Nov-2022-07-59-07-PM-CDT_invertedPendulum_trpo_iteration_20_seed2531
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7132808566093445
Validation loss = 0.4250606894493103
Validation loss = 0.3677906095981598
Validation loss = 0.32899415493011475
Validation loss = 0.3214675486087799
Validation loss = 0.30485260486602783
Validation loss = 0.27706682682037354
Validation loss = 0.26100507378578186
Validation loss = 0.25002139806747437
Validation loss = 0.2339944839477539
Validation loss = 0.21937859058380127
Validation loss = 0.20615516602993011
Validation loss = 0.20535144209861755
Validation loss = 0.19410188496112823
Validation loss = 0.1865311563014984
Validation loss = 0.1922929733991623
Validation loss = 0.17789670825004578
Validation loss = 0.16405212879180908
Validation loss = 0.17070206999778748
Validation loss = 0.15939243137836456
Validation loss = 0.1619568169116974
Validation loss = 0.15320168435573578
Validation loss = 0.18114341795444489
Validation loss = 0.14070363342761993
Validation loss = 0.1439346820116043
Validation loss = 0.1442987620830536
Validation loss = 0.15770964324474335
Validation loss = 0.13688461482524872
Validation loss = 0.14169630408287048
Validation loss = 0.14300118386745453
Validation loss = 0.13373586535453796
Validation loss = 0.1308905929327011
Validation loss = 0.14195553958415985
Validation loss = 0.1264837086200714
Validation loss = 0.13284368813037872
Validation loss = 0.1317802518606186
Validation loss = 0.12691865861415863
Validation loss = 0.1297353059053421
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7522419095039368
Validation loss = 0.3923915922641754
Validation loss = 0.3532465994358063
Validation loss = 0.3190966546535492
Validation loss = 0.3035816252231598
Validation loss = 0.2722291648387909
Validation loss = 0.2644686996936798
Validation loss = 0.24008363485336304
Validation loss = 0.23663096129894257
Validation loss = 0.21025310456752777
Validation loss = 0.19716845452785492
Validation loss = 0.19324107468128204
Validation loss = 0.18563981354236603
Validation loss = 0.17969222366809845
Validation loss = 0.18218755722045898
Validation loss = 0.17256930470466614
Validation loss = 0.18258121609687805
Validation loss = 0.16726909577846527
Validation loss = 0.16637839376926422
Validation loss = 0.15956638753414154
Validation loss = 0.15687061846256256
Validation loss = 0.16106683015823364
Validation loss = 0.17189155519008636
Validation loss = 0.1787244826555252
Validation loss = 0.1580892652273178
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7230880260467529
Validation loss = 0.42583268880844116
Validation loss = 0.3729720413684845
Validation loss = 0.3226780295372009
Validation loss = 0.31088387966156006
Validation loss = 0.2931194305419922
Validation loss = 0.28606152534484863
Validation loss = 0.2547546327114105
Validation loss = 0.24032577872276306
Validation loss = 0.22809284925460815
Validation loss = 0.21458055078983307
Validation loss = 0.21018366515636444
Validation loss = 0.2040383666753769
Validation loss = 0.20020079612731934
Validation loss = 0.19189950823783875
Validation loss = 0.1902923583984375
Validation loss = 0.18792739510536194
Validation loss = 0.18160133063793182
Validation loss = 0.19713212549686432
Validation loss = 0.18630707263946533
Validation loss = 0.16973307728767395
Validation loss = 0.1810898631811142
Validation loss = 0.16484731435775757
Validation loss = 0.15721480548381805
Validation loss = 0.1626116931438446
Validation loss = 0.1582965850830078
Validation loss = 0.14681090414524078
Validation loss = 0.150349959731102
Validation loss = 0.14177197217941284
Validation loss = 0.14069421589374542
Validation loss = 0.1438785046339035
Validation loss = 0.14247174561023712
Validation loss = 0.13821682333946228
Validation loss = 0.13757003843784332
Validation loss = 0.12988168001174927
Validation loss = 0.1492987424135208
Validation loss = 0.13875699043273926
Validation loss = 0.13996855914592743
Validation loss = 0.150853231549263
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7375016808509827
Validation loss = 0.3837495446205139
Validation loss = 0.36183953285217285
Validation loss = 0.32111066579818726
Validation loss = 0.3086870610713959
Validation loss = 0.2744090259075165
Validation loss = 0.2591657340526581
Validation loss = 0.23797348141670227
Validation loss = 0.23247647285461426
Validation loss = 0.20976673066616058
Validation loss = 0.19946666061878204
Validation loss = 0.19983306527137756
Validation loss = 0.18870101869106293
Validation loss = 0.1889459192752838
Validation loss = 0.1743178367614746
Validation loss = 0.1710902750492096
Validation loss = 0.16819322109222412
Validation loss = 0.16647972166538239
Validation loss = 0.1710290163755417
Validation loss = 0.17877419292926788
Validation loss = 0.17111051082611084
Validation loss = 0.16886113584041595
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7223096489906311
Validation loss = 0.3817460238933563
Validation loss = 0.3551452159881592
Validation loss = 0.3373028039932251
Validation loss = 0.31005072593688965
Validation loss = 0.2903116047382355
Validation loss = 0.2746751606464386
Validation loss = 0.24295879900455475
Validation loss = 0.2404935508966446
Validation loss = 0.2273547202348709
Validation loss = 0.21938863396644592
Validation loss = 0.21692222356796265
Validation loss = 0.2165915071964264
Validation loss = 0.19280686974525452
Validation loss = 0.20623186230659485
Validation loss = 0.1804146021604538
Validation loss = 0.1828046441078186
Validation loss = 0.17465630173683167
Validation loss = 0.16667716205120087
Validation loss = 0.15881289541721344
Validation loss = 0.1562020480632782
Validation loss = 0.1513209193944931
Validation loss = 0.16037419438362122
Validation loss = 0.16016362607479095
Validation loss = 0.1551658809185028
Validation loss = 0.1514197587966919
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.544   |
| Iteration     | 0        |
| MaximumReturn | -0.0256  |
| MinimumReturn | -6.33    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3295166492462158
Validation loss = 0.20813219249248505
Validation loss = 0.19263754785060883
Validation loss = 0.17397546768188477
Validation loss = 0.15239985287189484
Validation loss = 0.15197458863258362
Validation loss = 0.13472174108028412
Validation loss = 0.12405584752559662
Validation loss = 0.11464881896972656
Validation loss = 0.11209453642368317
Validation loss = 0.11020685732364655
Validation loss = 0.10879571735858917
Validation loss = 0.10583052039146423
Validation loss = 0.09777097404003143
Validation loss = 0.1011720672249794
Validation loss = 0.10968365520238876
Validation loss = 0.09801286458969116
Validation loss = 0.0955728143453598
Validation loss = 0.09079660475254059
Validation loss = 0.08941972255706787
Validation loss = 0.10009101778268814
Validation loss = 0.08829250186681747
Validation loss = 0.09213651716709137
Validation loss = 0.08498881757259369
Validation loss = 0.08880069106817245
Validation loss = 0.09278848022222519
Validation loss = 0.08149449527263641
Validation loss = 0.08735079318284988
Validation loss = 0.07917851209640503
Validation loss = 0.07840249687433243
Validation loss = 0.07750056684017181
Validation loss = 0.08004146814346313
Validation loss = 0.07516762614250183
Validation loss = 0.07862354815006256
Validation loss = 0.09704820066690445
Validation loss = 0.09818711131811142
Validation loss = 0.08888887614011765
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.29933980107307434
Validation loss = 0.1861030012369156
Validation loss = 0.16458673775196075
Validation loss = 0.1498226523399353
Validation loss = 0.13959157466888428
Validation loss = 0.1353064328432083
Validation loss = 0.12742550671100616
Validation loss = 0.1281513124704361
Validation loss = 0.1252988576889038
Validation loss = 0.12077668309211731
Validation loss = 0.10845637321472168
Validation loss = 0.11652982980012894
Validation loss = 0.11016548424959183
Validation loss = 0.10699905455112457
Validation loss = 0.12174870073795319
Validation loss = 0.10986752808094025
Validation loss = 0.1022924855351448
Validation loss = 0.10661159455776215
Validation loss = 0.10461397469043732
Validation loss = 0.10261949151754379
Validation loss = 0.09541524201631546
Validation loss = 0.09422227740287781
Validation loss = 0.10523385554552078
Validation loss = 0.09794939309358597
Validation loss = 0.09057891368865967
Validation loss = 0.09370457381010056
Validation loss = 0.09942726790904999
Validation loss = 0.10085493326187134
Validation loss = 0.09044510871171951
Validation loss = 0.09591418504714966
Validation loss = 0.0890687108039856
Validation loss = 0.09751682728528976
Validation loss = 0.09367021173238754
Validation loss = 0.08646645396947861
Validation loss = 0.08397447317838669
Validation loss = 0.0810767114162445
Validation loss = 0.07954075187444687
Validation loss = 0.08007330447435379
Validation loss = 0.07908567041158676
Validation loss = 0.09072986245155334
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3398786187171936
Validation loss = 0.20447593927383423
Validation loss = 0.18521913886070251
Validation loss = 0.1697825789451599
Validation loss = 0.1542019248008728
Validation loss = 0.1555710732936859
Validation loss = 0.13537564873695374
Validation loss = 0.12830236554145813
Validation loss = 0.12229593843221664
Validation loss = 0.12414304912090302
Validation loss = 0.11435999721288681
Validation loss = 0.11845587193965912
Validation loss = 0.11365368962287903
Validation loss = 0.11279931664466858
Validation loss = 0.10385819524526596
Validation loss = 0.11012662947177887
Validation loss = 0.10048270970582962
Validation loss = 0.09747118502855301
Validation loss = 0.09342540055513382
Validation loss = 0.10552982240915298
Validation loss = 0.09924106299877167
Validation loss = 0.10210488736629486
Validation loss = 0.09875944256782532
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2697613537311554
Validation loss = 0.17176364362239838
Validation loss = 0.1552242934703827
Validation loss = 0.14139702916145325
Validation loss = 0.13156799972057343
Validation loss = 0.13912099599838257
Validation loss = 0.1283821016550064
Validation loss = 0.12105083465576172
Validation loss = 0.11152373254299164
Validation loss = 0.11662536859512329
Validation loss = 0.12100908905267715
Validation loss = 0.10544107854366302
Validation loss = 0.10275334864854813
Validation loss = 0.12900234758853912
Validation loss = 0.11024077236652374
Validation loss = 0.10105491429567337
Validation loss = 0.11908721923828125
Validation loss = 0.10212357342243195
Validation loss = 0.09948434680700302
Validation loss = 0.09593968838453293
Validation loss = 0.09537020325660706
Validation loss = 0.1005437895655632
Validation loss = 0.09977493435144424
Validation loss = 0.10152655839920044
Validation loss = 0.09762827306985855
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.28995779156684875
Validation loss = 0.18521633744239807
Validation loss = 0.1637420952320099
Validation loss = 0.14228709042072296
Validation loss = 0.13677826523780823
Validation loss = 0.1360154151916504
Validation loss = 0.12895651161670685
Validation loss = 0.12859684228897095
Validation loss = 0.1190037950873375
Validation loss = 0.11546232551336288
Validation loss = 0.10300164669752121
Validation loss = 0.10640892386436462
Validation loss = 0.10578925907611847
Validation loss = 0.10761716216802597
Validation loss = 0.09416771680116653
Validation loss = 0.12001693248748779
Validation loss = 0.09457661211490631
Validation loss = 0.12858060002326965
Validation loss = 0.09548871219158173
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0273  |
| Iteration     | 1        |
| MaximumReturn | -0.0187  |
| MinimumReturn | -0.0426  |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.088710255920887
Validation loss = 0.05674833059310913
Validation loss = 0.04663458839058876
Validation loss = 0.04917900264263153
Validation loss = 0.046337779611349106
Validation loss = 0.048792339861392975
Validation loss = 0.060848914086818695
Validation loss = 0.06452062726020813
Validation loss = 0.04826670140028
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1189008578658104
Validation loss = 0.07034555822610855
Validation loss = 0.06560225039720535
Validation loss = 0.05695826932787895
Validation loss = 0.05567152425646782
Validation loss = 0.04987127333879471
Validation loss = 0.05374360829591751
Validation loss = 0.05012907087802887
Validation loss = 0.047199685126543045
Validation loss = 0.04560035467147827
Validation loss = 0.042420826852321625
Validation loss = 0.05289778485894203
Validation loss = 0.04360741004347801
Validation loss = 0.046732187271118164
Validation loss = 0.04006824642419815
Validation loss = 0.047956012189388275
Validation loss = 0.04262750968337059
Validation loss = 0.04824994504451752
Validation loss = 0.04761713370680809
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08614696562290192
Validation loss = 0.06738755106925964
Validation loss = 0.06183083727955818
Validation loss = 0.05631842091679573
Validation loss = 0.05323610454797745
Validation loss = 0.0536879263818264
Validation loss = 0.05415964871644974
Validation loss = 0.052094895392656326
Validation loss = 0.05367185175418854
Validation loss = 0.05632016807794571
Validation loss = 0.05006645619869232
Validation loss = 0.049689825624227524
Validation loss = 0.04697662964463234
Validation loss = 0.05422721058130264
Validation loss = 0.052259255200624466
Validation loss = 0.04959416761994362
Validation loss = 0.0516890250146389
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09414898604154587
Validation loss = 0.0764719769358635
Validation loss = 0.07782596349716187
Validation loss = 0.07870083302259445
Validation loss = 0.0609331913292408
Validation loss = 0.06340990215539932
Validation loss = 0.08788195252418518
Validation loss = 0.062083709985017776
Validation loss = 0.05816366896033287
Validation loss = 0.053875651210546494
Validation loss = 0.055111777037382126
Validation loss = 0.05323837324976921
Validation loss = 0.07191775739192963
Validation loss = 0.05393676832318306
Validation loss = 0.06092698872089386
Validation loss = 0.05658615380525589
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07754400372505188
Validation loss = 0.06334145367145538
Validation loss = 0.06315503269433975
Validation loss = 0.05545274168252945
Validation loss = 0.060452431440353394
Validation loss = 0.05827983841300011
Validation loss = 0.05340280383825302
Validation loss = 0.05448606237769127
Validation loss = 0.05441788583993912
Validation loss = 0.05844136327505112
Validation loss = 0.05019991472363472
Validation loss = 0.056507132947444916
Validation loss = 0.053482428193092346
Validation loss = 0.048001620918512344
Validation loss = 0.04524531587958336
Validation loss = 0.04641196131706238
Validation loss = 0.0504530593752861
Validation loss = 0.04905177280306816
Validation loss = 0.04427246004343033
Validation loss = 0.04317561164498329
Validation loss = 0.054344065487384796
Validation loss = 0.05260954052209854
Validation loss = 0.04780755192041397
Validation loss = 0.04436579346656799
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00136 |
| Iteration     | 2        |
| MaximumReturn | -0.001   |
| MinimumReturn | -0.00161 |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08131960779428482
Validation loss = 0.04552844539284706
Validation loss = 0.03836056590080261
Validation loss = 0.04142056778073311
Validation loss = 0.03881875053048134
Validation loss = 0.04117772728204727
Validation loss = 0.03595086559653282
Validation loss = 0.03471309319138527
Validation loss = 0.034727100282907486
Validation loss = 0.03772687539458275
Validation loss = 0.03843213990330696
Validation loss = 0.037971582263708115
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08910661190748215
Validation loss = 0.06590566784143448
Validation loss = 0.05878113582730293
Validation loss = 0.05293891951441765
Validation loss = 0.05623340606689453
Validation loss = 0.04754441976547241
Validation loss = 0.05069790780544281
Validation loss = 0.04642678424715996
Validation loss = 0.04184700548648834
Validation loss = 0.04210774600505829
Validation loss = 0.038651447743177414
Validation loss = 0.043715525418519974
Validation loss = 0.04517440125346184
Validation loss = 0.04048115015029907
Validation loss = 0.04082857817411423
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04764397069811821
Validation loss = 0.05155785381793976
Validation loss = 0.04321712628006935
Validation loss = 0.04723307862877846
Validation loss = 0.051292259246110916
Validation loss = 0.043766364455223083
Validation loss = 0.048109833151102066
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07213512808084488
Validation loss = 0.060273800045251846
Validation loss = 0.04362000897526741
Validation loss = 0.04525761306285858
Validation loss = 0.04174642264842987
Validation loss = 0.041333697736263275
Validation loss = 0.04269616678357124
Validation loss = 0.04300808906555176
Validation loss = 0.04595527425408363
Validation loss = 0.04174012318253517
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0604097805917263
Validation loss = 0.04471805319190025
Validation loss = 0.053003326058387756
Validation loss = 0.04274119809269905
Validation loss = 0.03841305896639824
Validation loss = 0.03651273250579834
Validation loss = 0.036016542464494705
Validation loss = 0.039974842220544815
Validation loss = 0.03809646889567375
Validation loss = 0.0383126325905323
Validation loss = 0.03551294282078743
Validation loss = 0.038349900394678116
Validation loss = 0.04309304058551788
Validation loss = 0.03939405456185341
Validation loss = 0.036307744681835175
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.029   |
| Iteration     | 3        |
| MaximumReturn | -0.0224  |
| MinimumReturn | -0.0368  |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0447368323802948
Validation loss = 0.037798091769218445
Validation loss = 0.030438972637057304
Validation loss = 0.026665374636650085
Validation loss = 0.02463959902524948
Validation loss = 0.024132395163178444
Validation loss = 0.024795278906822205
Validation loss = 0.02469669282436371
Validation loss = 0.025086699053645134
Validation loss = 0.026793871074914932
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04012545570731163
Validation loss = 0.028769774362444878
Validation loss = 0.027515634894371033
Validation loss = 0.02708263508975506
Validation loss = 0.026028266176581383
Validation loss = 0.02965221181511879
Validation loss = 0.027979562059044838
Validation loss = 0.03358055278658867
Validation loss = 0.028382739052176476
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.041188497096300125
Validation loss = 0.029808634892106056
Validation loss = 0.03345043584704399
Validation loss = 0.027968889102339745
Validation loss = 0.026282845064997673
Validation loss = 0.027793381363153458
Validation loss = 0.031761739403009415
Validation loss = 0.028966078534722328
Validation loss = 0.028468498960137367
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.040506284683942795
Validation loss = 0.0296932402998209
Validation loss = 0.027792666107416153
Validation loss = 0.02659127488732338
Validation loss = 0.026423603296279907
Validation loss = 0.02406117133796215
Validation loss = 0.027766305953264236
Validation loss = 0.034079600125551224
Validation loss = 0.02744273655116558
Validation loss = 0.030019599944353104
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02741355262696743
Validation loss = 0.026584036648273468
Validation loss = 0.02607077546417713
Validation loss = 0.023545945063233376
Validation loss = 0.025009766221046448
Validation loss = 0.02449069358408451
Validation loss = 0.02716454304754734
Validation loss = 0.02625652216374874
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00146 |
| Iteration     | 4        |
| MaximumReturn | -0.00112 |
| MinimumReturn | -0.00191 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03512033820152283
Validation loss = 0.027429630979895592
Validation loss = 0.025639483705163002
Validation loss = 0.024497877806425095
Validation loss = 0.02165127359330654
Validation loss = 0.03017360344529152
Validation loss = 0.026153836399316788
Validation loss = 0.030628765001893044
Validation loss = 0.022407757118344307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.032665230333805084
Validation loss = 0.024722782894968987
Validation loss = 0.02484835870563984
Validation loss = 0.022765960544347763
Validation loss = 0.027045253664255142
Validation loss = 0.04638152942061424
Validation loss = 0.0378078855574131
Validation loss = 0.03303603455424309
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02616156078875065
Validation loss = 0.02906833216547966
Validation loss = 0.02392716147005558
Validation loss = 0.024277329444885254
Validation loss = 0.02596200443804264
Validation loss = 0.024466268718242645
Validation loss = 0.025422733277082443
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025206739082932472
Validation loss = 0.022240931168198586
Validation loss = 0.02259565144777298
Validation loss = 0.022289728745818138
Validation loss = 0.024012550711631775
Validation loss = 0.032368652522563934
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03417729586362839
Validation loss = 0.023167984560132027
Validation loss = 0.03196142986416817
Validation loss = 0.02366724982857704
Validation loss = 0.022468309849500656
Validation loss = 0.02233343943953514
Validation loss = 0.023393843322992325
Validation loss = 0.020815521478652954
Validation loss = 0.025059591978788376
Validation loss = 0.02320174127817154
Validation loss = 0.02095215953886509
Validation loss = 0.02270790934562683
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.062   |
| Iteration     | 5        |
| MaximumReturn | -0.0251  |
| MinimumReturn | -0.564   |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.034659676253795624
Validation loss = 0.027279887348413467
Validation loss = 0.027138788253068924
Validation loss = 0.026222437620162964
Validation loss = 0.02927040494978428
Validation loss = 0.02742673084139824
Validation loss = 0.026325250044465065
Validation loss = 0.024240728467702866
Validation loss = 0.023262204602360725
Validation loss = 0.02190624549984932
Validation loss = 0.030250882729887962
Validation loss = 0.025073353201150894
Validation loss = 0.026162046939134598
Validation loss = 0.020953375846147537
Validation loss = 0.023114081472158432
Validation loss = 0.02906860038638115
Validation loss = 0.023504959419369698
Validation loss = 0.023094482719898224
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.039325036108493805
Validation loss = 0.029694518074393272
Validation loss = 0.02833627536892891
Validation loss = 0.035205062478780746
Validation loss = 0.0291467122733593
Validation loss = 0.03197180852293968
Validation loss = 0.029734348878264427
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.040373533964157104
Validation loss = 0.03207135200500488
Validation loss = 0.02782961167395115
Validation loss = 0.03225821256637573
Validation loss = 0.025524158030748367
Validation loss = 0.028463711962103844
Validation loss = 0.026387646794319153
Validation loss = 0.025873705744743347
Validation loss = 0.027200883254408836
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04634518921375275
Validation loss = 0.03374698758125305
Validation loss = 0.027554992586374283
Validation loss = 0.027166809886693954
Validation loss = 0.03258760645985603
Validation loss = 0.023145589977502823
Validation loss = 0.02660464681684971
Validation loss = 0.02548266015946865
Validation loss = 0.023420967161655426
Validation loss = 0.03049764409661293
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.037413522601127625
Validation loss = 0.02780982293188572
Validation loss = 0.033896349370479584
Validation loss = 0.025119483470916748
Validation loss = 0.024790149182081223
Validation loss = 0.025026340037584305
Validation loss = 0.023299163207411766
Validation loss = 0.02912946417927742
Validation loss = 0.025583941489458084
Validation loss = 0.022345736622810364
Validation loss = 0.025851374492049217
Validation loss = 0.023407001048326492
Validation loss = 0.022533606737852097
Validation loss = 0.026138151064515114
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -17.8    |
| Iteration     | 6        |
| MaximumReturn | -1.55    |
| MinimumReturn | -40.5    |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04725198820233345
Validation loss = 0.02270577847957611
Validation loss = 0.02181197516620159
Validation loss = 0.015475341118872166
Validation loss = 0.018348507583141327
Validation loss = 0.015075686387717724
Validation loss = 0.017032669857144356
Validation loss = 0.016558576375246048
Validation loss = 0.01544844638556242
Validation loss = 0.014932460151612759
Validation loss = 0.019657490774989128
Validation loss = 0.014460086822509766
Validation loss = 0.014489189721643925
Validation loss = 0.014909453690052032
Validation loss = 0.012739628553390503
Validation loss = 0.015820054337382317
Validation loss = 0.013792642392218113
Validation loss = 0.016551485285162926
Validation loss = 0.0189136303961277
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07574077695608139
Validation loss = 0.026004277169704437
Validation loss = 0.023363418877124786
Validation loss = 0.028117269277572632
Validation loss = 0.02405223809182644
Validation loss = 0.020033597946166992
Validation loss = 0.02002054639160633
Validation loss = 0.023302020505070686
Validation loss = 0.018001731485128403
Validation loss = 0.01889260672032833
Validation loss = 0.017888637259602547
Validation loss = 0.01758662424981594
Validation loss = 0.017230071127414703
Validation loss = 0.01930657960474491
Validation loss = 0.017347905784845352
Validation loss = 0.018118182197213173
Validation loss = 0.02063070982694626
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.057651814073324203
Validation loss = 0.030336126685142517
Validation loss = 0.019077761098742485
Validation loss = 0.01889689639210701
Validation loss = 0.032043907791376114
Validation loss = 0.019981203600764275
Validation loss = 0.016833515837788582
Validation loss = 0.01885831169784069
Validation loss = 0.01828744262456894
Validation loss = 0.014778814278542995
Validation loss = 0.023223789408802986
Validation loss = 0.016379952430725098
Validation loss = 0.015045356005430222
Validation loss = 0.02172570489346981
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05988210439682007
Validation loss = 0.020836347714066505
Validation loss = 0.02281172014772892
Validation loss = 0.023068943992257118
Validation loss = 0.023879600688815117
Validation loss = 0.020221952348947525
Validation loss = 0.017280446365475655
Validation loss = 0.020837977528572083
Validation loss = 0.022823018953204155
Validation loss = 0.01666266843676567
Validation loss = 0.020887479186058044
Validation loss = 0.01586097478866577
Validation loss = 0.01407074835151434
Validation loss = 0.016863668337464333
Validation loss = 0.01902502216398716
Validation loss = 0.01723828725516796
Validation loss = 0.016163533553481102
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04450332000851631
Validation loss = 0.022911200299859047
Validation loss = 0.022617578506469727
Validation loss = 0.02606334537267685
Validation loss = 0.02427513711154461
Validation loss = 0.018773477524518967
Validation loss = 0.016088850796222687
Validation loss = 0.01545898150652647
Validation loss = 0.014531831257045269
Validation loss = 0.015217683278024197
Validation loss = 0.01641511730849743
Validation loss = 0.018535183742642403
Validation loss = 0.013639435172080994
Validation loss = 0.018152207136154175
Validation loss = 0.020488571375608444
Validation loss = 0.014170586131513119
Validation loss = 0.013738383539021015
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0312  |
| Iteration     | 7        |
| MaximumReturn | -0.0231  |
| MinimumReturn | -0.0433  |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022466447204351425
Validation loss = 0.01623356342315674
Validation loss = 0.015538953244686127
Validation loss = 0.012934939004480839
Validation loss = 0.02184235118329525
Validation loss = 0.015531343407928944
Validation loss = 0.012439646758139133
Validation loss = 0.020049719139933586
Validation loss = 0.012929909862577915
Validation loss = 0.02429141476750374
Validation loss = 0.01952773705124855
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02305436134338379
Validation loss = 0.01786748133599758
Validation loss = 0.015636930242180824
Validation loss = 0.014660893008112907
Validation loss = 0.015958186239004135
Validation loss = 0.02754584327340126
Validation loss = 0.01927836425602436
Validation loss = 0.019938752055168152
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01490686647593975
Validation loss = 0.01748756691813469
Validation loss = 0.015271313488483429
Validation loss = 0.02190050296485424
Validation loss = 0.019090685993433
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02385351061820984
Validation loss = 0.01960943453013897
Validation loss = 0.015070606954395771
Validation loss = 0.019374657422304153
Validation loss = 0.015524802729487419
Validation loss = 0.01601698435842991
Validation loss = 0.01584445871412754
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022294433787465096
Validation loss = 0.024451015517115593
Validation loss = 0.018444599583745003
Validation loss = 0.013822285458445549
Validation loss = 0.01378925796598196
Validation loss = 0.014831691980361938
Validation loss = 0.01422332413494587
Validation loss = 0.015507522039115429
Validation loss = 0.013772721402347088
Validation loss = 0.02023632451891899
Validation loss = 0.013488993979990482
Validation loss = 0.012876289896667004
Validation loss = 0.02204780839383602
Validation loss = 0.017172297462821007
Validation loss = 0.017621783539652824
Validation loss = 0.01261890958994627
Validation loss = 0.011792859062552452
Validation loss = 0.01218512374907732
Validation loss = 0.012588130310177803
Validation loss = 0.015405124984681606
Validation loss = 0.014975035563111305
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -58.9    |
| Iteration     | 8        |
| MaximumReturn | -34.3    |
| MinimumReturn | -83.1    |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.050994716584682465
Validation loss = 0.029199006035923958
Validation loss = 0.016208745539188385
Validation loss = 0.01247665099799633
Validation loss = 0.010280956514179707
Validation loss = 0.013353327289223671
Validation loss = 0.011538545601069927
Validation loss = 0.01333634089678526
Validation loss = 0.009302620775997639
Validation loss = 0.009599425829946995
Validation loss = 0.010042479261755943
Validation loss = 0.008620171807706356
Validation loss = 0.01256413385272026
Validation loss = 0.024321891367435455
Validation loss = 0.0122146587818861
Validation loss = 0.008389693684875965
Validation loss = 0.00875391997396946
Validation loss = 0.007482071407139301
Validation loss = 0.010179925709962845
Validation loss = 0.0076932283118367195
Validation loss = 0.007944488897919655
Validation loss = 0.008202657103538513
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04250380024313927
Validation loss = 0.015936046838760376
Validation loss = 0.015188561752438545
Validation loss = 0.013562439009547234
Validation loss = 0.010743698105216026
Validation loss = 0.010559799149632454
Validation loss = 0.011334771290421486
Validation loss = 0.009977804496884346
Validation loss = 0.008083224296569824
Validation loss = 0.012600140646100044
Validation loss = 0.007346434518694878
Validation loss = 0.007559227757155895
Validation loss = 0.009024154394865036
Validation loss = 0.008969446644186974
Validation loss = 0.010568936355412006
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05640420317649841
Validation loss = 0.02003014087677002
Validation loss = 0.01194931659847498
Validation loss = 0.013233509846031666
Validation loss = 0.018388595432043076
Validation loss = 0.013404322788119316
Validation loss = 0.010659932158887386
Validation loss = 0.01700550690293312
Validation loss = 0.010480106808245182
Validation loss = 0.01191356685012579
Validation loss = 0.010010335594415665
Validation loss = 0.012376477010548115
Validation loss = 0.012041554786264896
Validation loss = 0.010967335663735867
Validation loss = 0.010008459910750389
Validation loss = 0.008977006189525127
Validation loss = 0.01002245582640171
Validation loss = 0.00971760880202055
Validation loss = 0.0099188182502985
Validation loss = 0.007599587552249432
Validation loss = 0.010511616244912148
Validation loss = 0.00946944672614336
Validation loss = 0.008162274956703186
Validation loss = 0.011081370525062084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.045529577881097794
Validation loss = 0.01899069920182228
Validation loss = 0.016890576109290123
Validation loss = 0.013384364545345306
Validation loss = 0.016387587413191795
Validation loss = 0.01473536528646946
Validation loss = 0.01374216377735138
Validation loss = 0.013074743561446667
Validation loss = 0.009806799702346325
Validation loss = 0.013467688113451004
Validation loss = 0.01313275471329689
Validation loss = 0.017126914113759995
Validation loss = 0.009343598037958145
Validation loss = 0.008981096558272839
Validation loss = 0.01053132675588131
Validation loss = 0.007881293073296547
Validation loss = 0.00895262137055397
Validation loss = 0.00882677547633648
Validation loss = 0.009803635999560356
Validation loss = 0.0167790986597538
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05516901984810829
Validation loss = 0.019087843596935272
Validation loss = 0.019979387521743774
Validation loss = 0.02016059309244156
Validation loss = 0.012140624225139618
Validation loss = 0.015214310958981514
Validation loss = 0.010860713198781013
Validation loss = 0.011785668320953846
Validation loss = 0.011286825872957706
Validation loss = 0.008820329792797565
Validation loss = 0.009674297645688057
Validation loss = 0.019747788086533546
Validation loss = 0.009423363022506237
Validation loss = 0.009065625257790089
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.05    |
| Iteration     | 9        |
| MaximumReturn | -0.0263  |
| MinimumReturn | -0.112   |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009511377662420273
Validation loss = 0.006700894795358181
Validation loss = 0.008285973221063614
Validation loss = 0.006918658036738634
Validation loss = 0.006256823893636465
Validation loss = 0.006462219636887312
Validation loss = 0.010527494363486767
Validation loss = 0.0072322082705795765
Validation loss = 0.00659913569688797
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011098476126790047
Validation loss = 0.009073875844478607
Validation loss = 0.00765979615971446
Validation loss = 0.00806278083473444
Validation loss = 0.0063614086247980595
Validation loss = 0.007349024992436171
Validation loss = 0.007664784789085388
Validation loss = 0.006395095493644476
Validation loss = 0.007961487397551537
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010228465311229229
Validation loss = 0.012479063123464584
Validation loss = 0.008425808511674404
Validation loss = 0.009207585826516151
Validation loss = 0.0070472476072609425
Validation loss = 0.009934168308973312
Validation loss = 0.006794369779527187
Validation loss = 0.0063405716791749
Validation loss = 0.007590864319354296
Validation loss = 0.011239424347877502
Validation loss = 0.014666249044239521
Validation loss = 0.010308577679097652
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01382687408477068
Validation loss = 0.00916043296456337
Validation loss = 0.00886900257319212
Validation loss = 0.009004655294120312
Validation loss = 0.012887248769402504
Validation loss = 0.0065787979401648045
Validation loss = 0.010887553915381432
Validation loss = 0.009808077476918697
Validation loss = 0.007772465702146292
Validation loss = 0.006259497720748186
Validation loss = 0.00861772708594799
Validation loss = 0.00780586339533329
Validation loss = 0.00866906438022852
Validation loss = 0.00670502707362175
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00890680868178606
Validation loss = 0.006289992481470108
Validation loss = 0.006432087626308203
Validation loss = 0.011384215205907822
Validation loss = 0.00728986831381917
Validation loss = 0.007869075983762741
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.225   |
| Iteration     | 10       |
| MaximumReturn | -0.0304  |
| MinimumReturn | -0.822   |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009851190261542797
Validation loss = 0.0076043494045734406
Validation loss = 0.006753835827112198
Validation loss = 0.007380808237940073
Validation loss = 0.007953973487019539
Validation loss = 0.009099501185119152
Validation loss = 0.010538225993514061
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014443124644458294
Validation loss = 0.00965926144272089
Validation loss = 0.008768231607973576
Validation loss = 0.007743481546640396
Validation loss = 0.008177509531378746
Validation loss = 0.01071409322321415
Validation loss = 0.007236789911985397
Validation loss = 0.0064859539270401
Validation loss = 0.007943528704345226
Validation loss = 0.007404687814414501
Validation loss = 0.009602761827409267
Validation loss = 0.007936221547424793
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013165106996893883
Validation loss = 0.00965708028525114
Validation loss = 0.008975584991276264
Validation loss = 0.007301724515855312
Validation loss = 0.009350752457976341
Validation loss = 0.009726868942379951
Validation loss = 0.0102545116096735
Validation loss = 0.00764248613268137
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013269903138279915
Validation loss = 0.008008093573153019
Validation loss = 0.010852435603737831
Validation loss = 0.0072266338393092155
Validation loss = 0.007731931749731302
Validation loss = 0.007656393107026815
Validation loss = 0.008267199620604515
Validation loss = 0.0074904924258589745
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009634843096137047
Validation loss = 0.008917656727135181
Validation loss = 0.010320669040083885
Validation loss = 0.006792409811168909
Validation loss = 0.0057084569707512856
Validation loss = 0.0069503458216786385
Validation loss = 0.011275306344032288
Validation loss = 0.011263003572821617
Validation loss = 0.007028855383396149
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.219   |
| Iteration     | 11       |
| MaximumReturn | -0.0514  |
| MinimumReturn | -1.58    |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010485817678272724
Validation loss = 0.008631950244307518
Validation loss = 0.007570543792098761
Validation loss = 0.00992791261523962
Validation loss = 0.010790273547172546
Validation loss = 0.007101885974407196
Validation loss = 0.00964218471199274
Validation loss = 0.010450344532728195
Validation loss = 0.007438591681420803
Validation loss = 0.007092990912497044
Validation loss = 0.008319966495037079
Validation loss = 0.011107141152024269
Validation loss = 0.008344488218426704
Validation loss = 0.007186423055827618
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010317335836589336
Validation loss = 0.009554010815918446
Validation loss = 0.008026503957808018
Validation loss = 0.00663641607388854
Validation loss = 0.008498793467879295
Validation loss = 0.008019545115530491
Validation loss = 0.0072485110722482204
Validation loss = 0.006108204834163189
Validation loss = 0.007303805090487003
Validation loss = 0.007226686924695969
Validation loss = 0.007214043289422989
Validation loss = 0.012118641287088394
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010499408468604088
Validation loss = 0.007034112699329853
Validation loss = 0.0114717623218894
Validation loss = 0.007085338234901428
Validation loss = 0.013521058484911919
Validation loss = 0.00866217352449894
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022612344473600388
Validation loss = 0.017603974789381027
Validation loss = 0.00980598758906126
Validation loss = 0.007325740065425634
Validation loss = 0.006157571915537119
Validation loss = 0.009090179577469826
Validation loss = 0.009818610735237598
Validation loss = 0.009749865159392357
Validation loss = 0.0077333003282547
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011285623535513878
Validation loss = 0.0088755302131176
Validation loss = 0.0077551594004035
Validation loss = 0.011869129724800587
Validation loss = 0.0067864032462239265
Validation loss = 0.006813903339207172
Validation loss = 0.008613741025328636
Validation loss = 0.008921114727854729
Validation loss = 0.009542623534798622
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0643  |
| Iteration     | 12       |
| MaximumReturn | -0.034   |
| MinimumReturn | -0.0986  |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006365894805639982
Validation loss = 0.009652500972151756
Validation loss = 0.005287307780236006
Validation loss = 0.006854162085801363
Validation loss = 0.007829039357602596
Validation loss = 0.006113242823630571
Validation loss = 0.006706905085593462
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011303030885756016
Validation loss = 0.010268967598676682
Validation loss = 0.006027206312865019
Validation loss = 0.00839943066239357
Validation loss = 0.006306946277618408
Validation loss = 0.006465298123657703
Validation loss = 0.007734788581728935
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007686704862862825
Validation loss = 0.006453476380556822
Validation loss = 0.007197592873126268
Validation loss = 0.007030717562884092
Validation loss = 0.006085800938308239
Validation loss = 0.008893332444131374
Validation loss = 0.008052544668316841
Validation loss = 0.0076780663803219795
Validation loss = 0.012991037219762802
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013052054680883884
Validation loss = 0.010469921864569187
Validation loss = 0.007289391476660967
Validation loss = 0.006639769766479731
Validation loss = 0.007271504495292902
Validation loss = 0.006265661213546991
Validation loss = 0.00621591042727232
Validation loss = 0.005881513934582472
Validation loss = 0.010420477949082851
Validation loss = 0.01103988941758871
Validation loss = 0.009385366924107075
Validation loss = 0.005215198267251253
Validation loss = 0.0065965731628239155
Validation loss = 0.008634529076516628
Validation loss = 0.01114871259778738
Validation loss = 0.005593712441623211
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016303105279803276
Validation loss = 0.006455510854721069
Validation loss = 0.0183473601937294
Validation loss = 0.007884478196501732
Validation loss = 0.006841867696493864
Validation loss = 0.006165529601275921
Validation loss = 0.007154978346079588
Validation loss = 0.010854925960302353
Validation loss = 0.00964899268001318
Validation loss = 0.006262386217713356
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00232 |
| Iteration     | 13       |
| MaximumReturn | -0.00159 |
| MinimumReturn | -0.00299 |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009791320189833641
Validation loss = 0.006368899252265692
Validation loss = 0.006163421552628279
Validation loss = 0.006893451791256666
Validation loss = 0.006155455019325018
Validation loss = 0.014405393041670322
Validation loss = 0.012290396727621555
Validation loss = 0.005960734095424414
Validation loss = 0.006225446239113808
Validation loss = 0.006394784431904554
Validation loss = 0.005684263538569212
Validation loss = 0.011765304021537304
Validation loss = 0.00532051594927907
Validation loss = 0.0053574033081531525
Validation loss = 0.006569806952029467
Validation loss = 0.0073170531541109085
Validation loss = 0.009522947482764721
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007720181718468666
Validation loss = 0.007169201970100403
Validation loss = 0.006349302362650633
Validation loss = 0.006167106796056032
Validation loss = 0.006083877757191658
Validation loss = 0.006220193579792976
Validation loss = 0.004961604252457619
Validation loss = 0.005389653146266937
Validation loss = 0.006117239594459534
Validation loss = 0.009265400469303131
Validation loss = 0.009640512987971306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014096449129283428
Validation loss = 0.0075272186659276485
Validation loss = 0.00734051875770092
Validation loss = 0.007291380316019058
Validation loss = 0.006152847781777382
Validation loss = 0.010382536798715591
Validation loss = 0.007239911705255508
Validation loss = 0.005941243376582861
Validation loss = 0.008631688542664051
Validation loss = 0.008737907744944096
Validation loss = 0.009041978977620602
Validation loss = 0.0077725425362586975
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0072776577435433865
Validation loss = 0.005878055933862925
Validation loss = 0.006386447232216597
Validation loss = 0.007470156531780958
Validation loss = 0.008641746826469898
Validation loss = 0.0072041768580675125
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010840603150427341
Validation loss = 0.005980402696877718
Validation loss = 0.006440069526433945
Validation loss = 0.010728221386671066
Validation loss = 0.006694479379802942
Validation loss = 0.006202623248100281
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00882 |
| Iteration     | 14       |
| MaximumReturn | -0.00504 |
| MinimumReturn | -0.011   |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010274880565702915
Validation loss = 0.0059576500207185745
Validation loss = 0.005511869676411152
Validation loss = 0.006853653118014336
Validation loss = 0.005886666942387819
Validation loss = 0.006468485575169325
Validation loss = 0.006459691561758518
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009728274308145046
Validation loss = 0.007785834837704897
Validation loss = 0.006118727847933769
Validation loss = 0.0061590829864144325
Validation loss = 0.0049382043071091175
Validation loss = 0.015855927020311356
Validation loss = 0.008616232313215733
Validation loss = 0.006665789056569338
Validation loss = 0.006863602437078953
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01189026702195406
Validation loss = 0.010312543250620365
Validation loss = 0.011218307539820671
Validation loss = 0.005578808020800352
Validation loss = 0.006806552410125732
Validation loss = 0.009611845947802067
Validation loss = 0.011564075015485287
Validation loss = 0.006161024793982506
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012957703322172165
Validation loss = 0.006576402112841606
Validation loss = 0.006065303925424814
Validation loss = 0.005696224980056286
Validation loss = 0.006741831079125404
Validation loss = 0.005674074869602919
Validation loss = 0.009890839457511902
Validation loss = 0.012750565074384212
Validation loss = 0.0056429957039654255
Validation loss = 0.005819054786115885
Validation loss = 0.005943550728261471
Validation loss = 0.005451097618788481
Validation loss = 0.007047862745821476
Validation loss = 0.006707014515995979
Validation loss = 0.005402194336056709
Validation loss = 0.011374559253454208
Validation loss = 0.005102578084915876
Validation loss = 0.005334861576557159
Validation loss = 0.004812913481146097
Validation loss = 0.00812336802482605
Validation loss = 0.006767422426491976
Validation loss = 0.007757388520985842
Validation loss = 0.00598006509244442
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010640982538461685
Validation loss = 0.007828792557120323
Validation loss = 0.008763511665165424
Validation loss = 0.007371755316853523
Validation loss = 0.007066430523991585
Validation loss = 0.008170477114617825
Validation loss = 0.011693140491843224
Validation loss = 0.005744148977100849
Validation loss = 0.007857371121644974
Validation loss = 0.005902898497879505
Validation loss = 0.005927314516156912
Validation loss = 0.0056309145875275135
Validation loss = 0.007562719751149416
Validation loss = 0.00733936345204711
Validation loss = 0.007178557571023703
Validation loss = 0.005576979834586382
Validation loss = 0.005726414266973734
Validation loss = 0.005690174642950296
Validation loss = 0.006998529192060232
Validation loss = 0.005044784862548113
Validation loss = 0.006209587678313255
Validation loss = 0.005238131619989872
Validation loss = 0.007097845897078514
Validation loss = 0.005022404249757528
Validation loss = 0.00613007415086031
Validation loss = 0.0055822329595685005
Validation loss = 0.005759851075708866
Validation loss = 0.01771344244480133
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000883 |
| Iteration     | 15        |
| MaximumReturn | -0.00059  |
| MinimumReturn | -0.00106  |
| TotalSamples  | 28322     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011358963325619698
Validation loss = 0.005746088456362486
Validation loss = 0.005604246165603399
Validation loss = 0.006145377643406391
Validation loss = 0.005717211868613958
Validation loss = 0.010538958013057709
Validation loss = 0.010499343276023865
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007742710877209902
Validation loss = 0.006511371117085218
Validation loss = 0.005683995317667723
Validation loss = 0.008247767575085163
Validation loss = 0.0073582264594733715
Validation loss = 0.008308125659823418
Validation loss = 0.004768311977386475
Validation loss = 0.006269165780395269
Validation loss = 0.010047510266304016
Validation loss = 0.006349284667521715
Validation loss = 0.005126708187162876
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0102390730753541
Validation loss = 0.014133980497717857
Validation loss = 0.006448458880186081
Validation loss = 0.005454984959214926
Validation loss = 0.009688662365078926
Validation loss = 0.0055690030567348
Validation loss = 0.006129097193479538
Validation loss = 0.006110865157097578
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00578603008762002
Validation loss = 0.006787829101085663
Validation loss = 0.005321459379047155
Validation loss = 0.01062716729938984
Validation loss = 0.00835609994828701
Validation loss = 0.004433503374457359
Validation loss = 0.006637050770223141
Validation loss = 0.013519632630050182
Validation loss = 0.008186996914446354
Validation loss = 0.0072962818667292595
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011581902392208576
Validation loss = 0.010201343335211277
Validation loss = 0.005485887639224529
Validation loss = 0.006884866859763861
Validation loss = 0.006329100579023361
Validation loss = 0.007748426403850317
Validation loss = 0.005606784950941801
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00228  |
| Iteration     | 16        |
| MaximumReturn | -0.000876 |
| MinimumReturn | -0.0065   |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010092913173139095
Validation loss = 0.0035941300448030233
Validation loss = 0.0034004063345491886
Validation loss = 0.002922940067946911
Validation loss = 0.003139519365504384
Validation loss = 0.003590378677472472
Validation loss = 0.003402660833671689
Validation loss = 0.003366441698744893
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00719593558460474
Validation loss = 0.0031498840544372797
Validation loss = 0.0030566207133233547
Validation loss = 0.003957933280616999
Validation loss = 0.003645123215392232
Validation loss = 0.004140328150242567
Validation loss = 0.003176693571731448
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007172721438109875
Validation loss = 0.0032442105002701283
Validation loss = 0.003339119954034686
Validation loss = 0.0030484122689813375
Validation loss = 0.003267949679866433
Validation loss = 0.0041996692307293415
Validation loss = 0.0040763975121080875
Validation loss = 0.004033880773931742
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010808233171701431
Validation loss = 0.0036881628911942244
Validation loss = 0.003143620677292347
Validation loss = 0.0028092849534004927
Validation loss = 0.0031851637177169323
Validation loss = 0.0032579738181084394
Validation loss = 0.003890922525897622
Validation loss = 0.0035165685694664717
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006096281576901674
Validation loss = 0.0029856031760573387
Validation loss = 0.0032224201131612062
Validation loss = 0.0028875949792563915
Validation loss = 0.003351901425048709
Validation loss = 0.0032949172891676426
Validation loss = 0.003800422651693225
Validation loss = 0.0036880909465253353
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0255  |
| Iteration     | 17       |
| MaximumReturn | -0.0163  |
| MinimumReturn | -0.0344  |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008247900754213333
Validation loss = 0.0026217293925583363
Validation loss = 0.0024433466605842113
Validation loss = 0.00278879189863801
Validation loss = 0.0025108964182436466
Validation loss = 0.002438613213598728
Validation loss = 0.002590194111689925
Validation loss = 0.0023172609508037567
Validation loss = 0.0051500131376087666
Validation loss = 0.0021944998297840357
Validation loss = 0.0032093441113829613
Validation loss = 0.0028791502118110657
Validation loss = 0.002931747119873762
Validation loss = 0.002443137811496854
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006301263812929392
Validation loss = 0.0030693525914102793
Validation loss = 0.003485199296846986
Validation loss = 0.0024608024396002293
Validation loss = 0.0025277261156588793
Validation loss = 0.002452343702316284
Validation loss = 0.0024597658775746822
Validation loss = 0.002757807495072484
Validation loss = 0.0022648973390460014
Validation loss = 0.003550104098394513
Validation loss = 0.004171826411038637
Validation loss = 0.0026249224320054054
Validation loss = 0.002662228886038065
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007174265570938587
Validation loss = 0.0036442342679947615
Validation loss = 0.0029600425623357296
Validation loss = 0.002600750420242548
Validation loss = 0.0027004233561456203
Validation loss = 0.0032508980948477983
Validation loss = 0.002202094765380025
Validation loss = 0.002571199322119355
Validation loss = 0.003291964763775468
Validation loss = 0.002656405558809638
Validation loss = 0.003261096077039838
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006513325497508049
Validation loss = 0.002550058998167515
Validation loss = 0.0022246348671615124
Validation loss = 0.002696127165108919
Validation loss = 0.002692289650440216
Validation loss = 0.0023226472549140453
Validation loss = 0.003067109966650605
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005984981544315815
Validation loss = 0.002432363573461771
Validation loss = 0.0023574039805680513
Validation loss = 0.002315451158210635
Validation loss = 0.0021512547973543406
Validation loss = 0.0022587839048355818
Validation loss = 0.00231551774777472
Validation loss = 0.0020801087375730276
Validation loss = 0.0022534322924911976
Validation loss = 0.0026093197520822287
Validation loss = 0.0033990638330578804
Validation loss = 0.0036159707233309746
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.63    |
| Iteration     | 18       |
| MaximumReturn | -0.0392  |
| MinimumReturn | -22.4    |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003368689678609371
Validation loss = 0.0024452796205878258
Validation loss = 0.0033273249864578247
Validation loss = 0.002913626842200756
Validation loss = 0.002279900945723057
Validation loss = 0.0034408641513437033
Validation loss = 0.0024086053017526865
Validation loss = 0.0025003752671182156
Validation loss = 0.003478097729384899
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028657610528171062
Validation loss = 0.002575908787548542
Validation loss = 0.003570845816284418
Validation loss = 0.0025094049051404
Validation loss = 0.0036184880882501602
Validation loss = 0.0032343962229788303
Validation loss = 0.0022489060647785664
Validation loss = 0.0030494024977087975
Validation loss = 0.003301276359707117
Validation loss = 0.0032635265961289406
Validation loss = 0.0025898567400872707
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004496878944337368
Validation loss = 0.005149093456566334
Validation loss = 0.002618909114971757
Validation loss = 0.0028557113837450743
Validation loss = 0.0026303783524781466
Validation loss = 0.003471979172900319
Validation loss = 0.0026744252536445856
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002579479245468974
Validation loss = 0.0021484673488885164
Validation loss = 0.0035726898349821568
Validation loss = 0.003076357301324606
Validation loss = 0.0021715813782066107
Validation loss = 0.0027798449154943228
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0028159942012280226
Validation loss = 0.0024705370888113976
Validation loss = 0.0022848027292639017
Validation loss = 0.002868928713724017
Validation loss = 0.002509209793061018
Validation loss = 0.004046194721013308
Validation loss = 0.00291266106069088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00691 |
| Iteration     | 19       |
| MaximumReturn | -0.00158 |
| MinimumReturn | -0.0185  |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0033157547004520893
Validation loss = 0.0025793625973165035
Validation loss = 0.0026531985495239496
Validation loss = 0.0027066953480243683
Validation loss = 0.002555236453190446
Validation loss = 0.0025372644886374474
Validation loss = 0.0023971362970769405
Validation loss = 0.0032906022388488054
Validation loss = 0.002272039884701371
Validation loss = 0.003301019547507167
Validation loss = 0.004002809524536133
Validation loss = 0.0022815419360995293
Validation loss = 0.0021751713939011097
Validation loss = 0.0026052112225443125
Validation loss = 0.0027682834770530462
Validation loss = 0.0026963131967931986
Validation loss = 0.0029471348971128464
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028900718316435814
Validation loss = 0.0026164334267377853
Validation loss = 0.0026569897308945656
Validation loss = 0.00440942170098424
Validation loss = 0.002232859842479229
Validation loss = 0.0032009005080908537
Validation loss = 0.002994840731844306
Validation loss = 0.0027069251518696547
Validation loss = 0.002395598217844963
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003553542075678706
Validation loss = 0.0038016950711607933
Validation loss = 0.005211283918470144
Validation loss = 0.003434442915022373
Validation loss = 0.0029626560863107443
Validation loss = 0.0023349039256572723
Validation loss = 0.0023444986436516047
Validation loss = 0.0026866705156862736
Validation loss = 0.0035859288182109594
Validation loss = 0.0034721875563263893
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0034864225890487432
Validation loss = 0.0027590824756771326
Validation loss = 0.002373110270127654
Validation loss = 0.002217179862782359
Validation loss = 0.0024999298620969057
Validation loss = 0.005207705311477184
Validation loss = 0.0039153508841991425
Validation loss = 0.004132579080760479
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002784551354125142
Validation loss = 0.003204451408237219
Validation loss = 0.002408496104180813
Validation loss = 0.0029719306621700525
Validation loss = 0.0024791972246021032
Validation loss = 0.003742000088095665
Validation loss = 0.002826305339112878
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.055   |
| Iteration     | 20       |
| MaximumReturn | -0.0298  |
| MinimumReturn | -0.0835  |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031825548503547907
Validation loss = 0.004085347522050142
Validation loss = 0.0024458812549710274
Validation loss = 0.0024092127569019794
Validation loss = 0.002929977374151349
Validation loss = 0.002935516880825162
Validation loss = 0.002927804831415415
Validation loss = 0.0026252381503582
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028112302534282207
Validation loss = 0.003739752806723118
Validation loss = 0.0027753738686442375
Validation loss = 0.005822044797241688
Validation loss = 0.0023459503427147865
Validation loss = 0.0028004744090139866
Validation loss = 0.0020886638667434454
Validation loss = 0.002551761455833912
Validation loss = 0.003308437531813979
Validation loss = 0.0027151191607117653
Validation loss = 0.0029174182564020157
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026118275709450245
Validation loss = 0.002796484623104334
Validation loss = 0.0030044240411370993
Validation loss = 0.0029122037813067436
Validation loss = 0.0024692537263035774
Validation loss = 0.0037081304471939802
Validation loss = 0.003761829575523734
Validation loss = 0.003040757030248642
Validation loss = 0.0032042479142546654
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002674123039469123
Validation loss = 0.003322458127513528
Validation loss = 0.0022778769489377737
Validation loss = 0.0034366147592663765
Validation loss = 0.0030603650957345963
Validation loss = 0.002457992872223258
Validation loss = 0.0023418571799993515
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0021661119535565376
Validation loss = 0.002554028294980526
Validation loss = 0.003456301521509886
Validation loss = 0.0030138948932290077
Validation loss = 0.0022420217283070087
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00769 |
| Iteration     | 21       |
| MaximumReturn | -0.0051  |
| MinimumReturn | -0.0116  |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0033546381164342165
Validation loss = 0.0023002149537205696
Validation loss = 0.0024993265978991985
Validation loss = 0.0031756125390529633
Validation loss = 0.002912320662289858
Validation loss = 0.003575962968170643
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022846392821520567
Validation loss = 0.0032718025613576174
Validation loss = 0.0034956778399646282
Validation loss = 0.0024330795276910067
Validation loss = 0.004174483474344015
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004006829112768173
Validation loss = 0.0036121078301221132
Validation loss = 0.0035729019436985254
Validation loss = 0.0029870790895074606
Validation loss = 0.004282011184841394
Validation loss = 0.002633238211274147
Validation loss = 0.002845039591193199
Validation loss = 0.0034469927195459604
Validation loss = 0.0025675075594335794
Validation loss = 0.0027438350953161716
Validation loss = 0.0041275182738900185
Validation loss = 0.0025751616340130568
Validation loss = 0.005534209776669741
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0030748366843909025
Validation loss = 0.002201158320531249
Validation loss = 0.002847877098247409
Validation loss = 0.002023388398811221
Validation loss = 0.00338811706751585
Validation loss = 0.002241201465949416
Validation loss = 0.003397864755243063
Validation loss = 0.0035244650207459927
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0031258733943104744
Validation loss = 0.0020566973835229874
Validation loss = 0.0042701768688857555
Validation loss = 0.002627636305987835
Validation loss = 0.002406567567959428
Validation loss = 0.004228602629154921
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0778  |
| Iteration     | 22       |
| MaximumReturn | -0.0564  |
| MinimumReturn | -0.126   |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002615202683955431
Validation loss = 0.0024123643524944782
Validation loss = 0.0021002627909183502
Validation loss = 0.003284392412751913
Validation loss = 0.0036845956929028034
Validation loss = 0.0028555463068187237
Validation loss = 0.0019058343023061752
Validation loss = 0.003231130074709654
Validation loss = 0.003293044166639447
Validation loss = 0.0021049079950898886
Validation loss = 0.0022860881872475147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004129025153815746
Validation loss = 0.002860780106857419
Validation loss = 0.002251789905130863
Validation loss = 0.0023991938214749098
Validation loss = 0.0021811765618622303
Validation loss = 0.002736115362495184
Validation loss = 0.0030710422433912754
Validation loss = 0.002708426443859935
Validation loss = 0.0022672354243695736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0028089755214750767
Validation loss = 0.0032886334229260683
Validation loss = 0.001878493232652545
Validation loss = 0.0020352848805487156
Validation loss = 0.0031301802955567837
Validation loss = 0.0027010750491172075
Validation loss = 0.003271243767812848
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0022307471372187138
Validation loss = 0.002392314374446869
Validation loss = 0.002149498788639903
Validation loss = 0.002468098420649767
Validation loss = 0.002464177319779992
Validation loss = 0.002292241435497999
Validation loss = 0.0020182915031909943
Validation loss = 0.0023957521189004183
Validation loss = 0.005142167676240206
Validation loss = 0.0027794952038675547
Validation loss = 0.0033230092376470566
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004511895589530468
Validation loss = 0.0022565617691725492
Validation loss = 0.0023034331388771534
Validation loss = 0.0018893980886787176
Validation loss = 0.0022721528075635433
Validation loss = 0.0028170268051326275
Validation loss = 0.0022618104703724384
Validation loss = 0.004408998880535364
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0974  |
| Iteration     | 23       |
| MaximumReturn | -0.0206  |
| MinimumReturn | -0.784   |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0032787465024739504
Validation loss = 0.0027539401780813932
Validation loss = 0.002189004560932517
Validation loss = 0.002463785232976079
Validation loss = 0.002861484419554472
Validation loss = 0.0018683753442019224
Validation loss = 0.0026530311442911625
Validation loss = 0.0028922357596457005
Validation loss = 0.0038722988683730364
Validation loss = 0.002671984024345875
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002378778299316764
Validation loss = 0.002751896157860756
Validation loss = 0.0034781042486429214
Validation loss = 0.002438692143186927
Validation loss = 0.003919581882655621
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003185536013916135
Validation loss = 0.005147813819348812
Validation loss = 0.003118698950856924
Validation loss = 0.0028542482759803534
Validation loss = 0.0022923368960618973
Validation loss = 0.00365910935215652
Validation loss = 0.0028468570671975613
Validation loss = 0.002833395032212138
Validation loss = 0.0023564831353724003
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0043816640973091125
Validation loss = 0.002994757844135165
Validation loss = 0.0023055800702422857
Validation loss = 0.002516638021916151
Validation loss = 0.0026250313967466354
Validation loss = 0.002602095017209649
Validation loss = 0.0032304550986737013
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0026572132483124733
Validation loss = 0.00223988713696599
Validation loss = 0.00246208137832582
Validation loss = 0.004133797250688076
Validation loss = 0.0021055550314486027
Validation loss = 0.002133067697286606
Validation loss = 0.002482424955815077
Validation loss = 0.0025887778028845787
Validation loss = 0.002454084111377597
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0394  |
| Iteration     | 24       |
| MaximumReturn | -0.0226  |
| MinimumReturn | -0.137   |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00260358233936131
Validation loss = 0.002525411080569029
Validation loss = 0.0029377907048910856
Validation loss = 0.0025940663181245327
Validation loss = 0.00262602511793375
Validation loss = 0.002399338409304619
Validation loss = 0.002153765643015504
Validation loss = 0.0030385474674403667
Validation loss = 0.0020553958602249622
Validation loss = 0.0036679052282124758
Validation loss = 0.0026260719168931246
Validation loss = 0.0025559822097420692
Validation loss = 0.0023010950535535812
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003672173246741295
Validation loss = 0.00318457861430943
Validation loss = 0.002423784462735057
Validation loss = 0.002420601900666952
Validation loss = 0.0022001881152391434
Validation loss = 0.002688869135454297
Validation loss = 0.002446869621053338
Validation loss = 0.0020028152503073215
Validation loss = 0.002574290381744504
Validation loss = 0.0029296192806214094
Validation loss = 0.002397056668996811
Validation loss = 0.003104193601757288
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024214338045567274
Validation loss = 0.0020751086995005608
Validation loss = 0.0025933708529919386
Validation loss = 0.00359160709194839
Validation loss = 0.0026866146363317966
Validation loss = 0.0034794292878359556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0030920598655939102
Validation loss = 0.002755163935944438
Validation loss = 0.0033567873761057854
Validation loss = 0.0023836507461965084
Validation loss = 0.002133157104253769
Validation loss = 0.002242838963866234
Validation loss = 0.0020483885891735554
Validation loss = 0.0027219208423048258
Validation loss = 0.002902324078604579
Validation loss = 0.0029075322672724724
Validation loss = 0.002148542320355773
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003393517341464758
Validation loss = 0.0021089932415634394
Validation loss = 0.002941480604931712
Validation loss = 0.0022862288169562817
Validation loss = 0.002489939797669649
Validation loss = 0.0028103250078856945
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41.8    |
| Iteration     | 25       |
| MaximumReturn | -0.47    |
| MinimumReturn | -74.2    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002455072710290551
Validation loss = 0.0020756663288921118
Validation loss = 0.002997043775394559
Validation loss = 0.0017700542230159044
Validation loss = 0.002145955804735422
Validation loss = 0.0021100628655403852
Validation loss = 0.0018603266216814518
Validation loss = 0.00305966311134398
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0038018778432160616
Validation loss = 0.002087482949718833
Validation loss = 0.0026956587098538876
Validation loss = 0.002086829859763384
Validation loss = 0.0030013753566890955
Validation loss = 0.0028652928303927183
Validation loss = 0.0022865424398332834
Validation loss = 0.0027436716482043266
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0036825030110776424
Validation loss = 0.0021132314577698708
Validation loss = 0.0033158569131046534
Validation loss = 0.0023116825614124537
Validation loss = 0.0025737814139574766
Validation loss = 0.002043419051915407
Validation loss = 0.002437338000163436
Validation loss = 0.0024718029890209436
Validation loss = 0.0022713651414960623
Validation loss = 0.00288263987749815
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004300737753510475
Validation loss = 0.002254084451124072
Validation loss = 0.0021144840866327286
Validation loss = 0.001985840266570449
Validation loss = 0.002566719427704811
Validation loss = 0.001942077768035233
Validation loss = 0.0023491918109357357
Validation loss = 0.002848374657332897
Validation loss = 0.0024547611828893423
Validation loss = 0.0029877624474465847
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004252290818840265
Validation loss = 0.0036163912154734135
Validation loss = 0.002228248631581664
Validation loss = 0.002058484125882387
Validation loss = 0.0017980248667299747
Validation loss = 0.001744493842124939
Validation loss = 0.0020660015288740396
Validation loss = 0.003090642625465989
Validation loss = 0.0021130689419806004
Validation loss = 0.0018513883696869016
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0707  |
| Iteration     | 26       |
| MaximumReturn | -0.0349  |
| MinimumReturn | -0.151   |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0024495073594152927
Validation loss = 0.0020418851636350155
Validation loss = 0.0019756529945880175
Validation loss = 0.001942502218298614
Validation loss = 0.002126468112692237
Validation loss = 0.003503900719806552
Validation loss = 0.0019887795206159353
Validation loss = 0.0018327936995774508
Validation loss = 0.0023375956807285547
Validation loss = 0.00270277401432395
Validation loss = 0.0020931237377226353
Validation loss = 0.0020274415146559477
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027198984753340483
Validation loss = 0.0021383832208812237
Validation loss = 0.0024645160883665085
Validation loss = 0.0027474539820104837
Validation loss = 0.0018252218142151833
Validation loss = 0.002651047892868519
Validation loss = 0.0025153765454888344
Validation loss = 0.0024697924964129925
Validation loss = 0.002611988689750433
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022153612226247787
Validation loss = 0.001984584843739867
Validation loss = 0.0024122230242937803
Validation loss = 0.002024369779974222
Validation loss = 0.0018873446388170123
Validation loss = 0.0021144538186490536
Validation loss = 0.003221130231395364
Validation loss = 0.001967109739780426
Validation loss = 0.002133144997060299
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002685271669179201
Validation loss = 0.003200690494850278
Validation loss = 0.002718876348808408
Validation loss = 0.0033863631542772055
Validation loss = 0.002442194614559412
Validation loss = 0.002835812047123909
Validation loss = 0.0024204032961279154
Validation loss = 0.0027699025813490152
Validation loss = 0.002315347548574209
Validation loss = 0.0020745242945849895
Validation loss = 0.0035952136386185884
Validation loss = 0.0024892843794077635
Validation loss = 0.002100285841152072
Validation loss = 0.0021208508405834436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023196700494736433
Validation loss = 0.0022936053574085236
Validation loss = 0.0019731654319912195
Validation loss = 0.001799236866645515
Validation loss = 0.0023796854075044394
Validation loss = 0.002215778222307563
Validation loss = 0.002133393893018365
Validation loss = 0.001994707388803363
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0456  |
| Iteration     | 27       |
| MaximumReturn | -0.0334  |
| MinimumReturn | -0.0892  |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002681781305000186
Validation loss = 0.002142866374924779
Validation loss = 0.002878152532503009
Validation loss = 0.0020370592828840017
Validation loss = 0.001944752992130816
Validation loss = 0.0021040348801761866
Validation loss = 0.002800174755975604
Validation loss = 0.002024789107963443
Validation loss = 0.0021853025536984205
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022436080034822226
Validation loss = 0.00239387690089643
Validation loss = 0.00413255300372839
Validation loss = 0.002420312026515603
Validation loss = 0.002442197175696492
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002718430245295167
Validation loss = 0.0021817940287292004
Validation loss = 0.0026938433293253183
Validation loss = 0.002134881680831313
Validation loss = 0.0019126106053590775
Validation loss = 0.002453429624438286
Validation loss = 0.00215920340269804
Validation loss = 0.0019379457226023078
Validation loss = 0.00284825568087399
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002627781592309475
Validation loss = 0.0037630728911608458
Validation loss = 0.002206759760156274
Validation loss = 0.0031012678518891335
Validation loss = 0.002607205882668495
Validation loss = 0.00202286709100008
Validation loss = 0.0022034153807908297
Validation loss = 0.002361530205234885
Validation loss = 0.0034825617913156748
Validation loss = 0.0024007451720535755
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00264682131819427
Validation loss = 0.0021835772786289454
Validation loss = 0.002565785078331828
Validation loss = 0.0024730933364480734
Validation loss = 0.0022781796287745237
Validation loss = 0.002760886447504163
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00572 |
| Iteration     | 28       |
| MaximumReturn | -0.00419 |
| MinimumReturn | -0.00796 |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002365844091400504
Validation loss = 0.002179972128942609
Validation loss = 0.0017660238081589341
Validation loss = 0.0024122067261487246
Validation loss = 0.0025230785831809044
Validation loss = 0.0019463807111606002
Validation loss = 0.0019462539348751307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001954984851181507
Validation loss = 0.0023211329244077206
Validation loss = 0.002687094733119011
Validation loss = 0.002292550401762128
Validation loss = 0.001964257564395666
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0030606104992330074
Validation loss = 0.0021542953327298164
Validation loss = 0.002492072992026806
Validation loss = 0.0037009301595389843
Validation loss = 0.0019130330765619874
Validation loss = 0.003288018051534891
Validation loss = 0.0023501955438405275
Validation loss = 0.0020900515373796225
Validation loss = 0.002576069440692663
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024371319450438023
Validation loss = 0.0018411172786727548
Validation loss = 0.002071662340313196
Validation loss = 0.0019973807502537966
Validation loss = 0.0020213150419294834
Validation loss = 0.0019332291558384895
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002665892941877246
Validation loss = 0.0022539841011166573
Validation loss = 0.0019559308420866728
Validation loss = 0.00219176453538239
Validation loss = 0.0025987974368035793
Validation loss = 0.0021128221414983273
Validation loss = 0.0016963466769084334
Validation loss = 0.0019290721975266933
Validation loss = 0.0021500037983059883
Validation loss = 0.0023261355236172676
Validation loss = 0.001704693422652781
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0321  |
| Iteration     | 29       |
| MaximumReturn | -0.0201  |
| MinimumReturn | -0.0497  |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002874961821362376
Validation loss = 0.003159409388899803
Validation loss = 0.0021660211496055126
Validation loss = 0.002954157069325447
Validation loss = 0.002899925224483013
Validation loss = 0.002757551148533821
Validation loss = 0.0019934282172471285
Validation loss = 0.0017835822654888034
Validation loss = 0.0020333039574325085
Validation loss = 0.0024423866998404264
Validation loss = 0.0022021171171218157
Validation loss = 0.0019269336480647326
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00295963604003191
Validation loss = 0.0022350428625941277
Validation loss = 0.0020646981429308653
Validation loss = 0.0038307488430291414
Validation loss = 0.0021721322555094957
Validation loss = 0.002143648685887456
Validation loss = 0.0019299947889521718
Validation loss = 0.001992287114262581
Validation loss = 0.0022136252373456955
Validation loss = 0.0024445215240120888
Validation loss = 0.002174882683902979
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0023894510231912136
Validation loss = 0.002231932245194912
Validation loss = 0.002535320818424225
Validation loss = 0.0025803667958825827
Validation loss = 0.002065731445327401
Validation loss = 0.0023004866670817137
Validation loss = 0.0031052464619278908
Validation loss = 0.002364940010011196
Validation loss = 0.0020292848348617554
Validation loss = 0.0024272140581160784
Validation loss = 0.0038284955080598593
Validation loss = 0.002872633980587125
Validation loss = 0.0020197900012135506
Validation loss = 0.002774275606498122
Validation loss = 0.0027034077793359756
Validation loss = 0.002262611174955964
Validation loss = 0.003078688168898225
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003413628088310361
Validation loss = 0.0019438560120761395
Validation loss = 0.00213626679033041
Validation loss = 0.0020996311213821173
Validation loss = 0.0031910701654851437
Validation loss = 0.0023278617300093174
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023136064410209656
Validation loss = 0.0050229234620928764
Validation loss = 0.002477615140378475
Validation loss = 0.0017561034765094519
Validation loss = 0.001996431266888976
Validation loss = 0.0020080050453543663
Validation loss = 0.002561918692663312
Validation loss = 0.0024609568063169718
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00346 |
| Iteration     | 30       |
| MaximumReturn | -0.0024  |
| MinimumReturn | -0.00503 |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002090001944452524
Validation loss = 0.001688116928562522
Validation loss = 0.0030880619306117296
Validation loss = 0.0017126413295045495
Validation loss = 0.0017642838647589087
Validation loss = 0.0025230171158909798
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0026434825267642736
Validation loss = 0.0023118259850889444
Validation loss = 0.002279427833855152
Validation loss = 0.002018640050664544
Validation loss = 0.003309727180749178
Validation loss = 0.0023603299632668495
Validation loss = 0.0022688305471092463
Validation loss = 0.002432715380564332
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002759423339739442
Validation loss = 0.0031693903729319572
Validation loss = 0.0023460660595446825
Validation loss = 0.0021685976535081863
Validation loss = 0.002507530152797699
Validation loss = 0.0023626789916306734
Validation loss = 0.002322792075574398
Validation loss = 0.00194537581410259
Validation loss = 0.002263174392282963
Validation loss = 0.002100885845720768
Validation loss = 0.001736668637022376
Validation loss = 0.0017468614969402552
Validation loss = 0.0024640923365950584
Validation loss = 0.003707283176481724
Validation loss = 0.002114450791850686
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0023082210682332516
Validation loss = 0.002492292318493128
Validation loss = 0.0019170703599229455
Validation loss = 0.0035224210005253553
Validation loss = 0.0028601191006600857
Validation loss = 0.003692363854497671
Validation loss = 0.0021538434084504843
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0024257127661257982
Validation loss = 0.0020814971067011356
Validation loss = 0.001862509991042316
Validation loss = 0.0019068249966949224
Validation loss = 0.002486560260877013
Validation loss = 0.001982372719794512
Validation loss = 0.003712525125592947
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00562 |
| Iteration     | 31       |
| MaximumReturn | -0.00192 |
| MinimumReturn | -0.0136  |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020748646929860115
Validation loss = 0.002267351606860757
Validation loss = 0.0022188385482877493
Validation loss = 0.0024933484382927418
Validation loss = 0.0022185177076607943
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003046035999432206
Validation loss = 0.0026354026049375534
Validation loss = 0.0022195337805896997
Validation loss = 0.002929034875705838
Validation loss = 0.0033668610267341137
Validation loss = 0.001878680195659399
Validation loss = 0.0016606495482847095
Validation loss = 0.002193866763263941
Validation loss = 0.001837397925555706
Validation loss = 0.0030109945219010115
Validation loss = 0.002283005975186825
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002341658342629671
Validation loss = 0.0018719318322837353
Validation loss = 0.003822564845904708
Validation loss = 0.0022191533353179693
Validation loss = 0.002143569989129901
Validation loss = 0.002071487484499812
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021276799961924553
Validation loss = 0.002395005663856864
Validation loss = 0.0022368282079696655
Validation loss = 0.0019496860913932323
Validation loss = 0.002307591028511524
Validation loss = 0.002005938673391938
Validation loss = 0.0025909054093062878
Validation loss = 0.002362587722018361
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002633916912600398
Validation loss = 0.0022927301470190287
Validation loss = 0.0019514509476721287
Validation loss = 0.0019555259495973587
Validation loss = 0.002222241135314107
Validation loss = 0.0024194661527872086
Validation loss = 0.001794540206901729
Validation loss = 0.0037239042576402426
Validation loss = 0.0028672637417912483
Validation loss = 0.0027527795173227787
Validation loss = 0.0019528265111148357
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -111     |
| Iteration     | 32       |
| MaximumReturn | -46.8    |
| MinimumReturn | -139     |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0045173028483986855
Validation loss = 0.0019075535237789154
Validation loss = 0.0015970374224707484
Validation loss = 0.002059599617496133
Validation loss = 0.0016576051712036133
Validation loss = 0.0015701899537816644
Validation loss = 0.002450596773996949
Validation loss = 0.001818804768845439
Validation loss = 0.0017429111758247018
Validation loss = 0.0020442998502403498
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0035343242343515158
Validation loss = 0.0019997316412627697
Validation loss = 0.0021013279911130667
Validation loss = 0.0016730254283174872
Validation loss = 0.0022553522139787674
Validation loss = 0.0027143205516040325
Validation loss = 0.0018403589492663741
Validation loss = 0.0019219566602259874
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00484083266928792
Validation loss = 0.0019087271066382527
Validation loss = 0.0017542507266625762
Validation loss = 0.0020284350030124187
Validation loss = 0.0019255237421020865
Validation loss = 0.001788848894648254
Validation loss = 0.00165168393868953
Validation loss = 0.0025251545011997223
Validation loss = 0.002067261142656207
Validation loss = 0.001689120545051992
Validation loss = 0.0019230208126828074
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005952625535428524
Validation loss = 0.001875141984783113
Validation loss = 0.0016362917376682162
Validation loss = 0.0016423925990238786
Validation loss = 0.00200123549439013
Validation loss = 0.0018040118739008904
Validation loss = 0.0016588626895099878
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0038329882081598043
Validation loss = 0.002090678084641695
Validation loss = 0.0016991099109873176
Validation loss = 0.0018686645198613405
Validation loss = 0.0020745766814798117
Validation loss = 0.0018756082281470299
Validation loss = 0.002049637259915471
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0424  |
| Iteration     | 33       |
| MaximumReturn | -0.0271  |
| MinimumReturn | -0.0779  |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001822220510803163
Validation loss = 0.0018360180547460914
Validation loss = 0.00229076761752367
Validation loss = 0.0023251643870025873
Validation loss = 0.0018148897215723991
Validation loss = 0.0019025540677830577
Validation loss = 0.002342772902920842
Validation loss = 0.0016540922224521637
Validation loss = 0.0017307746456936002
Validation loss = 0.0021195649169385433
Validation loss = 0.002124771475791931
Validation loss = 0.001915686298161745
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021444244775921106
Validation loss = 0.001833526766858995
Validation loss = 0.0021894166711717844
Validation loss = 0.0019067901885136962
Validation loss = 0.0016026662196964025
Validation loss = 0.0016779187135398388
Validation loss = 0.0019429922103881836
Validation loss = 0.002370028290897608
Validation loss = 0.002489565871655941
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017339960904791951
Validation loss = 0.0017754998989403248
Validation loss = 0.002011059084907174
Validation loss = 0.0016395506681874394
Validation loss = 0.001656504929997027
Validation loss = 0.0028347033075988293
Validation loss = 0.0018814855720847845
Validation loss = 0.0022183035034686327
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018187649548053741
Validation loss = 0.0017024502158164978
Validation loss = 0.0036397832445800304
Validation loss = 0.0018170657567679882
Validation loss = 0.002285783411934972
Validation loss = 0.001714595127850771
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001931411912664771
Validation loss = 0.002213060623034835
Validation loss = 0.0016015921719372272
Validation loss = 0.0019429398234933615
Validation loss = 0.0019255253719165921
Validation loss = 0.0015145567012950778
Validation loss = 0.0028057328891009092
Validation loss = 0.0018938570283353329
Validation loss = 0.0016592731699347496
Validation loss = 0.0023863567039370537
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.123   |
| Iteration     | 34       |
| MaximumReturn | -0.0846  |
| MinimumReturn | -0.164   |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001872015418484807
Validation loss = 0.0017334913136437535
Validation loss = 0.0015447711339220405
Validation loss = 0.001560252858325839
Validation loss = 0.0020066648721694946
Validation loss = 0.0014068592572584748
Validation loss = 0.0018079641740769148
Validation loss = 0.0018383996794000268
Validation loss = 0.0016877917805686593
Validation loss = 0.001840042183175683
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001459787366911769
Validation loss = 0.002189093502238393
Validation loss = 0.0023072613403201103
Validation loss = 0.001716170459985733
Validation loss = 0.0015019954880699515
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021605081856250763
Validation loss = 0.001351673505268991
Validation loss = 0.001858983770944178
Validation loss = 0.0017460363451391459
Validation loss = 0.001796050346456468
Validation loss = 0.0017142028082162142
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018015526002272964
Validation loss = 0.0037282451521605253
Validation loss = 0.0017616408877074718
Validation loss = 0.001996390987187624
Validation loss = 0.002165388548746705
Validation loss = 0.001497883815318346
Validation loss = 0.0020681535825133324
Validation loss = 0.0017446724232286215
Validation loss = 0.0015210702549666166
Validation loss = 0.0026919797528535128
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016913989093154669
Validation loss = 0.0018073490355163813
Validation loss = 0.001882778713479638
Validation loss = 0.0021713487803936005
Validation loss = 0.0021662116050720215
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0763  |
| Iteration     | 35       |
| MaximumReturn | -0.0438  |
| MinimumReturn | -0.121   |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020071619655936956
Validation loss = 0.002965274965390563
Validation loss = 0.0017403161618858576
Validation loss = 0.0021884168963879347
Validation loss = 0.001800615107640624
Validation loss = 0.0015909909270703793
Validation loss = 0.0020090241450816393
Validation loss = 0.0016817225841805339
Validation loss = 0.0016159681836143136
Validation loss = 0.0016990075819194317
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016166233690455556
Validation loss = 0.00199030339717865
Validation loss = 0.001752612879499793
Validation loss = 0.0020279937889426947
Validation loss = 0.001660119858570397
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0020744947250932455
Validation loss = 0.0017411368899047375
Validation loss = 0.0020206733606755733
Validation loss = 0.0016474102158099413
Validation loss = 0.0015702471137046814
Validation loss = 0.001477107172831893
Validation loss = 0.0019305977039039135
Validation loss = 0.0018749404698610306
Validation loss = 0.0015644788509234786
Validation loss = 0.0014529379550367594
Validation loss = 0.0015128905652090907
Validation loss = 0.002079238649457693
Validation loss = 0.001990371150895953
Validation loss = 0.0017415937036275864
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002199613256379962
Validation loss = 0.001540262601338327
Validation loss = 0.0018926358316093683
Validation loss = 0.0018687077099457383
Validation loss = 0.0022642859257757664
Validation loss = 0.0019098278135061264
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016879074973985553
Validation loss = 0.001522314385510981
Validation loss = 0.0016443678177893162
Validation loss = 0.0017189003992825747
Validation loss = 0.0020706707146018744
Validation loss = 0.0020016594789922237
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0184  |
| Iteration     | 36       |
| MaximumReturn | -0.0143  |
| MinimumReturn | -0.0241  |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018309776205569506
Validation loss = 0.0022445942740887403
Validation loss = 0.0022223428823053837
Validation loss = 0.0022556891199201345
Validation loss = 0.0014444653643295169
Validation loss = 0.0017778081819415092
Validation loss = 0.0015501959715038538
Validation loss = 0.0014654886908829212
Validation loss = 0.00161988683976233
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001982791582122445
Validation loss = 0.0019791016820818186
Validation loss = 0.0017405524849891663
Validation loss = 0.0016427428927272558
Validation loss = 0.0017451897729188204
Validation loss = 0.0020449028816074133
Validation loss = 0.0015147338854148984
Validation loss = 0.0020231823436915874
Validation loss = 0.0019792045932263136
Validation loss = 0.0016702704597264528
Validation loss = 0.001960688503459096
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002023678505793214
Validation loss = 0.0018642049981281161
Validation loss = 0.001588928047567606
Validation loss = 0.0015844993758946657
Validation loss = 0.0017706842627376318
Validation loss = 0.002046275418251753
Validation loss = 0.0019011638360098004
Validation loss = 0.0016125148395076394
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015669773565605283
Validation loss = 0.0017783031798899174
Validation loss = 0.001734779216349125
Validation loss = 0.0019071693532168865
Validation loss = 0.0018750381423160434
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002324862638488412
Validation loss = 0.001845994614996016
Validation loss = 0.001778933103196323
Validation loss = 0.001773486495949328
Validation loss = 0.0015221552457660437
Validation loss = 0.0015016247052699327
Validation loss = 0.002345788525417447
Validation loss = 0.0016620054375380278
Validation loss = 0.001709053642116487
Validation loss = 0.0015801496338099241
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00213  |
| Iteration     | 37        |
| MaximumReturn | -0.000811 |
| MinimumReturn | -0.00469  |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002079553669318557
Validation loss = 0.0016405616188421845
Validation loss = 0.0015921226004138589
Validation loss = 0.0017745585646480322
Validation loss = 0.001821181969717145
Validation loss = 0.001837258692830801
Validation loss = 0.0016839171294122934
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016603234689682722
Validation loss = 0.001451017800718546
Validation loss = 0.0018384057329967618
Validation loss = 0.0018465286120772362
Validation loss = 0.0013172323815524578
Validation loss = 0.001595568610355258
Validation loss = 0.00177107029594481
Validation loss = 0.0018665544921532273
Validation loss = 0.0018680550856515765
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015846764435991645
Validation loss = 0.0018205706728622317
Validation loss = 0.0016464516520500183
Validation loss = 0.0018094051629304886
Validation loss = 0.0017381309298798442
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017706577200442553
Validation loss = 0.0014480974059551954
Validation loss = 0.0017578563420102
Validation loss = 0.00280372379347682
Validation loss = 0.001771837705746293
Validation loss = 0.0017181581351906061
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001874403329566121
Validation loss = 0.0017829282442107797
Validation loss = 0.0020492817275226116
Validation loss = 0.0016374972183257341
Validation loss = 0.0018136873841285706
Validation loss = 0.00241919350810349
Validation loss = 0.0015735940542072058
Validation loss = 0.001504251966252923
Validation loss = 0.0015630677808076143
Validation loss = 0.0019053707364946604
Validation loss = 0.00150502217002213
Validation loss = 0.0013836573343724012
Validation loss = 0.0020388497505337
Validation loss = 0.001639856956899166
Validation loss = 0.001420854590833187
Validation loss = 0.0016060618218034506
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0363  |
| Iteration     | 38       |
| MaximumReturn | -0.0205  |
| MinimumReturn | -0.0452  |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016283000586554408
Validation loss = 0.0016943160444498062
Validation loss = 0.0013654734939336777
Validation loss = 0.002422105986624956
Validation loss = 0.0018852289067581296
Validation loss = 0.0013959070201963186
Validation loss = 0.0018548475345596671
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022952533327043056
Validation loss = 0.0014357876498252153
Validation loss = 0.0018399450927972794
Validation loss = 0.001956914784386754
Validation loss = 0.0020634231623262167
Validation loss = 0.0016868397360667586
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016488400287926197
Validation loss = 0.0016676582163199782
Validation loss = 0.001673012739047408
Validation loss = 0.0013803557958453894
Validation loss = 0.0017997646937146783
Validation loss = 0.0014389827847480774
Validation loss = 0.0014777411706745625
Validation loss = 0.0034579343628138304
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021259989589452744
Validation loss = 0.0014575872337445617
Validation loss = 0.00210739322938025
Validation loss = 0.0013447724049910903
Validation loss = 0.0015730377053841949
Validation loss = 0.002023720880970359
Validation loss = 0.0016235025832429528
Validation loss = 0.001548452884890139
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017043795669451356
Validation loss = 0.0014740955084562302
Validation loss = 0.0018022521398961544
Validation loss = 0.001535364193841815
Validation loss = 0.001825057785026729
Validation loss = 0.001550733926706016
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00199  |
| Iteration     | 39        |
| MaximumReturn | -0.000778 |
| MinimumReturn | -0.00615  |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001592288026586175
Validation loss = 0.0021698633208870888
Validation loss = 0.0014763758517801762
Validation loss = 0.0016476551536470652
Validation loss = 0.0020876547787338495
Validation loss = 0.0016802619211375713
Validation loss = 0.0015029615024104714
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016907801618799567
Validation loss = 0.002248628530651331
Validation loss = 0.0014159661950543523
Validation loss = 0.0016862498596310616
Validation loss = 0.0014550319174304605
Validation loss = 0.0016658235108479857
Validation loss = 0.0013610997702926397
Validation loss = 0.001780772116035223
Validation loss = 0.0030517885461449623
Validation loss = 0.002016949700191617
Validation loss = 0.0014556718524545431
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019126157276332378
Validation loss = 0.001232236623764038
Validation loss = 0.0016720236744731665
Validation loss = 0.00158506422303617
Validation loss = 0.0020432439632713795
Validation loss = 0.0014484511921182275
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021698903292417526
Validation loss = 0.0017598485574126244
Validation loss = 0.0014739567413926125
Validation loss = 0.0018201700877398252
Validation loss = 0.0016109314747154713
Validation loss = 0.0018385385628789663
Validation loss = 0.0017929767491295934
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014411541633307934
Validation loss = 0.00133791146799922
Validation loss = 0.001641934853978455
Validation loss = 0.0014374656602740288
Validation loss = 0.0013374360278248787
Validation loss = 0.0014128395123407245
Validation loss = 0.0017008483409881592
Validation loss = 0.0018967988435178995
Validation loss = 0.0017792817670851946
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00229  |
| Iteration     | 40        |
| MaximumReturn | -0.000695 |
| MinimumReturn | -0.0069   |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014715684810653329
Validation loss = 0.0016166812274605036
Validation loss = 0.0016198186203837395
Validation loss = 0.0014634537510573864
Validation loss = 0.0015479096909984946
Validation loss = 0.00137827277649194
Validation loss = 0.0015544319758191705
Validation loss = 0.0013402655022218823
Validation loss = 0.0018126890063285828
Validation loss = 0.00159369595348835
Validation loss = 0.0013594543561339378
Validation loss = 0.0018298581708222628
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014868979342281818
Validation loss = 0.0019664978608489037
Validation loss = 0.0018115960992872715
Validation loss = 0.001379936351440847
Validation loss = 0.0015891738003119826
Validation loss = 0.001356324995867908
Validation loss = 0.0015530487289652228
Validation loss = 0.0015123153571039438
Validation loss = 0.001542479731142521
Validation loss = 0.002142007229849696
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001631597988307476
Validation loss = 0.0018581422045826912
Validation loss = 0.0016568894498050213
Validation loss = 0.0014768795808777213
Validation loss = 0.002022252418100834
Validation loss = 0.0015087826177477837
Validation loss = 0.0014869972364977002
Validation loss = 0.0018813926726579666
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00184194918256253
Validation loss = 0.001575420843437314
Validation loss = 0.0026732124388217926
Validation loss = 0.0021838934626430273
Validation loss = 0.0020645721815526485
Validation loss = 0.0035312504041939974
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0021179369650781155
Validation loss = 0.0016320435097441077
Validation loss = 0.0014653304824605584
Validation loss = 0.001661852584220469
Validation loss = 0.0015001405263319612
Validation loss = 0.0014300671173259616
Validation loss = 0.0016006933292374015
Validation loss = 0.001502316095866263
Validation loss = 0.0014582041185349226
Validation loss = 0.0014246709179133177
Validation loss = 0.0018720547668635845
Validation loss = 0.0014785018283873796
Validation loss = 0.001594013418070972
Validation loss = 0.0017495854990556836
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.152   |
| Iteration     | 41       |
| MaximumReturn | -0.0885  |
| MinimumReturn | -0.809   |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018792171031236649
Validation loss = 0.0014228255022317171
Validation loss = 0.0019681330304592848
Validation loss = 0.0018864048179239035
Validation loss = 0.0015664364909753203
Validation loss = 0.0019276879029348493
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018915804103016853
Validation loss = 0.0014715769793838263
Validation loss = 0.0021120826713740826
Validation loss = 0.0013061740901321173
Validation loss = 0.0015382809797301888
Validation loss = 0.0013728728517889977
Validation loss = 0.0018915780819952488
Validation loss = 0.0017723923083394766
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0035100930836051702
Validation loss = 0.0014354682061821222
Validation loss = 0.00130129000172019
Validation loss = 0.0013879203470423818
Validation loss = 0.002219414571300149
Validation loss = 0.0014103974681347609
Validation loss = 0.001529045752249658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001611847779713571
Validation loss = 0.0023542337585240602
Validation loss = 0.0016966250259429216
Validation loss = 0.0017370744608342648
Validation loss = 0.0015993865672498941
Validation loss = 0.0015972304390743375
Validation loss = 0.0014255743008106947
Validation loss = 0.0017411670414730906
Validation loss = 0.0014484511921182275
Validation loss = 0.002338794060051441
Validation loss = 0.0016909732948988676
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001813054084777832
Validation loss = 0.001928798039443791
Validation loss = 0.001543209538795054
Validation loss = 0.0015108482912182808
Validation loss = 0.001210212823934853
Validation loss = 0.0014465437270700932
Validation loss = 0.0014986218884587288
Validation loss = 0.0018481974257156253
Validation loss = 0.0018610761035233736
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5       |
| Iteration     | 42       |
| MaximumReturn | -0.0525  |
| MinimumReturn | -14.5    |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025479046162217855
Validation loss = 0.0015267902053892612
Validation loss = 0.0020030164159834385
Validation loss = 0.0014515111688524485
Validation loss = 0.001797989010810852
Validation loss = 0.0013411777326837182
Validation loss = 0.0014729173853993416
Validation loss = 0.0014498966047540307
Validation loss = 0.0015728245489299297
Validation loss = 0.0013549909926950932
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016460106708109379
Validation loss = 0.0013916586758568883
Validation loss = 0.0015592968557029963
Validation loss = 0.0013737629633396864
Validation loss = 0.00145842251367867
Validation loss = 0.0017206260235980153
Validation loss = 0.0012346074217930436
Validation loss = 0.0021683264058083296
Validation loss = 0.0014559875708073378
Validation loss = 0.0013197791995480657
Validation loss = 0.0013915145536884665
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002007921226322651
Validation loss = 0.0015838223043829203
Validation loss = 0.0013483093352988362
Validation loss = 0.0017668016953393817
Validation loss = 0.001451975549571216
Validation loss = 0.0013056517345830798
Validation loss = 0.0013253135839477181
Validation loss = 0.0019472622079774737
Validation loss = 0.0012000378919765353
Validation loss = 0.002033120719715953
Validation loss = 0.0012322801630944014
Validation loss = 0.0016887481324374676
Validation loss = 0.0014952002093195915
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00179587013553828
Validation loss = 0.0013636343646794558
Validation loss = 0.0014814386377111077
Validation loss = 0.0017495796782895923
Validation loss = 0.0014513516798615456
Validation loss = 0.0018212991999462247
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002799033187329769
Validation loss = 0.001348803285509348
Validation loss = 0.0011891773901879787
Validation loss = 0.0015959638403728604
Validation loss = 0.0013055240269750357
Validation loss = 0.001951369340531528
Validation loss = 0.0018077364657074213
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00861  |
| Iteration     | 43        |
| MaximumReturn | -0.000827 |
| MinimumReturn | -0.0389   |
| TotalSamples  | 74970     |
-----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016132788732647896
Validation loss = 0.001485935179516673
Validation loss = 0.0015882888110354543
Validation loss = 0.0014701861655339599
Validation loss = 0.0022238832898437977
Validation loss = 0.0014953550416976213
Validation loss = 0.0016878409078344703
Validation loss = 0.0016387953655794263
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015666339313611388
Validation loss = 0.0015913140960037708
Validation loss = 0.0016198456287384033
Validation loss = 0.0015908663626760244
Validation loss = 0.0015292231692001224
Validation loss = 0.0024117932189255953
Validation loss = 0.0013709140475839376
Validation loss = 0.0018946188502013683
Validation loss = 0.002007146831601858
Validation loss = 0.001298714429140091
Validation loss = 0.0015821903944015503
Validation loss = 0.0019302295986562967
Validation loss = 0.0013099991483613849
Validation loss = 0.0018388802418485284
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004191461019217968
Validation loss = 0.00150436838157475
Validation loss = 0.0014672776451334357
Validation loss = 0.0015161711489781737
Validation loss = 0.0014632277889177203
Validation loss = 0.001452482072636485
Validation loss = 0.001411830191500485
Validation loss = 0.0012824536534026265
Validation loss = 0.0017969318432733417
Validation loss = 0.0016803611069917679
Validation loss = 0.0012545044301077724
Validation loss = 0.0014129278715699911
Validation loss = 0.0016130106523633003
Validation loss = 0.0014797330368310213
Validation loss = 0.0019605117850005627
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002022756263613701
Validation loss = 0.00205804081633687
Validation loss = 0.0020727221854031086
Validation loss = 0.0016467907698825002
Validation loss = 0.0017055633943527937
Validation loss = 0.002039843937382102
Validation loss = 0.001799225457943976
Validation loss = 0.001751808449625969
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015895895194262266
Validation loss = 0.0013382717734202743
Validation loss = 0.0014711610274389386
Validation loss = 0.0015457469271495938
Validation loss = 0.0016668582102283835
Validation loss = 0.0013390202075242996
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00459  |
| Iteration     | 44        |
| MaximumReturn | -0.000739 |
| MinimumReturn | -0.0321   |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001759584411047399
Validation loss = 0.0016965476097539067
Validation loss = 0.0015389984473586082
Validation loss = 0.0014745085500180721
Validation loss = 0.001501659513451159
Validation loss = 0.0016435730503872037
Validation loss = 0.0015263577224686742
Validation loss = 0.001690755132585764
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022152338642627
Validation loss = 0.0015565637731924653
Validation loss = 0.0013723662123084068
Validation loss = 0.001585484715178609
Validation loss = 0.0015269032446667552
Validation loss = 0.0017475960776209831
Validation loss = 0.0020790882408618927
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018045813776552677
Validation loss = 0.001564226346090436
Validation loss = 0.0016963121015578508
Validation loss = 0.0014587101759389043
Validation loss = 0.0014705404173582792
Validation loss = 0.0015118191950023174
Validation loss = 0.0014989249175414443
Validation loss = 0.001439721672795713
Validation loss = 0.0015634947922080755
Validation loss = 0.0014431547606363893
Validation loss = 0.00276338797993958
Validation loss = 0.0015053612878546119
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014935972867533565
Validation loss = 0.0017538904212415218
Validation loss = 0.0020415198523551226
Validation loss = 0.0020992597565054893
Validation loss = 0.002111369278281927
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016095226164907217
Validation loss = 0.0016643301350995898
Validation loss = 0.0019655965734273195
Validation loss = 0.0015895470278337598
Validation loss = 0.0014391060685738921
Validation loss = 0.0016233725473284721
Validation loss = 0.001412428799085319
Validation loss = 0.0015194305451586843
Validation loss = 0.0015650754794478416
Validation loss = 0.0014434970216825604
Validation loss = 0.0017498984234407544
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.154   |
| Iteration     | 45       |
| MaximumReturn | -0.0721  |
| MinimumReturn | -0.247   |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014929283643141389
Validation loss = 0.0013708570040762424
Validation loss = 0.001281545963138342
Validation loss = 0.001781302154995501
Validation loss = 0.0014547458849847317
Validation loss = 0.0012155858566984534
Validation loss = 0.0016220221295952797
Validation loss = 0.001295659807510674
Validation loss = 0.0013466252712532878
Validation loss = 0.0018154216231778264
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017285990761592984
Validation loss = 0.0011500444961711764
Validation loss = 0.001448598806746304
Validation loss = 0.0015432126820087433
Validation loss = 0.0013519198400899768
Validation loss = 0.0022515917662531137
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016595018096268177
Validation loss = 0.001222915481775999
Validation loss = 0.0017776464810594916
Validation loss = 0.0012966170907020569
Validation loss = 0.0013415691209957004
Validation loss = 0.0016318586422130466
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015991667751222849
Validation loss = 0.0015701891388744116
Validation loss = 0.0016082662623375654
Validation loss = 0.0016014694701880217
Validation loss = 0.0015538384905084968
Validation loss = 0.0019546260591596365
Validation loss = 0.0016721418360248208
Validation loss = 0.0018629995174705982
Validation loss = 0.0017204382456839085
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018704694230109453
Validation loss = 0.0015030117938295007
Validation loss = 0.0013961868826299906
Validation loss = 0.0014610810903832316
Validation loss = 0.0015904039610177279
Validation loss = 0.0016725590685382485
Validation loss = 0.0017813178710639477
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00592  |
| Iteration     | 46        |
| MaximumReturn | -0.000731 |
| MinimumReturn | -0.0309   |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016338251298293471
Validation loss = 0.0015286959242075682
Validation loss = 0.001828948617912829
Validation loss = 0.0012743874685838819
Validation loss = 0.0026158508844673634
Validation loss = 0.0016162734245881438
Validation loss = 0.0015864474698901176
Validation loss = 0.0017891194438561797
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021304446272552013
Validation loss = 0.001303944387473166
Validation loss = 0.0019827070645987988
Validation loss = 0.0018647171091288328
Validation loss = 0.0013689998304471374
Validation loss = 0.0015359894605353475
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014636778505519032
Validation loss = 0.0017761837225407362
Validation loss = 0.0013682482531294227
Validation loss = 0.0028346183244138956
Validation loss = 0.0017024639528244734
Validation loss = 0.0014419672079384327
Validation loss = 0.001319192349910736
Validation loss = 0.0015466068871319294
Validation loss = 0.0012976363068446517
Validation loss = 0.0012580257607623935
Validation loss = 0.0017470685997977853
Validation loss = 0.0012914806138724089
Validation loss = 0.001342683332040906
Validation loss = 0.0013847058871760964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015419094124808908
Validation loss = 0.0018928542267531157
Validation loss = 0.0018074766267091036
Validation loss = 0.0016791804227977991
Validation loss = 0.0014247797662392259
Validation loss = 0.0013594661140814424
Validation loss = 0.0012771831825375557
Validation loss = 0.0015992037951946259
Validation loss = 0.0019255295628681779
Validation loss = 0.0015646182000637054
Validation loss = 0.0019938943441957235
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014616688713431358
Validation loss = 0.001400793669745326
Validation loss = 0.0013322295853868127
Validation loss = 0.0017832223093137145
Validation loss = 0.0018993910634890199
Validation loss = 0.0012622199719771743
Validation loss = 0.0016740914434194565
Validation loss = 0.001981165958568454
Validation loss = 0.001817038282752037
Validation loss = 0.0013709604972973466
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00635  |
| Iteration     | 47        |
| MaximumReturn | -0.000571 |
| MinimumReturn | -0.0321   |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001270378241315484
Validation loss = 0.001531184301711619
Validation loss = 0.0017131395870819688
Validation loss = 0.0013557751663029194
Validation loss = 0.001419390318915248
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017535996157675982
Validation loss = 0.0013675931841135025
Validation loss = 0.001583784120157361
Validation loss = 0.0020313092973083258
Validation loss = 0.001572854584082961
Validation loss = 0.0015807304298505187
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001393183832988143
Validation loss = 0.0019993369933217764
Validation loss = 0.001390250283293426
Validation loss = 0.0014142993604764342
Validation loss = 0.0014557719696313143
Validation loss = 0.001498698489740491
Validation loss = 0.0017196066910400987
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019039486069232225
Validation loss = 0.0017509147292003036
Validation loss = 0.0016386050265282393
Validation loss = 0.0022185607813298702
Validation loss = 0.001522031263448298
Validation loss = 0.001415345584973693
Validation loss = 0.001553625101223588
Validation loss = 0.0018839776748791337
Validation loss = 0.002636157674714923
Validation loss = 0.0018990181852132082
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00159535463899374
Validation loss = 0.001376396743580699
Validation loss = 0.0017552723875269294
Validation loss = 0.0016585439443588257
Validation loss = 0.0015215082094073296
Validation loss = 0.0018370978068560362
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00538  |
| Iteration     | 48        |
| MaximumReturn | -0.000636 |
| MinimumReturn | -0.0321   |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015230283606797457
Validation loss = 0.0012923594331368804
Validation loss = 0.0017033611657097936
Validation loss = 0.0013186945579946041
Validation loss = 0.001507688662968576
Validation loss = 0.0017831888981163502
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014483463019132614
Validation loss = 0.0012936005368828773
Validation loss = 0.0016378717264160514
Validation loss = 0.0014064814895391464
Validation loss = 0.0014647202333435416
Validation loss = 0.001421978697180748
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016878029564395547
Validation loss = 0.001718140090815723
Validation loss = 0.0013998798094689846
Validation loss = 0.0014874155167490244
Validation loss = 0.001224387320689857
Validation loss = 0.001449202187359333
Validation loss = 0.0015471557853743434
Validation loss = 0.0013680370757356286
Validation loss = 0.0013289356138557196
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019130693981423974
Validation loss = 0.0019280444830656052
Validation loss = 0.0017756709130480886
Validation loss = 0.0016537824412807822
Validation loss = 0.002228420227766037
Validation loss = 0.002100041601806879
Validation loss = 0.0014606320764869452
Validation loss = 0.0024729387369006872
Validation loss = 0.0013458793982863426
Validation loss = 0.0013009875547140837
Validation loss = 0.0014190110377967358
Validation loss = 0.002215777290984988
Validation loss = 0.0017136555397883058
Validation loss = 0.0015004584565758705
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001699752057902515
Validation loss = 0.0015822468558326364
Validation loss = 0.0016539011849090457
Validation loss = 0.00138176791369915
Validation loss = 0.0014965870650485158
Validation loss = 0.0018800273537635803
Validation loss = 0.0022956591565161943
Validation loss = 0.0017585732275620103
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.79    |
| Iteration     | 49       |
| MaximumReturn | -0.119   |
| MinimumReturn | -65      |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00811378937214613
Validation loss = 0.001706729643046856
Validation loss = 0.0012382271233946085
Validation loss = 0.0013924776576459408
Validation loss = 0.001180583261884749
Validation loss = 0.0028541747014969587
Validation loss = 0.0013504999224096537
Validation loss = 0.0014714084099978209
Validation loss = 0.001398986903950572
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002638978883624077
Validation loss = 0.0035989508032798767
Validation loss = 0.001618705689907074
Validation loss = 0.0011814850149676204
Validation loss = 0.0012496223207563162
Validation loss = 0.0012882439186796546
Validation loss = 0.001251984853297472
Validation loss = 0.001969918143004179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006064657121896744
Validation loss = 0.001418449217453599
Validation loss = 0.00237449211999774
Validation loss = 0.0014433077303692698
Validation loss = 0.001125425798818469
Validation loss = 0.0013737453846260905
Validation loss = 0.0013616519281640649
Validation loss = 0.001214659190736711
Validation loss = 0.0013589480658993125
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006709254812449217
Validation loss = 0.0037678400985896587
Validation loss = 0.0016629239544272423
Validation loss = 0.0013604944106191397
Validation loss = 0.001309874001890421
Validation loss = 0.0012148668756708503
Validation loss = 0.0011934423819184303
Validation loss = 0.0016842992044985294
Validation loss = 0.0013398166047409177
Validation loss = 0.0015278925420716405
Validation loss = 0.0014862676616758108
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0054867928847670555
Validation loss = 0.0015906194457784295
Validation loss = 0.002160403411835432
Validation loss = 0.0014739235630258918
Validation loss = 0.0012161616468802094
Validation loss = 0.0012533861445263028
Validation loss = 0.0012129682581871748
Validation loss = 0.0017493232153356075
Validation loss = 0.0013608094304800034
Validation loss = 0.0013521929504349828
Validation loss = 0.0011236678110435605
Validation loss = 0.0012716283090412617
Validation loss = 0.0013811258831992745
Validation loss = 0.0013507949188351631
Validation loss = 0.001096326275728643
Validation loss = 0.0011223392793908715
Validation loss = 0.0012642216170206666
Validation loss = 0.0012635620078071952
Validation loss = 0.0010342331370338798
Validation loss = 0.0019103559898212552
Validation loss = 0.001091022277250886
Validation loss = 0.001343272509984672
Validation loss = 0.0012811948545277119
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.8    |
| Iteration     | 50       |
| MaximumReturn | -0.141   |
| MinimumReturn | -44.5    |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001620481489226222
Validation loss = 0.003027580212801695
Validation loss = 0.0015252501470968127
Validation loss = 0.0013164221309125423
Validation loss = 0.0013506630202755332
Validation loss = 0.0015932910609990358
Validation loss = 0.0013947362313047051
Validation loss = 0.0016662144334986806
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013776251580566168
Validation loss = 0.0022479777690023184
Validation loss = 0.0019359600264579058
Validation loss = 0.001286861370317638
Validation loss = 0.0012049501528963447
Validation loss = 0.0012486767955124378
Validation loss = 0.0016516032628715038
Validation loss = 0.0017584806773811579
Validation loss = 0.0011384242679923773
Validation loss = 0.0021762920077890158
Validation loss = 0.0013622273690998554
Validation loss = 0.0012328856391832232
Validation loss = 0.0012833281653001904
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014349017292261124
Validation loss = 0.0013330125948414207
Validation loss = 0.0015301178209483624
Validation loss = 0.0014993584481999278
Validation loss = 0.0010866114171221852
Validation loss = 0.0013978139031678438
Validation loss = 0.0018903381424024701
Validation loss = 0.001393470331095159
Validation loss = 0.0012278727954253554
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013759611174464226
Validation loss = 0.0010998412035405636
Validation loss = 0.0011758193140849471
Validation loss = 0.001236599637195468
Validation loss = 0.00143907661549747
Validation loss = 0.001788343652151525
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015174731379374862
Validation loss = 0.0011348630068823695
Validation loss = 0.001186066190712154
Validation loss = 0.001278234296478331
Validation loss = 0.0011904274579137564
Validation loss = 0.0011991806095466018
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.8    |
| Iteration     | 51       |
| MaximumReturn | -1.4     |
| MinimumReturn | -111     |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025505495723336935
Validation loss = 0.0019946603570133448
Validation loss = 0.002024252898991108
Validation loss = 0.002174246357753873
Validation loss = 0.002180787269026041
Validation loss = 0.0018270659493282437
Validation loss = 0.0018285902915522456
Validation loss = 0.0024601644836366177
Validation loss = 0.0022399420849978924
Validation loss = 0.0018399771070107818
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028380623552948236
Validation loss = 0.001713006990030408
Validation loss = 0.001635190797969699
Validation loss = 0.0018769282614812255
Validation loss = 0.0017094019567593932
Validation loss = 0.0018533379770815372
Validation loss = 0.0015180525369942188
Validation loss = 0.0015759189845994115
Validation loss = 0.00373863079585135
Validation loss = 0.00163044105283916
Validation loss = 0.0016886686207726598
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018177489982917905
Validation loss = 0.001702563022263348
Validation loss = 0.0020703880582004786
Validation loss = 0.001524213352240622
Validation loss = 0.0015432994114235044
Validation loss = 0.0016765870386734605
Validation loss = 0.001569831627421081
Validation loss = 0.002199792768806219
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0022982805967330933
Validation loss = 0.0016680876724421978
Validation loss = 0.001898365095257759
Validation loss = 0.0016166690038517118
Validation loss = 0.0016582447569817305
Validation loss = 0.001679210108704865
Validation loss = 0.0019473303109407425
Validation loss = 0.0018273249734193087
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002065818989649415
Validation loss = 0.001694860402494669
Validation loss = 0.0016056325985118747
Validation loss = 0.0015602405183017254
Validation loss = 0.0017995401285588741
Validation loss = 0.002115398645401001
Validation loss = 0.0015226321993395686
Validation loss = 0.0017760450718924403
Validation loss = 0.0014643005561083555
Validation loss = 0.0014245450729504228
Validation loss = 0.0017687603831291199
Validation loss = 0.0015694239409640431
Validation loss = 0.0015170699916779995
Validation loss = 0.0017100219847634435
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -17.3    |
| Iteration     | 52       |
| MaximumReturn | -0.225   |
| MinimumReturn | -85.9    |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001948029501363635
Validation loss = 0.001713540288619697
Validation loss = 0.0020342881325632334
Validation loss = 0.0018301574746146798
Validation loss = 0.002015761099755764
Validation loss = 0.0014956494560465217
Validation loss = 0.001746299909427762
Validation loss = 0.00201298831962049
Validation loss = 0.0015580833423882723
Validation loss = 0.0018838009564206004
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017820517532527447
Validation loss = 0.0016878177411854267
Validation loss = 0.0014299624599516392
Validation loss = 0.0014601252041757107
Validation loss = 0.0016799158183857799
Validation loss = 0.0020425834227353334
Validation loss = 0.0015923506580293179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019937034230679274
Validation loss = 0.0014208722859621048
Validation loss = 0.0014258967712521553
Validation loss = 0.0015867104521021247
Validation loss = 0.0013982064556330442
Validation loss = 0.0013591740280389786
Validation loss = 0.0017527397722005844
Validation loss = 0.0015130809042602777
Validation loss = 0.0016731687355786562
Validation loss = 0.0015535871498286724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017611273797228932
Validation loss = 0.0015346698928624392
Validation loss = 0.0015073748072609305
Validation loss = 0.0015030259964987636
Validation loss = 0.0017374773742631078
Validation loss = 0.0018375712679699063
Validation loss = 0.0016000497853383422
Validation loss = 0.002102116821333766
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002120687160640955
Validation loss = 0.0019755440298467875
Validation loss = 0.001491110073402524
Validation loss = 0.001530482666566968
Validation loss = 0.0015677496558055282
Validation loss = 0.001507908571511507
Validation loss = 0.0018298213835805655
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0663   |
| Iteration     | 53        |
| MaximumReturn | -0.000716 |
| MinimumReturn | -0.536    |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002440451178699732
Validation loss = 0.0015544210327789187
Validation loss = 0.0016390113160014153
Validation loss = 0.0016958886990323663
Validation loss = 0.001513614202849567
Validation loss = 0.0013807504437863827
Validation loss = 0.0016997918719425797
Validation loss = 0.0015802144771441817
Validation loss = 0.0016836184076964855
Validation loss = 0.0017498091328889132
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001548832282423973
Validation loss = 0.0014033012557774782
Validation loss = 0.0014156099641695619
Validation loss = 0.0014108979376032948
Validation loss = 0.0014667778741568327
Validation loss = 0.0014372001169249415
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021207351237535477
Validation loss = 0.0015700907679274678
Validation loss = 0.0014353215228766203
Validation loss = 0.0015621983911842108
Validation loss = 0.0015485620824620128
Validation loss = 0.0014181642327457666
Validation loss = 0.0014275829307734966
Validation loss = 0.0014613874955102801
Validation loss = 0.0014897891087457538
Validation loss = 0.002687242114916444
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017911106115207076
Validation loss = 0.0016475205775350332
Validation loss = 0.00166360754519701
Validation loss = 0.0017584400484338403
Validation loss = 0.0014995759120211005
Validation loss = 0.0015300717204809189
Validation loss = 0.0022519021295011044
Validation loss = 0.0016143452376127243
Validation loss = 0.0015389765612781048
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016634053317829967
Validation loss = 0.001471419120207429
Validation loss = 0.0014412669697776437
Validation loss = 0.001643787370994687
Validation loss = 0.0018749746959656477
Validation loss = 0.0015181823400780559
Validation loss = 0.0014634953113272786
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0197   |
| Iteration     | 54        |
| MaximumReturn | -0.000645 |
| MinimumReturn | -0.262    |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018685254035517573
Validation loss = 0.0025158855132758617
Validation loss = 0.0014947628369554877
Validation loss = 0.001365048112347722
Validation loss = 0.0014257332077249885
Validation loss = 0.0017121770652011037
Validation loss = 0.0020142346620559692
Validation loss = 0.0017100966069847345
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015349689638242126
Validation loss = 0.0016425972571596503
Validation loss = 0.0016611396567896008
Validation loss = 0.0016739324200898409
Validation loss = 0.0018000358249992132
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014181574806571007
Validation loss = 0.0013723417650908232
Validation loss = 0.0012974048731848598
Validation loss = 0.0013140748487785459
Validation loss = 0.0015050728106871247
Validation loss = 0.0014064527349546552
Validation loss = 0.001277466886676848
Validation loss = 0.0017506418516859412
Validation loss = 0.0012017682893201709
Validation loss = 0.0016046332893893123
Validation loss = 0.0013682548888027668
Validation loss = 0.0012066997587680817
Validation loss = 0.0012950404779985547
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017911717295646667
Validation loss = 0.0017647233325988054
Validation loss = 0.0016520112985745072
Validation loss = 0.0014624104369431734
Validation loss = 0.001405737129971385
Validation loss = 0.00485837971791625
Validation loss = 0.0014513905625790358
Validation loss = 0.0015286580892279744
Validation loss = 0.0016083548543974757
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002152705565094948
Validation loss = 0.0014874831540510058
Validation loss = 0.0014466055436059833
Validation loss = 0.001512174028903246
Validation loss = 0.0012892363592982292
Validation loss = 0.0015199609333649278
Validation loss = 0.001694243517704308
Validation loss = 0.0013591269962489605
Validation loss = 0.001353487023152411
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00727  |
| Iteration     | 55        |
| MaximumReturn | -0.000642 |
| MinimumReturn | -0.16     |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016624822746962309
Validation loss = 0.0014349783305078745
Validation loss = 0.001601376454345882
Validation loss = 0.0013648815220221877
Validation loss = 0.0016595334745943546
Validation loss = 0.0014440759550780058
Validation loss = 0.0017381105571985245
Validation loss = 0.001482382882386446
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016516150208190084
Validation loss = 0.001342643634416163
Validation loss = 0.0025541929062455893
Validation loss = 0.0014062164118513465
Validation loss = 0.0016732508083805442
Validation loss = 0.001627513556741178
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012859517009928823
Validation loss = 0.0012408752227202058
Validation loss = 0.001317306188866496
Validation loss = 0.0014433434698730707
Validation loss = 0.0017945714062079787
Validation loss = 0.001353539526462555
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001626938465051353
Validation loss = 0.0019177041249349713
Validation loss = 0.0013929040869697928
Validation loss = 0.0017497837543487549
Validation loss = 0.0015010097995400429
Validation loss = 0.0016202088445425034
Validation loss = 0.001651496160775423
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015785462455824018
Validation loss = 0.0017621114384382963
Validation loss = 0.001878800569102168
Validation loss = 0.001364573952741921
Validation loss = 0.0014731138944625854
Validation loss = 0.001532954629510641
Validation loss = 0.0014905843418091536
Validation loss = 0.0014655085979029536
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.104    |
| Iteration     | 56        |
| MaximumReturn | -0.000679 |
| MinimumReturn | -0.531    |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001500010141171515
Validation loss = 0.001262394362129271
Validation loss = 0.0015338662778958678
Validation loss = 0.0014089398318901658
Validation loss = 0.0012838823022320867
Validation loss = 0.0019247938180342317
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001338654081337154
Validation loss = 0.0012216460891067982
Validation loss = 0.0013673934154212475
Validation loss = 0.0012776775984093547
Validation loss = 0.0014390313299372792
Validation loss = 0.0012356890365481377
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013548970455303788
Validation loss = 0.0011529279872775078
Validation loss = 0.001219051773659885
Validation loss = 0.001232852111570537
Validation loss = 0.0014523438876494765
Validation loss = 0.0012414370430633426
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001515147858299315
Validation loss = 0.001487904810346663
Validation loss = 0.001568435225635767
Validation loss = 0.0015784938586875796
Validation loss = 0.0014596767723560333
Validation loss = 0.0013681929558515549
Validation loss = 0.001486123539507389
Validation loss = 0.0017327984096482396
Validation loss = 0.001675150473602116
Validation loss = 0.0015165834920480847
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001592661370523274
Validation loss = 0.0014269204111769795
Validation loss = 0.001629801350645721
Validation loss = 0.001485728658735752
Validation loss = 0.0014919027453288436
Validation loss = 0.00145098811481148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.102   |
| Iteration     | 57       |
| MaximumReturn | -0.0007  |
| MinimumReturn | -0.794   |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025317692197859287
Validation loss = 0.001579373492859304
Validation loss = 0.001954136649146676
Validation loss = 0.001358694164082408
Validation loss = 0.001905576791614294
Validation loss = 0.0012473060050979257
Validation loss = 0.0013635253999382257
Validation loss = 0.0014996407553553581
Validation loss = 0.001383122755214572
Validation loss = 0.0015241970540955663
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013510053977370262
Validation loss = 0.001261068624444306
Validation loss = 0.0013482252834364772
Validation loss = 0.001341646071523428
Validation loss = 0.0015618330799043179
Validation loss = 0.0034404764883220196
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013897294411435723
Validation loss = 0.0012685524998232722
Validation loss = 0.0013280590064823627
Validation loss = 0.001165756955742836
Validation loss = 0.001213389215990901
Validation loss = 0.0011749553959816694
Validation loss = 0.001250672503374517
Validation loss = 0.0016150681767612696
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015133277047425508
Validation loss = 0.001660071313381195
Validation loss = 0.0014762949431315064
Validation loss = 0.0015698994975537062
Validation loss = 0.001221444457769394
Validation loss = 0.001400252804160118
Validation loss = 0.0017148811602964997
Validation loss = 0.0014571193605661392
Validation loss = 0.0013231168268248439
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001940226531587541
Validation loss = 0.0013996984343975782
Validation loss = 0.0014687811490148306
Validation loss = 0.0013505264651030302
Validation loss = 0.001360787427984178
Validation loss = 0.001500864396803081
Validation loss = 0.0015042527811601758
Validation loss = 0.001888772938400507
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.8    |
| Iteration     | 58       |
| MaximumReturn | -0.476   |
| MinimumReturn | -41.4    |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020612527150660753
Validation loss = 0.001429023570381105
Validation loss = 0.0014737885212525725
Validation loss = 0.0015913224779069424
Validation loss = 0.0013769129291176796
Validation loss = 0.0011990421917289495
Validation loss = 0.0012707306304946542
Validation loss = 0.0015870097558945417
Validation loss = 0.0017518693348392844
Validation loss = 0.001942710136063397
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002752946689724922
Validation loss = 0.0025531610008329153
Validation loss = 0.0013670030748471618
Validation loss = 0.0012177422177046537
Validation loss = 0.0015447299228981137
Validation loss = 0.0019582007080316544
Validation loss = 0.001641503069549799
Validation loss = 0.0014775559538975358
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007267558481544256
Validation loss = 0.00146878557279706
Validation loss = 0.0012289002770558
Validation loss = 0.001345211872830987
Validation loss = 0.001211254857480526
Validation loss = 0.0019747137557715178
Validation loss = 0.0017637965502217412
Validation loss = 0.0015563578344881535
Validation loss = 0.0010971794836223125
Validation loss = 0.0011019738158211112
Validation loss = 0.0014196400297805667
Validation loss = 0.0011963207507506013
Validation loss = 0.0022818848956376314
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002823631977662444
Validation loss = 0.0016185855492949486
Validation loss = 0.0015855911187827587
Validation loss = 0.0019664803985506296
Validation loss = 0.002283575013279915
Validation loss = 0.0012328772572800517
Validation loss = 0.001403906149789691
Validation loss = 0.00152633769903332
Validation loss = 0.0016149042639881372
Validation loss = 0.0015632974682375789
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025713646318763494
Validation loss = 0.0014154415111988783
Validation loss = 0.0019795154221355915
Validation loss = 0.002015108009800315
Validation loss = 0.0013651156332343817
Validation loss = 0.0012235805625095963
Validation loss = 0.0014923003036528826
Validation loss = 0.001639995607547462
Validation loss = 0.0013909578556194901
Validation loss = 0.0012267078272998333
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.49    |
| Iteration     | 59       |
| MaximumReturn | -0.00187 |
| MinimumReturn | -71.4    |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012381363194435835
Validation loss = 0.0013649777974933386
Validation loss = 0.0012685213005170226
Validation loss = 0.0014248054940253496
Validation loss = 0.001982741989195347
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014442444080486894
Validation loss = 0.0017182730371132493
Validation loss = 0.001232057111337781
Validation loss = 0.0013023004867136478
Validation loss = 0.0011358729097992182
Validation loss = 0.0016637076623737812
Validation loss = 0.0017447989666834474
Validation loss = 0.001754692755639553
Validation loss = 0.0012663245433941483
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001158265513367951
Validation loss = 0.0012872598599642515
Validation loss = 0.0010079272324219346
Validation loss = 0.0017184107564389706
Validation loss = 0.0010433304123580456
Validation loss = 0.001294837798923254
Validation loss = 0.0010626718867570162
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00179809401743114
Validation loss = 0.0013997936621308327
Validation loss = 0.001538719399832189
Validation loss = 0.0014566159807145596
Validation loss = 0.002041888888925314
Validation loss = 0.0015607699751853943
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001286291517317295
Validation loss = 0.001254159607924521
Validation loss = 0.0014554922236129642
Validation loss = 0.0012699199141934514
Validation loss = 0.001310196821577847
Validation loss = 0.0014714360004290938
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -72.6    |
| Iteration     | 60       |
| MaximumReturn | -0.00169 |
| MinimumReturn | -148     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017234775004908442
Validation loss = 0.0012359721586108208
Validation loss = 0.0014585714088752866
Validation loss = 0.001283304183743894
Validation loss = 0.002015506150200963
Validation loss = 0.0012169568799436092
Validation loss = 0.0012643851805478334
Validation loss = 0.0011864154366776347
Validation loss = 0.001072872313670814
Validation loss = 0.0022024877835065126
Validation loss = 0.0013035539304837584
Validation loss = 0.0012380084954202175
Validation loss = 0.0014784415252506733
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014307970413938165
Validation loss = 0.001278930576518178
Validation loss = 0.0013071805005893111
Validation loss = 0.0015335013158619404
Validation loss = 0.0012429860653355718
Validation loss = 0.0015657177427783608
Validation loss = 0.0012796636437997222
Validation loss = 0.0014532144414260983
Validation loss = 0.0012382450513541698
Validation loss = 0.0010844527278095484
Validation loss = 0.0014286371879279613
Validation loss = 0.0016527678817510605
Validation loss = 0.0013232178753241897
Validation loss = 0.0010832308325916529
Validation loss = 0.001407119445502758
Validation loss = 0.0009519712766632438
Validation loss = 0.0013506070245057344
Validation loss = 0.0019477395107969642
Validation loss = 0.0012290200684219599
Validation loss = 0.0015627116663381457
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012311465106904507
Validation loss = 0.0010385632049292326
Validation loss = 0.0011755742598325014
Validation loss = 0.0011573079973459244
Validation loss = 0.0013245006557554007
Validation loss = 0.001254000118933618
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013273479416966438
Validation loss = 0.0012458316050469875
Validation loss = 0.001485005603171885
Validation loss = 0.0012153866700828075
Validation loss = 0.002256746869534254
Validation loss = 0.001103429589420557
Validation loss = 0.0011819428764283657
Validation loss = 0.0014688875526189804
Validation loss = 0.0011481543770059943
Validation loss = 0.001290367916226387
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013209829339757562
Validation loss = 0.0018600109033286572
Validation loss = 0.0012028118362650275
Validation loss = 0.001170490519143641
Validation loss = 0.0016260111005976796
Validation loss = 0.0014847511192783713
Validation loss = 0.001998716965317726
Validation loss = 0.0015280654188245535
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -139     |
| Iteration     | 61       |
| MaximumReturn | -108     |
| MinimumReturn | -170     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001534144626930356
Validation loss = 0.001010590000078082
Validation loss = 0.0011672620894387364
Validation loss = 0.0012068998767063022
Validation loss = 0.0014575974782928824
Validation loss = 0.001169610652141273
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014400396030396223
Validation loss = 0.001123510766774416
Validation loss = 0.001109015429392457
Validation loss = 0.0015097411815077066
Validation loss = 0.0010849853279069066
Validation loss = 0.0009414884843863547
Validation loss = 0.0011622265446931124
Validation loss = 0.001185008091852069
Validation loss = 0.001186749548651278
Validation loss = 0.0011783858062699437
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017537276726216078
Validation loss = 0.0010053022997453809
Validation loss = 0.0009546380024403334
Validation loss = 0.0012679174542427063
Validation loss = 0.0011057357769459486
Validation loss = 0.0009763928828760982
Validation loss = 0.001292600529268384
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016841820906847715
Validation loss = 0.0010237377136945724
Validation loss = 0.0012786256847903132
Validation loss = 0.0010137270437553525
Validation loss = 0.0010693167569115758
Validation loss = 0.0010572527535259724
Validation loss = 0.0013048957334831357
Validation loss = 0.0013629792956635356
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014756496530026197
Validation loss = 0.0010813153348863125
Validation loss = 0.001104186405427754
Validation loss = 0.0012254358734935522
Validation loss = 0.001294787973165512
Validation loss = 0.0011525924783200026
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -120     |
| Iteration     | 62       |
| MaximumReturn | -4.92    |
| MinimumReturn | -202     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0031502284109592438
Validation loss = 0.0015681827208027244
Validation loss = 0.001333696534857154
Validation loss = 0.0015494705876335502
Validation loss = 0.0013976565096527338
Validation loss = 0.0017100018449127674
Validation loss = 0.0014913574559614062
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0036836767103523016
Validation loss = 0.0016613397747278214
Validation loss = 0.0016665162984281778
Validation loss = 0.0016572533641010523
Validation loss = 0.0016094555612653494
Validation loss = 0.0025459444150328636
Validation loss = 0.0015965976053848863
Validation loss = 0.002854987047612667
Validation loss = 0.002449691528454423
Validation loss = 0.0021199097391217947
Validation loss = 0.0018025028984993696
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0075325146317481995
Validation loss = 0.001884179306216538
Validation loss = 0.0017001951346173882
Validation loss = 0.0014867369318380952
Validation loss = 0.0019801876042038202
Validation loss = 0.0015721068484708667
Validation loss = 0.002054182579740882
Validation loss = 0.0015740464441478252
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005412502679973841
Validation loss = 0.0018490541260689497
Validation loss = 0.002053982811048627
Validation loss = 0.0018505555344745517
Validation loss = 0.001940092071890831
Validation loss = 0.0015277775237336755
Validation loss = 0.0020435070618987083
Validation loss = 0.001963034737855196
Validation loss = 0.002388543216511607
Validation loss = 0.0014733183197677135
Validation loss = 0.0014580480055883527
Validation loss = 0.0014159224228933454
Validation loss = 0.002089692512527108
Validation loss = 0.0014268358936533332
Validation loss = 0.001827077241614461
Validation loss = 0.001443259883671999
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006384372711181641
Validation loss = 0.002449518069624901
Validation loss = 0.002931068418547511
Validation loss = 0.001826633233577013
Validation loss = 0.002471295651048422
Validation loss = 0.002099695848301053
Validation loss = 0.002239180263131857
Validation loss = 0.001960987690836191
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -78.3    |
| Iteration     | 63       |
| MaximumReturn | -1.32    |
| MinimumReturn | -132     |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0019298613769933581
Validation loss = 0.0015180617338046432
Validation loss = 0.0014513925416395068
Validation loss = 0.0012723550898954272
Validation loss = 0.0018267002888023853
Validation loss = 0.0019214408239349723
Validation loss = 0.0013571650488302112
Validation loss = 0.0017619653372094035
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001464441535063088
Validation loss = 0.001954912906512618
Validation loss = 0.0014392727753147483
Validation loss = 0.0017498412635177374
Validation loss = 0.002079729689285159
Validation loss = 0.00201962748542428
Validation loss = 0.002310284413397312
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017763961805030704
Validation loss = 0.0015473648672923446
Validation loss = 0.0016501381760463119
Validation loss = 0.0018594411667436361
Validation loss = 0.0018434585072100163
Validation loss = 0.00232882983982563
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018671423895284534
Validation loss = 0.0014338060282170773
Validation loss = 0.0018464423483237624
Validation loss = 0.0015953740803524852
Validation loss = 0.0016129649011418223
Validation loss = 0.001635153777897358
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001681512570939958
Validation loss = 0.0017771186539903283
Validation loss = 0.0017961636185646057
Validation loss = 0.001529477653093636
Validation loss = 0.0017439755611121655
Validation loss = 0.0028176831547170877
Validation loss = 0.0015995427966117859
Validation loss = 0.0017191896913573146
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -55.3    |
| Iteration     | 64       |
| MaximumReturn | -1.53    |
| MinimumReturn | -161     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00459732860326767
Validation loss = 0.0014208192005753517
Validation loss = 0.0015180796617642045
Validation loss = 0.0014576002722606063
Validation loss = 0.0014486191794276237
Validation loss = 0.0014471323229372501
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004591247066855431
Validation loss = 0.0016438007587566972
Validation loss = 0.0015557033475488424
Validation loss = 0.0018593429122120142
Validation loss = 0.0018574584973976016
Validation loss = 0.0018423056462779641
Validation loss = 0.0018995802383869886
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0023181920405477285
Validation loss = 0.0011821858352050185
Validation loss = 0.0015090371016412973
Validation loss = 0.0016470440896227956
Validation loss = 0.0018907387275248766
Validation loss = 0.002947691362351179
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002854428254067898
Validation loss = 0.0017230823868885636
Validation loss = 0.00144161784555763
Validation loss = 0.0010794461704790592
Validation loss = 0.0012815826339647174
Validation loss = 0.0012486734194681048
Validation loss = 0.0011769166449084878
Validation loss = 0.001740132225677371
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018421318382024765
Validation loss = 0.0013169735902920365
Validation loss = 0.0010539933573454618
Validation loss = 0.0012932086829096079
Validation loss = 0.0013948200503364205
Validation loss = 0.001400847570039332
Validation loss = 0.0019148787250742316
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -71.8    |
| Iteration     | 65       |
| MaximumReturn | -2.03    |
| MinimumReturn | -200     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0037387125194072723
Validation loss = 0.001806620741263032
Validation loss = 0.0017150391358882189
Validation loss = 0.0011793116573244333
Validation loss = 0.0013900380581617355
Validation loss = 0.0011986144818365574
Validation loss = 0.0014855082845315337
Validation loss = 0.00117677787784487
Validation loss = 0.0011493718484416604
Validation loss = 0.001314088935032487
Validation loss = 0.0011416091583669186
Validation loss = 0.0011760573834180832
Validation loss = 0.0013803745387122035
Validation loss = 0.0015663426602259278
Validation loss = 0.0012356411898508668
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001648818957619369
Validation loss = 0.0011482377303764224
Validation loss = 0.0013164402917027473
Validation loss = 0.0011619824217632413
Validation loss = 0.0008868966833688319
Validation loss = 0.0016303870361298323
Validation loss = 0.0015966591890901327
Validation loss = 0.0012282179668545723
Validation loss = 0.0017940039979293942
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0029930321034044027
Validation loss = 0.0015245839022099972
Validation loss = 0.001688117510639131
Validation loss = 0.0017069346504285932
Validation loss = 0.0012152665294706821
Validation loss = 0.0012980818282812834
Validation loss = 0.001633183564990759
Validation loss = 0.001187867484986782
Validation loss = 0.0014693776611238718
Validation loss = 0.0011537190293893218
Validation loss = 0.0014214582042768598
Validation loss = 0.0014009729493409395
Validation loss = 0.0015255552716553211
Validation loss = 0.0015486418269574642
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002886131638661027
Validation loss = 0.001369813340716064
Validation loss = 0.0008874121704138815
Validation loss = 0.0009229338029399514
Validation loss = 0.0011590378126129508
Validation loss = 0.0009820587001740932
Validation loss = 0.0010056208120658994
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00257054902613163
Validation loss = 0.0016389101510867476
Validation loss = 0.0013113750610500574
Validation loss = 0.0011737082386389375
Validation loss = 0.0011338799959048629
Validation loss = 0.0018437992548570037
Validation loss = 0.0011743787908926606
Validation loss = 0.0011309415567666292
Validation loss = 0.001600510673597455
Validation loss = 0.0014117059763520956
Validation loss = 0.0011301516788080335
Validation loss = 0.0012619789922609925
Validation loss = 0.0011677334550768137
Validation loss = 0.0009406061144545674
Validation loss = 0.0010972607415169477
Validation loss = 0.0010330680524930358
Validation loss = 0.0010993193136528134
Validation loss = 0.0016483133658766747
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.15    |
| Iteration     | 66       |
| MaximumReturn | -0.00111 |
| MinimumReturn | -91.9    |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015434835804626346
Validation loss = 0.0015685156686231494
Validation loss = 0.00162127788644284
Validation loss = 0.001554006477817893
Validation loss = 0.001764676533639431
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014301991323009133
Validation loss = 0.00249614124186337
Validation loss = 0.0017945690779015422
Validation loss = 0.0012840022100135684
Validation loss = 0.002883191453292966
Validation loss = 0.0027020801790058613
Validation loss = 0.0016812772955745459
Validation loss = 0.001669232384301722
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014815664617344737
Validation loss = 0.0014741738559678197
Validation loss = 0.001708118012174964
Validation loss = 0.0017395970644429326
Validation loss = 0.0021083918400108814
Validation loss = 0.0015264794928953052
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014135915553197265
Validation loss = 0.0014313554856926203
Validation loss = 0.0024528540670871735
Validation loss = 0.001451162388548255
Validation loss = 0.0014674318954348564
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025974020827561617
Validation loss = 0.0018022818258032203
Validation loss = 0.0013865112559869885
Validation loss = 0.0019342491868883371
Validation loss = 0.0014775303425267339
Validation loss = 0.002243625232949853
Validation loss = 0.0018431374337524176
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.14    |
| Iteration     | 67       |
| MaximumReturn | -0.00126 |
| MinimumReturn | -129     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018793625058606267
Validation loss = 0.0015715339686721563
Validation loss = 0.001630814396776259
Validation loss = 0.0016793225659057498
Validation loss = 0.0019866882357746363
Validation loss = 0.0012535114074125886
Validation loss = 0.0014645367627963424
Validation loss = 0.0015053222887217999
Validation loss = 0.0015546943759545684
Validation loss = 0.0012641616631299257
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014083803398534656
Validation loss = 0.0014486904256045818
Validation loss = 0.0022775090765208006
Validation loss = 0.0013771853409707546
Validation loss = 0.0014285966753959656
Validation loss = 0.0016792991664260626
Validation loss = 0.0016883618663996458
Validation loss = 0.0015689735300838947
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022457758896052837
Validation loss = 0.0020654615946114063
Validation loss = 0.0019023788627237082
Validation loss = 0.0015109577216207981
Validation loss = 0.0023897262290120125
Validation loss = 0.0018026852048933506
Validation loss = 0.0021552061662077904
Validation loss = 0.0016054161824285984
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013516187900677323
Validation loss = 0.001739447470754385
Validation loss = 0.0014163266168907285
Validation loss = 0.002265363931655884
Validation loss = 0.0019828062504529953
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018166754161939025
Validation loss = 0.0014011074090376496
Validation loss = 0.0018736406927928329
Validation loss = 0.0015634469455108047
Validation loss = 0.0021217300090938807
Validation loss = 0.0021486955229192972
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0757   |
| Iteration     | 68        |
| MaximumReturn | -0.000708 |
| MinimumReturn | -1.02     |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013562925159931183
Validation loss = 0.0017633293755352497
Validation loss = 0.0019720348063856363
Validation loss = 0.0014514235081151128
Validation loss = 0.001605307450518012
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016077032778412104
Validation loss = 0.0015402886783704162
Validation loss = 0.0028866128996014595
Validation loss = 0.002686756895855069
Validation loss = 0.0015978169394657016
Validation loss = 0.001883534830994904
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001727861468680203
Validation loss = 0.001685549970716238
Validation loss = 0.0018065538024529815
Validation loss = 0.0015900542493909597
Validation loss = 0.002088828245177865
Validation loss = 0.0018625784432515502
Validation loss = 0.001953758066520095
Validation loss = 0.0016502815997228026
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001521908794529736
Validation loss = 0.0015630851266905665
Validation loss = 0.0018924043979495764
Validation loss = 0.0015510316006839275
Validation loss = 0.0019108726410195231
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015620727790519595
Validation loss = 0.0015528453513979912
Validation loss = 0.0016378570580855012
Validation loss = 0.002908488502725959
Validation loss = 0.0014015568885952234
Validation loss = 0.0015636071329936385
Validation loss = 0.001707163406535983
Validation loss = 0.00182004040107131
Validation loss = 0.001556435483507812
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -67.3    |
| Iteration     | 69       |
| MaximumReturn | -0.00296 |
| MinimumReturn | -186     |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001501470454968512
Validation loss = 0.0012958410661667585
Validation loss = 0.0015009441412985325
Validation loss = 0.0012899200664833188
Validation loss = 0.0013004271313548088
Validation loss = 0.0028911810368299484
Validation loss = 0.0016794593539088964
Validation loss = 0.0013728979974985123
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017925859428942204
Validation loss = 0.001594534725882113
Validation loss = 0.0016272190259769559
Validation loss = 0.0018924381583929062
Validation loss = 0.001152132055722177
Validation loss = 0.0018531641690060496
Validation loss = 0.002123478101566434
Validation loss = 0.0012904379982501268
Validation loss = 0.0015286734560504556
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019711004570126534
Validation loss = 0.0016621199902147055
Validation loss = 0.0011873319745063782
Validation loss = 0.0012567919911816716
Validation loss = 0.001202476560138166
Validation loss = 0.0012706705601885915
Validation loss = 0.0017090002074837685
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016513093141838908
Validation loss = 0.0015143740456551313
Validation loss = 0.0014078409876674414
Validation loss = 0.0013896982418373227
Validation loss = 0.0014845485566183925
Validation loss = 0.0015995694557204843
Validation loss = 0.0014312146231532097
Validation loss = 0.0014507503947243094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002493000589311123
Validation loss = 0.001429320895113051
Validation loss = 0.0017424438847228885
Validation loss = 0.0027437652461230755
Validation loss = 0.002029647585004568
Validation loss = 0.0015211268328130245
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.3    |
| Iteration     | 70       |
| MaximumReturn | -0.00066 |
| MinimumReturn | -197     |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002007157076150179
Validation loss = 0.0015659065684303641
Validation loss = 0.0018864047015085816
Validation loss = 0.001552657806314528
Validation loss = 0.0014440111117437482
Validation loss = 0.0021252096630632877
Validation loss = 0.0014277633745223284
Validation loss = 0.0014445080887526274
Validation loss = 0.001491866772994399
Validation loss = 0.0018541704630479217
Validation loss = 0.001382908085361123
Validation loss = 0.0012509706430137157
Validation loss = 0.001297352951951325
Validation loss = 0.0014807303668931127
Validation loss = 0.0017594366800040007
Validation loss = 0.001627081772312522
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005437822081148624
Validation loss = 0.0017453123582527041
Validation loss = 0.0016005199868232012
Validation loss = 0.0014213471440598369
Validation loss = 0.0014746219385415316
Validation loss = 0.0012959876330569386
Validation loss = 0.001723315566778183
Validation loss = 0.005255897995084524
Validation loss = 0.002488413592800498
Validation loss = 0.0014188869390636683
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002276425715535879
Validation loss = 0.002712904242798686
Validation loss = 0.0015362791018560529
Validation loss = 0.0018355324864387512
Validation loss = 0.0021314623299986124
Validation loss = 0.001430713920854032
Validation loss = 0.0015124843921512365
Validation loss = 0.0015477758133783937
Validation loss = 0.002540418179705739
Validation loss = 0.0016012144042178988
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002584883477538824
Validation loss = 0.0016029089456424117
Validation loss = 0.00173455651383847
Validation loss = 0.0015460994327440858
Validation loss = 0.0034003700129687786
Validation loss = 0.0016723215812817216
Validation loss = 0.0023427335545420647
Validation loss = 0.0017885982524603605
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018319363007321954
Validation loss = 0.0019032537238672376
Validation loss = 0.0015642325161024928
Validation loss = 0.0016188365407288074
Validation loss = 0.0013685612939298153
Validation loss = 0.001917721820063889
Validation loss = 0.002256660955026746
Validation loss = 0.0018329854356124997
Validation loss = 0.0017471107421442866
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27.5    |
| Iteration     | 71       |
| MaximumReturn | -0.00228 |
| MinimumReturn | -152     |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020400083158165216
Validation loss = 0.0015277353813871741
Validation loss = 0.0013266971800476313
Validation loss = 0.0013233815552666783
Validation loss = 0.0012885895557701588
Validation loss = 0.0019035437144339085
Validation loss = 0.0016371493693441153
Validation loss = 0.0013195212231948972
Validation loss = 0.001281791483052075
Validation loss = 0.001372359343804419
Validation loss = 0.0013861432671546936
Validation loss = 0.001480173785239458
Validation loss = 0.0013002597261220217
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014767225366085768
Validation loss = 0.001370240468531847
Validation loss = 0.0016951579600572586
Validation loss = 0.0017840953078120947
Validation loss = 0.0014704556670039892
Validation loss = 0.0014998757978901267
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014890625607222319
Validation loss = 0.0014102364657446742
Validation loss = 0.0013822470791637897
Validation loss = 0.0020535544026643038
Validation loss = 0.0014653322286903858
Validation loss = 0.0017734734574332833
Validation loss = 0.0023769699037075043
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001622369047254324
Validation loss = 0.0017530485056340694
Validation loss = 0.0017919131787493825
Validation loss = 0.0017396042821928859
Validation loss = 0.0014067430747672915
Validation loss = 0.0012745297281071544
Validation loss = 0.0016501267673447728
Validation loss = 0.003223930485546589
Validation loss = 0.002612773096188903
Validation loss = 0.0017131201457232237
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017827643314376473
Validation loss = 0.0016024098731577396
Validation loss = 0.0018532081739977002
Validation loss = 0.0019343390595167875
Validation loss = 0.00211871019564569
Validation loss = 0.0023476381320506334
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -141     |
| Iteration     | 72       |
| MaximumReturn | -29.3    |
| MinimumReturn | -223     |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016676719533279538
Validation loss = 0.0011268618982285261
Validation loss = 0.0010009206598624587
Validation loss = 0.0011159819550812244
Validation loss = 0.0012637696927413344
Validation loss = 0.001678470871411264
Validation loss = 0.0017135620582848787
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017792333383113146
Validation loss = 0.000993724330328405
Validation loss = 0.0009945989586412907
Validation loss = 0.0017218241700902581
Validation loss = 0.0010747268097475171
Validation loss = 0.0009180707857012749
Validation loss = 0.0010428245877847075
Validation loss = 0.0017058178782463074
Validation loss = 0.0014997890684753656
Validation loss = 0.001406978932209313
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001619851216673851
Validation loss = 0.0011368761770427227
Validation loss = 0.001064174110069871
Validation loss = 0.0020357209723442793
Validation loss = 0.001203591818921268
Validation loss = 0.0010581081733107567
Validation loss = 0.0010237126844003797
Validation loss = 0.0013272591168060899
Validation loss = 0.0016532638110220432
Validation loss = 0.0014520206023007631
Validation loss = 0.0010415486758574843
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014070498291403055
Validation loss = 0.0009333047783002257
Validation loss = 0.001405626069754362
Validation loss = 0.001132699428126216
Validation loss = 0.0013775693951174617
Validation loss = 0.0009921332821249962
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023724997881799936
Validation loss = 0.001414180384017527
Validation loss = 0.0013746584299951792
Validation loss = 0.0018947975477203727
Validation loss = 0.0016320321010425687
Validation loss = 0.0020433778408914804
Validation loss = 0.0030307979322969913
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -150     |
| Iteration     | 73       |
| MaximumReturn | -24.6    |
| MinimumReturn | -218     |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016708176117390394
Validation loss = 0.0009503519395366311
Validation loss = 0.0011632019886747003
Validation loss = 0.0012173388386145234
Validation loss = 0.0010955865727737546
Validation loss = 0.001404342008754611
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015124953351914883
Validation loss = 0.0011898974189534783
Validation loss = 0.0011707199737429619
Validation loss = 0.0018236074829474092
Validation loss = 0.001179538550786674
Validation loss = 0.0011232651304453611
Validation loss = 0.0013818779261782765
Validation loss = 0.0009508765069767833
Validation loss = 0.0015506945783272386
Validation loss = 0.0019281033892184496
Validation loss = 0.0011267386144027114
Validation loss = 0.0013083445373922586
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019416618160903454
Validation loss = 0.0012803188292309642
Validation loss = 0.0012465259060263634
Validation loss = 0.0011602694867178798
Validation loss = 0.0011022003600373864
Validation loss = 0.001191754941828549
Validation loss = 0.0014462476829066873
Validation loss = 0.001131107797846198
Validation loss = 0.0013953779125586152
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019921797793358564
Validation loss = 0.0013683538418263197
Validation loss = 0.0013877085875719786
Validation loss = 0.0016823880141600966
Validation loss = 0.0012964626075699925
Validation loss = 0.0016154011245816946
Validation loss = 0.0009197078761644661
Validation loss = 0.001465733745135367
Validation loss = 0.0011325187515467405
Validation loss = 0.0012074144324287772
Validation loss = 0.001187122892588377
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023992012720555067
Validation loss = 0.0014056413201615214
Validation loss = 0.0013336780248209834
Validation loss = 0.0014084746362641454
Validation loss = 0.0011252553667873144
Validation loss = 0.0016816292190924287
Validation loss = 0.0013360109878703952
Validation loss = 0.0015317288925871253
Validation loss = 0.001326278317719698
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -75.7    |
| Iteration     | 74       |
| MaximumReturn | -0.234   |
| MinimumReturn | -178     |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001632039900869131
Validation loss = 0.0011199561413377523
Validation loss = 0.002154333982616663
Validation loss = 0.0010066952090710402
Validation loss = 0.0015430985949933529
Validation loss = 0.0013466733507812023
Validation loss = 0.001090453821234405
Validation loss = 0.0017070239409804344
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011425933334976435
Validation loss = 0.001050477148965001
Validation loss = 0.0011858086800202727
Validation loss = 0.0009973493870347738
Validation loss = 0.0012387492461130023
Validation loss = 0.0014290218241512775
Validation loss = 0.0012285653501749039
Validation loss = 0.0013959561474621296
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014003196265548468
Validation loss = 0.0013030993286520243
Validation loss = 0.0010893713915720582
Validation loss = 0.0008951431955210865
Validation loss = 0.0011093912180513144
Validation loss = 0.001031111809425056
Validation loss = 0.0012376653030514717
Validation loss = 0.0009616001625545323
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010869582183659077
Validation loss = 0.0014086293522268534
Validation loss = 0.0012516665738075972
Validation loss = 0.0014122374122962356
Validation loss = 0.001614898326806724
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011556027457118034
Validation loss = 0.001645456301048398
Validation loss = 0.0012720668455585837
Validation loss = 0.001338502741418779
Validation loss = 0.0014198359567672014
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -191     |
| Iteration     | 75       |
| MaximumReturn | -78.5    |
| MinimumReturn | -228     |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0026363078504800797
Validation loss = 0.001284091267734766
Validation loss = 0.0014976797392591834
Validation loss = 0.0013370625674724579
Validation loss = 0.0014669080264866352
Validation loss = 0.0014664176851511002
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016429924871772528
Validation loss = 0.0015846827300265431
Validation loss = 0.0015410699415951967
Validation loss = 0.0017526975134387612
Validation loss = 0.0014330390840768814
Validation loss = 0.001165216090157628
Validation loss = 0.0012077218852937222
Validation loss = 0.0015428339829668403
Validation loss = 0.0011657639406621456
Validation loss = 0.0011778543703258038
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002815455896779895
Validation loss = 0.0012553749838843942
Validation loss = 0.0025614495389163494
Validation loss = 0.0012550553074106574
Validation loss = 0.0018041946459561586
Validation loss = 0.0009407809702679515
Validation loss = 0.0012027673656120896
Validation loss = 0.0011912769405171275
Validation loss = 0.0012289066798985004
Validation loss = 0.0012514748377725482
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0020308915991336107
Validation loss = 0.001543448306620121
Validation loss = 0.0013087787665426731
Validation loss = 0.0017289035022258759
Validation loss = 0.0013088996056467295
Validation loss = 0.0015538777224719524
Validation loss = 0.0011200429871678352
Validation loss = 0.0014208836946636438
Validation loss = 0.0015149415703490376
Validation loss = 0.0017629514914005995
Validation loss = 0.0011646854691207409
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020459950901567936
Validation loss = 0.0016174297779798508
Validation loss = 0.0015419056871905923
Validation loss = 0.0016162559622898698
Validation loss = 0.0016466272063553333
Validation loss = 0.001626446028240025
Validation loss = 0.0011030371533706784
Validation loss = 0.0011572764487937093
Validation loss = 0.0013001237530261278
Validation loss = 0.00116346450522542
Validation loss = 0.001446619862690568
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15.1    |
| Iteration     | 76       |
| MaximumReturn | -0.486   |
| MinimumReturn | -191     |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014334061415866017
Validation loss = 0.0013672359054908156
Validation loss = 0.0014961326960474253
Validation loss = 0.0015879948623478413
Validation loss = 0.0015510298544541001
Validation loss = 0.0017343491781502962
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001210319809615612
Validation loss = 0.0017239198787137866
Validation loss = 0.0011329075787216425
Validation loss = 0.0016616659704595804
Validation loss = 0.0012157075107097626
Validation loss = 0.0013997002970427275
Validation loss = 0.0015387639869004488
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001666763098910451
Validation loss = 0.0016581711824983358
Validation loss = 0.00424941536039114
Validation loss = 0.0011507500894367695
Validation loss = 0.001723470282740891
Validation loss = 0.001532029826194048
Validation loss = 0.0016865471843630075
Validation loss = 0.0018322195392102003
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013243281282484531
Validation loss = 0.0014051650650799274
Validation loss = 0.0015549777308478951
Validation loss = 0.0020934666972607374
Validation loss = 0.0014285346260294318
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001385300885885954
Validation loss = 0.0012719110818579793
Validation loss = 0.0017138655530288815
Validation loss = 0.0014446239219978452
Validation loss = 0.0012661087093874812
Validation loss = 0.0017725895158946514
Validation loss = 0.0015132981352508068
Validation loss = 0.001389761338941753
Validation loss = 0.0012654972961172462
Validation loss = 0.001280137337744236
Validation loss = 0.001292671076953411
Validation loss = 0.0011334732407703996
Validation loss = 0.0016060795169323683
Validation loss = 0.001328709302470088
Validation loss = 0.0015390366315841675
Validation loss = 0.0016535818576812744
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -65.5    |
| Iteration     | 77       |
| MaximumReturn | -0.84    |
| MinimumReturn | -208     |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011270531686022878
Validation loss = 0.0012984087225049734
Validation loss = 0.0011223400942981243
Validation loss = 0.0013482450740411878
Validation loss = 0.0012028304627165198
Validation loss = 0.0011231217067688704
Validation loss = 0.0021974164992570877
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009797470411285758
Validation loss = 0.0011735744774341583
Validation loss = 0.0019178306683897972
Validation loss = 0.0012208126718178391
Validation loss = 0.0012645946117118
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016455842414870858
Validation loss = 0.0011529552284628153
Validation loss = 0.0010761357843875885
Validation loss = 0.0011141668073832989
Validation loss = 0.0013385064667090774
Validation loss = 0.0013983146054670215
Validation loss = 0.0013891465496271849
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013915721792727709
Validation loss = 0.001072416314855218
Validation loss = 0.001244626590050757
Validation loss = 0.001317600137554109
Validation loss = 0.0013236859813332558
Validation loss = 0.0011784943053498864
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019418784650042653
Validation loss = 0.0013413517735898495
Validation loss = 0.0012069481890648603
Validation loss = 0.001098209759220481
Validation loss = 0.0014702363405376673
Validation loss = 0.0009869297500699759
Validation loss = 0.0012897771084681153
Validation loss = 0.001161955646239221
Validation loss = 0.0011494968784973025
Validation loss = 0.001675681211054325
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23      |
| Iteration     | 78       |
| MaximumReturn | -1.09    |
| MinimumReturn | -157     |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012107266811653972
Validation loss = 0.0011062180856242776
Validation loss = 0.002325704088434577
Validation loss = 0.0014338036999106407
Validation loss = 0.00097663386259228
Validation loss = 0.0012444911990314722
Validation loss = 0.0011120398994535208
Validation loss = 0.002118578879162669
Validation loss = 0.0015809916658326983
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001139234402216971
Validation loss = 0.00225116778165102
Validation loss = 0.0011485485592857003
Validation loss = 0.0011831704759970307
Validation loss = 0.0014515351504087448
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009661110234446824
Validation loss = 0.0011546555906534195
Validation loss = 0.0012461619917303324
Validation loss = 0.001782805658876896
Validation loss = 0.0014609452337026596
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014547544997185469
Validation loss = 0.001585302408784628
Validation loss = 0.0012382165296003222
Validation loss = 0.0013632337795570493
Validation loss = 0.0013314997777342796
Validation loss = 0.0011835028417408466
Validation loss = 0.0011287401430308819
Validation loss = 0.0025053508579730988
Validation loss = 0.0015507987700402737
Validation loss = 0.0014144042506814003
Validation loss = 0.0010134492767974734
Validation loss = 0.001404771232046187
Validation loss = 0.001774748438037932
Validation loss = 0.001422296860255301
Validation loss = 0.0016013518907129765
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001183993648737669
Validation loss = 0.0012867187615484
Validation loss = 0.0013010133989155293
Validation loss = 0.0015528823714703321
Validation loss = 0.0011521882843226194
Validation loss = 0.0019638107623904943
Validation loss = 0.0011474586790427566
Validation loss = 0.0012436701217666268
Validation loss = 0.002042739884927869
Validation loss = 0.001397325424477458
Validation loss = 0.001227634260430932
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -75.3    |
| Iteration     | 79       |
| MaximumReturn | -0.852   |
| MinimumReturn | -190     |
| TotalSamples  | 134946   |
----------------------------
