Logging to experiments/invertedPendulum/IPA01/Tue-01-Nov-2022-07-59-07-PM-CDT_invertedPendulum_trpo_iteration_20_seed3214
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5628118515014648
Validation loss = 0.26449862122535706
Validation loss = 0.25347235798835754
Validation loss = 0.2195539027452469
Validation loss = 0.1967136710882187
Validation loss = 0.19125132262706757
Validation loss = 0.18491791188716888
Validation loss = 0.17920108139514923
Validation loss = 0.16284529864788055
Validation loss = 0.15521681308746338
Validation loss = 0.1507362425327301
Validation loss = 0.1412157267332077
Validation loss = 0.13689012825489044
Validation loss = 0.14262966811656952
Validation loss = 0.13537012040615082
Validation loss = 0.12785713374614716
Validation loss = 0.13576337695121765
Validation loss = 0.11796003580093384
Validation loss = 0.12267264723777771
Validation loss = 0.11892940104007721
Validation loss = 0.10511454939842224
Validation loss = 0.10177583247423172
Validation loss = 0.09731585532426834
Validation loss = 0.08996856212615967
Validation loss = 0.1097959503531456
Validation loss = 0.08756475895643234
Validation loss = 0.09257262200117111
Validation loss = 0.08930732309818268
Validation loss = 0.09885898977518082
Validation loss = 0.09619849920272827
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5542848706245422
Validation loss = 0.26304951310157776
Validation loss = 0.24644993245601654
Validation loss = 0.21461674571037292
Validation loss = 0.1980758011341095
Validation loss = 0.19709579646587372
Validation loss = 0.18018196523189545
Validation loss = 0.1721019446849823
Validation loss = 0.17032071948051453
Validation loss = 0.17086392641067505
Validation loss = 0.15225329995155334
Validation loss = 0.1452322006225586
Validation loss = 0.13783429563045502
Validation loss = 0.1486692577600479
Validation loss = 0.14239351451396942
Validation loss = 0.1364879608154297
Validation loss = 0.14320975542068481
Validation loss = 0.12805309891700745
Validation loss = 0.1132253110408783
Validation loss = 0.10892277210950851
Validation loss = 0.1122574731707573
Validation loss = 0.10887181013822556
Validation loss = 0.10018867999315262
Validation loss = 0.09769520908594131
Validation loss = 0.0955735594034195
Validation loss = 0.08770158886909485
Validation loss = 0.09446954727172852
Validation loss = 0.08723748475313187
Validation loss = 0.08563820272684097
Validation loss = 0.0861714705824852
Validation loss = 0.083811916410923
Validation loss = 0.07626273483037949
Validation loss = 0.07445564866065979
Validation loss = 0.0742318257689476
Validation loss = 0.07479915767908096
Validation loss = 0.06963762640953064
Validation loss = 0.07388025522232056
Validation loss = 0.06675881892442703
Validation loss = 0.07372074574232101
Validation loss = 0.062149446457624435
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5358873009681702
Validation loss = 0.3025084435939789
Validation loss = 0.24687224626541138
Validation loss = 0.22071370482444763
Validation loss = 0.19863006472587585
Validation loss = 0.19504769146442413
Validation loss = 0.18255698680877686
Validation loss = 0.17319481074810028
Validation loss = 0.16424882411956787
Validation loss = 0.16714760661125183
Validation loss = 0.14611850678920746
Validation loss = 0.1508243829011917
Validation loss = 0.14127102494239807
Validation loss = 0.13583765923976898
Validation loss = 0.1340482383966446
Validation loss = 0.1321694254875183
Validation loss = 0.13147318363189697
Validation loss = 0.11575441062450409
Validation loss = 0.11848492920398712
Validation loss = 0.11079928278923035
Validation loss = 0.10242380946874619
Validation loss = 0.10559491068124771
Validation loss = 0.09277935326099396
Validation loss = 0.08054129779338837
Validation loss = 0.08334735035896301
Validation loss = 0.0863158255815506
Validation loss = 0.08637659251689911
Validation loss = 0.08178874105215073
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5534652471542358
Validation loss = 0.2505504786968231
Validation loss = 0.23357081413269043
Validation loss = 0.21312697231769562
Validation loss = 0.1931561976671219
Validation loss = 0.18737582862377167
Validation loss = 0.17053645849227905
Validation loss = 0.1688074916601181
Validation loss = 0.1543423980474472
Validation loss = 0.15302202105522156
Validation loss = 0.14473555982112885
Validation loss = 0.13974663615226746
Validation loss = 0.12980568408966064
Validation loss = 0.1340438723564148
Validation loss = 0.13156430423259735
Validation loss = 0.12866853177547455
Validation loss = 0.13916948437690735
Validation loss = 0.11545895785093307
Validation loss = 0.10743243992328644
Validation loss = 0.11748158186674118
Validation loss = 0.10189260542392731
Validation loss = 0.10389772802591324
Validation loss = 0.0963788628578186
Validation loss = 0.08879126608371735
Validation loss = 0.09500980377197266
Validation loss = 0.0940437838435173
Validation loss = 0.08662158995866776
Validation loss = 0.0898292064666748
Validation loss = 0.08774123340845108
Validation loss = 0.0876777172088623
Validation loss = 0.07930365949869156
Validation loss = 0.07337068021297455
Validation loss = 0.07005959004163742
Validation loss = 0.07319018244743347
Validation loss = 0.07021165639162064
Validation loss = 0.07187113910913467
Validation loss = 0.06511741131544113
Validation loss = 0.06164966896176338
Validation loss = 0.06736894696950912
Validation loss = 0.06849007308483124
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5546165704727173
Validation loss = 0.2519068419933319
Validation loss = 0.24903547763824463
Validation loss = 0.21728219091892242
Validation loss = 0.20306570827960968
Validation loss = 0.18096806108951569
Validation loss = 0.17247237265110016
Validation loss = 0.16788041591644287
Validation loss = 0.17158125340938568
Validation loss = 0.16364574432373047
Validation loss = 0.180871844291687
Validation loss = 0.15081153810024261
Validation loss = 0.14114494621753693
Validation loss = 0.13998065888881683
Validation loss = 0.13592247664928436
Validation loss = 0.13036590814590454
Validation loss = 0.12981224060058594
Validation loss = 0.11607535928487778
Validation loss = 0.12224525958299637
Validation loss = 0.12314349412918091
Validation loss = 0.10466733574867249
Validation loss = 0.1090751513838768
Validation loss = 0.1108999028801918
Validation loss = 0.0918726772069931
Validation loss = 0.086369588971138
Validation loss = 0.08695770055055618
Validation loss = 0.08508477360010147
Validation loss = 0.08014016598463058
Validation loss = 0.08290867507457733
Validation loss = 0.07251571863889694
Validation loss = 0.08011911064386368
Validation loss = 0.07668054848909378
Validation loss = 0.07129274308681488
Validation loss = 0.0700000450015068
Validation loss = 0.06797757744789124
Validation loss = 0.06058407202363014
Validation loss = 0.06669364124536514
Validation loss = 0.06131412461400032
Validation loss = 0.07272889465093613
Validation loss = 0.08168359845876694
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.35    |
| Iteration     | 0        |
| MaximumReturn | -0.0346  |
| MinimumReturn | -40.4    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.19591546058654785
Validation loss = 0.12250377237796783
Validation loss = 0.08827429264783859
Validation loss = 0.08926518261432648
Validation loss = 0.07770708203315735
Validation loss = 0.07007214426994324
Validation loss = 0.06777720898389816
Validation loss = 0.06404704600572586
Validation loss = 0.058267973363399506
Validation loss = 0.06178610399365425
Validation loss = 0.05563811957836151
Validation loss = 0.05241132527589798
Validation loss = 0.0577092319726944
Validation loss = 0.0541791170835495
Validation loss = 0.04693376645445824
Validation loss = 0.04935029149055481
Validation loss = 0.05595017597079277
Validation loss = 0.05245274305343628
Validation loss = 0.05111163854598999
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.24747256934642792
Validation loss = 0.13637661933898926
Validation loss = 0.09435562044382095
Validation loss = 0.0857703760266304
Validation loss = 0.08235599845647812
Validation loss = 0.07590963691473007
Validation loss = 0.06761646270751953
Validation loss = 0.06690752506256104
Validation loss = 0.059147536754608154
Validation loss = 0.06677193939685822
Validation loss = 0.059846989810466766
Validation loss = 0.05500826612114906
Validation loss = 0.05279109999537468
Validation loss = 0.07078074663877487
Validation loss = 0.05157992243766785
Validation loss = 0.05317676439881325
Validation loss = 0.05371683090925217
Validation loss = 0.06005530804395676
Validation loss = 0.05495191738009453
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.25795412063598633
Validation loss = 0.12950383126735687
Validation loss = 0.10215980559587479
Validation loss = 0.08897192031145096
Validation loss = 0.08286936581134796
Validation loss = 0.07782696187496185
Validation loss = 0.06311654299497604
Validation loss = 0.06805198639631271
Validation loss = 0.06071755290031433
Validation loss = 0.05401019752025604
Validation loss = 0.05676976591348648
Validation loss = 0.05160372331738472
Validation loss = 0.05190178006887436
Validation loss = 0.05818376690149307
Validation loss = 0.04915187880396843
Validation loss = 0.04510834440588951
Validation loss = 0.05046963319182396
Validation loss = 0.048230450600385666
Validation loss = 0.06410930305719376
Validation loss = 0.05168435350060463
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.26545315980911255
Validation loss = 0.13926060497760773
Validation loss = 0.11050160229206085
Validation loss = 0.09660165011882782
Validation loss = 0.08638130873441696
Validation loss = 0.07867030054330826
Validation loss = 0.07663711905479431
Validation loss = 0.07294987142086029
Validation loss = 0.05926286801695824
Validation loss = 0.06800346076488495
Validation loss = 0.058120809495449066
Validation loss = 0.058372680097818375
Validation loss = 0.061277568340301514
Validation loss = 0.05901486799120903
Validation loss = 0.055782392621040344
Validation loss = 0.05198225378990173
Validation loss = 0.049186281859874725
Validation loss = 0.046830661594867706
Validation loss = 0.04758143797516823
Validation loss = 0.04775279015302658
Validation loss = 0.04979672655463219
Validation loss = 0.05332590267062187
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2536463141441345
Validation loss = 0.13763201236724854
Validation loss = 0.11261121183633804
Validation loss = 0.1001424714922905
Validation loss = 0.0856664627790451
Validation loss = 0.08407960832118988
Validation loss = 0.0725056529045105
Validation loss = 0.07829172164201736
Validation loss = 0.06870400905609131
Validation loss = 0.06543304771184921
Validation loss = 0.07080989331007004
Validation loss = 0.059029124677181244
Validation loss = 0.05487540736794472
Validation loss = 0.05618024617433548
Validation loss = 0.058261752128601074
Validation loss = 0.06880290806293488
Validation loss = 0.06007008999586105
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14      |
| Iteration     | 1        |
| MaximumReturn | -0.0715  |
| MinimumReturn | -44.6    |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.17707586288452148
Validation loss = 0.09746789932250977
Validation loss = 0.07968058437108994
Validation loss = 0.07035171240568161
Validation loss = 0.061281926929950714
Validation loss = 0.05697086825966835
Validation loss = 0.054400138556957245
Validation loss = 0.04803348332643509
Validation loss = 0.04952005296945572
Validation loss = 0.04534691199660301
Validation loss = 0.04257136583328247
Validation loss = 0.048339489847421646
Validation loss = 0.04289230704307556
Validation loss = 0.041678596287965775
Validation loss = 0.035672903060913086
Validation loss = 0.03282756358385086
Validation loss = 0.039915964007377625
Validation loss = 0.030648790299892426
Validation loss = 0.030746590346097946
Validation loss = 0.03673825412988663
Validation loss = 0.03467904403805733
Validation loss = 0.03103560023009777
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16585198044776917
Validation loss = 0.08710189163684845
Validation loss = 0.07445096224546432
Validation loss = 0.07127353549003601
Validation loss = 0.05794232711195946
Validation loss = 0.05912286415696144
Validation loss = 0.060140807181596756
Validation loss = 0.06031864508986473
Validation loss = 0.055952370166778564
Validation loss = 0.05552460625767708
Validation loss = 0.04260234907269478
Validation loss = 0.03616702929139137
Validation loss = 0.03668315336108208
Validation loss = 0.039090320467948914
Validation loss = 0.038345176726579666
Validation loss = 0.0376335084438324
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1552026867866516
Validation loss = 0.09357717633247375
Validation loss = 0.06842458993196487
Validation loss = 0.06170576065778732
Validation loss = 0.05821150541305542
Validation loss = 0.047240275889635086
Validation loss = 0.05027762055397034
Validation loss = 0.049327656626701355
Validation loss = 0.07309438288211823
Validation loss = 0.05242985859513283
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18700024485588074
Validation loss = 0.10278619825839996
Validation loss = 0.08593254536390305
Validation loss = 0.07438462972640991
Validation loss = 0.06828881055116653
Validation loss = 0.05891314148902893
Validation loss = 0.06280402839183807
Validation loss = 0.05269310623407364
Validation loss = 0.04416549578309059
Validation loss = 0.04943237826228142
Validation loss = 0.0492093451321125
Validation loss = 0.0387406125664711
Validation loss = 0.036451246589422226
Validation loss = 0.03097495064139366
Validation loss = 0.03362491726875305
Validation loss = 0.04609055817127228
Validation loss = 0.03331468626856804
Validation loss = 0.04242650792002678
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1696474850177765
Validation loss = 0.0895322635769844
Validation loss = 0.06749369949102402
Validation loss = 0.05723593384027481
Validation loss = 0.050878141075372696
Validation loss = 0.04859998822212219
Validation loss = 0.04259771853685379
Validation loss = 0.055410079658031464
Validation loss = 0.046083640307188034
Validation loss = 0.039214737713336945
Validation loss = 0.04929120093584061
Validation loss = 0.035081129521131516
Validation loss = 0.040912166237831116
Validation loss = 0.0522073358297348
Validation loss = 0.0353231281042099
Validation loss = 0.03904588893055916
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.147   |
| Iteration     | 2        |
| MaximumReturn | -0.09    |
| MinimumReturn | -0.28    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06254426389932632
Validation loss = 0.050919417291879654
Validation loss = 0.045423608273267746
Validation loss = 0.030923284590244293
Validation loss = 0.027815913781523705
Validation loss = 0.023531867191195488
Validation loss = 0.023200005292892456
Validation loss = 0.02301074005663395
Validation loss = 0.0296719279140234
Validation loss = 0.02217136323451996
Validation loss = 0.021024854853749275
Validation loss = 0.02437756210565567
Validation loss = 0.024671025574207306
Validation loss = 0.020226052030920982
Validation loss = 0.021424056962132454
Validation loss = 0.021240288391709328
Validation loss = 0.02095440961420536
Validation loss = 0.022925525903701782
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05646561086177826
Validation loss = 0.03203320875763893
Validation loss = 0.024899514392018318
Validation loss = 0.02858230657875538
Validation loss = 0.0238747987896204
Validation loss = 0.022873923182487488
Validation loss = 0.024611076340079308
Validation loss = 0.0215839222073555
Validation loss = 0.023076126351952553
Validation loss = 0.028122251853346825
Validation loss = 0.023741165176033974
Validation loss = 0.020493604242801666
Validation loss = 0.018295323476195335
Validation loss = 0.019274162128567696
Validation loss = 0.019083719700574875
Validation loss = 0.02121712826192379
Validation loss = 0.020795097574591637
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06824298948049545
Validation loss = 0.03344328701496124
Validation loss = 0.031245404854416847
Validation loss = 0.030286645516753197
Validation loss = 0.02446451038122177
Validation loss = 0.025263622403144836
Validation loss = 0.028183750808238983
Validation loss = 0.02463715337216854
Validation loss = 0.020464977249503136
Validation loss = 0.020512396469712257
Validation loss = 0.02142435871064663
Validation loss = 0.02048063650727272
Validation loss = 0.01967000775039196
Validation loss = 0.022602586075663567
Validation loss = 0.019191108644008636
Validation loss = 0.01923101767897606
Validation loss = 0.021720582619309425
Validation loss = 0.026223210617899895
Validation loss = 0.021716156974434853
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06290765851736069
Validation loss = 0.030233511701226234
Validation loss = 0.028356455266475677
Validation loss = 0.02618102729320526
Validation loss = 0.026019493117928505
Validation loss = 0.020722320303320885
Validation loss = 0.019998518750071526
Validation loss = 0.020847052335739136
Validation loss = 0.01796575076878071
Validation loss = 0.020392555743455887
Validation loss = 0.020300040021538734
Validation loss = 0.03148926794528961
Validation loss = 0.018315818160772324
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07663340121507645
Validation loss = 0.029794877395033836
Validation loss = 0.029943453148007393
Validation loss = 0.02457108534872532
Validation loss = 0.026922844350337982
Validation loss = 0.02249922789633274
Validation loss = 0.019903115928173065
Validation loss = 0.02067641168832779
Validation loss = 0.02290714718401432
Validation loss = 0.019442740827798843
Validation loss = 0.019732791930437088
Validation loss = 0.01763274148106575
Validation loss = 0.021487310528755188
Validation loss = 0.01918843202292919
Validation loss = 0.023138582706451416
Validation loss = 0.021405987441539764
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0364  |
| Iteration     | 3        |
| MaximumReturn | -0.0177  |
| MinimumReturn | -0.0586  |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05522296950221062
Validation loss = 0.022025689482688904
Validation loss = 0.020946815609931946
Validation loss = 0.021982988342642784
Validation loss = 0.016866914927959442
Validation loss = 0.01636337675154209
Validation loss = 0.017190946266055107
Validation loss = 0.01627928763628006
Validation loss = 0.021596500650048256
Validation loss = 0.01701558567583561
Validation loss = 0.022840509191155434
Validation loss = 0.016371246427297592
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.056893810629844666
Validation loss = 0.024108031764626503
Validation loss = 0.020413195714354515
Validation loss = 0.027135467156767845
Validation loss = 0.022135699167847633
Validation loss = 0.017757054418325424
Validation loss = 0.019577592611312866
Validation loss = 0.018548356369137764
Validation loss = 0.01861886866390705
Validation loss = 0.01658792234957218
Validation loss = 0.022144483402371407
Validation loss = 0.016723889857530594
Validation loss = 0.0156019888818264
Validation loss = 0.020250722765922546
Validation loss = 0.014542262069880962
Validation loss = 0.015801314264535904
Validation loss = 0.015840502455830574
Validation loss = 0.017153264954686165
Validation loss = 0.015020638704299927
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03852715343236923
Validation loss = 0.026177184656262398
Validation loss = 0.018931463360786438
Validation loss = 0.01708092913031578
Validation loss = 0.02126094326376915
Validation loss = 0.0224617850035429
Validation loss = 0.023772556334733963
Validation loss = 0.01778011955320835
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03652084246277809
Validation loss = 0.029027819633483887
Validation loss = 0.023188916966319084
Validation loss = 0.021642785519361496
Validation loss = 0.018488792702555656
Validation loss = 0.015472148545086384
Validation loss = 0.01606561243534088
Validation loss = 0.01905895210802555
Validation loss = 0.02857116609811783
Validation loss = 0.01963932253420353
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029226837679743767
Validation loss = 0.01975526101887226
Validation loss = 0.021123139187693596
Validation loss = 0.018813060596585274
Validation loss = 0.016399959102272987
Validation loss = 0.016962984576821327
Validation loss = 0.016618777066469193
Validation loss = 0.01722482219338417
Validation loss = 0.016032438725233078
Validation loss = 0.020405560731887817
Validation loss = 0.01822592504322529
Validation loss = 0.02171800471842289
Validation loss = 0.01760902814567089
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.8    |
| Iteration     | 4        |
| MaximumReturn | -4.15    |
| MinimumReturn | -36.5    |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05853358656167984
Validation loss = 0.019465971738100052
Validation loss = 0.01839720644056797
Validation loss = 0.02081415243446827
Validation loss = 0.01538812555372715
Validation loss = 0.014572009444236755
Validation loss = 0.014059707522392273
Validation loss = 0.020832573994994164
Validation loss = 0.016610778868198395
Validation loss = 0.015101881697773933
Validation loss = 0.021228743717074394
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04088319092988968
Validation loss = 0.024947000667452812
Validation loss = 0.015899190679192543
Validation loss = 0.01837714947760105
Validation loss = 0.01520543359220028
Validation loss = 0.013217712752521038
Validation loss = 0.019324535503983498
Validation loss = 0.0138265211135149
Validation loss = 0.013955458998680115
Validation loss = 0.013217635452747345
Validation loss = 0.012299801222980022
Validation loss = 0.012275881133973598
Validation loss = 0.010529190301895142
Validation loss = 0.011184651404619217
Validation loss = 0.017034169286489487
Validation loss = 0.026348263025283813
Validation loss = 0.01708946004509926
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04649250954389572
Validation loss = 0.027467217296361923
Validation loss = 0.024439232423901558
Validation loss = 0.016267338767647743
Validation loss = 0.022061916068196297
Validation loss = 0.016438612714409828
Validation loss = 0.013491151854395866
Validation loss = 0.017436299473047256
Validation loss = 0.021590322256088257
Validation loss = 0.013695137575268745
Validation loss = 0.013482603244483471
Validation loss = 0.01246224157512188
Validation loss = 0.012251824140548706
Validation loss = 0.013171372935175896
Validation loss = 0.011637905612587929
Validation loss = 0.013238278217613697
Validation loss = 0.0133205009624362
Validation loss = 0.013481736183166504
Validation loss = 0.01254718191921711
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03611146658658981
Validation loss = 0.016038116067647934
Validation loss = 0.014641160145401955
Validation loss = 0.013140015304088593
Validation loss = 0.014157837256789207
Validation loss = 0.016825031489133835
Validation loss = 0.016005640849471092
Validation loss = 0.018360668793320656
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.034397024661302567
Validation loss = 0.02041715756058693
Validation loss = 0.022268477827310562
Validation loss = 0.015224749222397804
Validation loss = 0.018474549055099487
Validation loss = 0.013967310078442097
Validation loss = 0.017447752878069878
Validation loss = 0.012366309762001038
Validation loss = 0.01329053658992052
Validation loss = 0.010810812935233116
Validation loss = 0.011058217845857143
Validation loss = 0.013624653220176697
Validation loss = 0.025117313489317894
Validation loss = 0.011740345507860184
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.156   |
| Iteration     | 5        |
| MaximumReturn | -0.0976  |
| MinimumReturn | -0.226   |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029220590367913246
Validation loss = 0.01794397085905075
Validation loss = 0.012735540047287941
Validation loss = 0.012243106961250305
Validation loss = 0.014308084733784199
Validation loss = 0.014188826084136963
Validation loss = 0.011285982094705105
Validation loss = 0.011812164448201656
Validation loss = 0.01238350011408329
Validation loss = 0.016110166907310486
Validation loss = 0.020738735795021057
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029682736843824387
Validation loss = 0.015459062531590462
Validation loss = 0.013420894742012024
Validation loss = 0.010556304827332497
Validation loss = 0.012642346322536469
Validation loss = 0.0125863803550601
Validation loss = 0.010075641795992851
Validation loss = 0.010494718328118324
Validation loss = 0.010878468863666058
Validation loss = 0.014010327868163586
Validation loss = 0.011014060117304325
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02710788883268833
Validation loss = 0.02311113476753235
Validation loss = 0.013780990615487099
Validation loss = 0.011007522232830524
Validation loss = 0.011365639045834541
Validation loss = 0.01088169775903225
Validation loss = 0.013554669916629791
Validation loss = 0.01192960049957037
Validation loss = 0.011583006009459496
Validation loss = 0.012905189767479897
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04060463234782219
Validation loss = 0.018205158412456512
Validation loss = 0.012490473687648773
Validation loss = 0.011522035114467144
Validation loss = 0.020702190697193146
Validation loss = 0.013059625402092934
Validation loss = 0.01525123417377472
Validation loss = 0.01279444433748722
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03957248479127884
Validation loss = 0.011360990814864635
Validation loss = 0.01077493466436863
Validation loss = 0.012446519918739796
Validation loss = 0.014613683335483074
Validation loss = 0.016963595524430275
Validation loss = 0.0113641656935215
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00413 |
| Iteration     | 6        |
| MaximumReturn | -0.00309 |
| MinimumReturn | -0.00547 |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021218212321400642
Validation loss = 0.0217258483171463
Validation loss = 0.013863329775631428
Validation loss = 0.013037863187491894
Validation loss = 0.02044072188436985
Validation loss = 0.013381384313106537
Validation loss = 0.018312031403183937
Validation loss = 0.01144567783921957
Validation loss = 0.011845811270177364
Validation loss = 0.010056880302727222
Validation loss = 0.011172899045050144
Validation loss = 0.012678141705691814
Validation loss = 0.010166834108531475
Validation loss = 0.010874641127884388
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.027524800971150398
Validation loss = 0.01762574352324009
Validation loss = 0.017351659014821053
Validation loss = 0.011629194021224976
Validation loss = 0.010508463717997074
Validation loss = 0.011059810407459736
Validation loss = 0.01055350061506033
Validation loss = 0.023493057116866112
Validation loss = 0.013051632791757584
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0343458391726017
Validation loss = 0.02063736878335476
Validation loss = 0.013844265602529049
Validation loss = 0.017117304727435112
Validation loss = 0.010404440574347973
Validation loss = 0.012865804135799408
Validation loss = 0.011866025626659393
Validation loss = 0.009767281822860241
Validation loss = 0.012156832031905651
Validation loss = 0.014054923318326473
Validation loss = 0.014320225454866886
Validation loss = 0.011409099213778973
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025337552651762962
Validation loss = 0.013923211954534054
Validation loss = 0.01191361341625452
Validation loss = 0.01932159624993801
Validation loss = 0.014553003944456577
Validation loss = 0.013957801274955273
Validation loss = 0.010736185126006603
Validation loss = 0.010301259346306324
Validation loss = 0.01393347978591919
Validation loss = 0.0170112457126379
Validation loss = 0.010044948197901249
Validation loss = 0.013388597406446934
Validation loss = 0.00989632960408926
Validation loss = 0.010656763799488544
Validation loss = 0.011841457337141037
Validation loss = 0.015910789370536804
Validation loss = 0.013371028937399387
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019228095188736916
Validation loss = 0.019487211480736732
Validation loss = 0.014775487594306469
Validation loss = 0.02046681009232998
Validation loss = 0.01469295471906662
Validation loss = 0.013189838267862797
Validation loss = 0.012801864184439182
Validation loss = 0.009937250055372715
Validation loss = 0.011902718804776669
Validation loss = 0.015595894306898117
Validation loss = 0.013547810725867748
Validation loss = 0.01632746495306492
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.77    |
| Iteration     | 7        |
| MaximumReturn | -0.0938  |
| MinimumReturn | -32.7    |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021811654791235924
Validation loss = 0.01064127404242754
Validation loss = 0.011413115076720715
Validation loss = 0.01218222826719284
Validation loss = 0.012685933150351048
Validation loss = 0.009487070143222809
Validation loss = 0.013578624464571476
Validation loss = 0.008109085261821747
Validation loss = 0.014652238227427006
Validation loss = 0.008793127723038197
Validation loss = 0.008754533715546131
Validation loss = 0.008807883597910404
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015305250883102417
Validation loss = 0.008824876509606838
Validation loss = 0.011159582994878292
Validation loss = 0.013607745058834553
Validation loss = 0.011942184530198574
Validation loss = 0.008752641268074512
Validation loss = 0.007366045378148556
Validation loss = 0.008445951156318188
Validation loss = 0.01535703707486391
Validation loss = 0.011547221802175045
Validation loss = 0.010888589546084404
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015200336463749409
Validation loss = 0.011768805794417858
Validation loss = 0.011096825823187828
Validation loss = 0.0144810751080513
Validation loss = 0.010943116620182991
Validation loss = 0.00846887193620205
Validation loss = 0.014397953636944294
Validation loss = 0.010782204568386078
Validation loss = 0.012346811592578888
Validation loss = 0.011316324584186077
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017815928906202316
Validation loss = 0.009093734435737133
Validation loss = 0.011051283217966557
Validation loss = 0.009149155579507351
Validation loss = 0.011473862454295158
Validation loss = 0.009644219651818275
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014438415877521038
Validation loss = 0.012195191346108913
Validation loss = 0.009507476352155209
Validation loss = 0.00810766126960516
Validation loss = 0.007959133945405483
Validation loss = 0.00889354757964611
Validation loss = 0.011031861416995525
Validation loss = 0.00911793764680624
Validation loss = 0.009217438288033009
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00428 |
| Iteration     | 8        |
| MaximumReturn | -0.00312 |
| MinimumReturn | -0.0058  |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014578361064195633
Validation loss = 0.016556765884160995
Validation loss = 0.01214651484042406
Validation loss = 0.008269918151199818
Validation loss = 0.010652037337422371
Validation loss = 0.00895532313734293
Validation loss = 0.008016684092581272
Validation loss = 0.0077498978935182095
Validation loss = 0.009213704615831375
Validation loss = 0.008146262727677822
Validation loss = 0.007466895971447229
Validation loss = 0.00883843470364809
Validation loss = 0.010199125856161118
Validation loss = 0.013175170868635178
Validation loss = 0.012551459483802319
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013625234365463257
Validation loss = 0.008319638669490814
Validation loss = 0.00964130274951458
Validation loss = 0.008844748139381409
Validation loss = 0.009392489679157734
Validation loss = 0.010762276127934456
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.018610235303640366
Validation loss = 0.018430905416607857
Validation loss = 0.015095717273652554
Validation loss = 0.008057717233896255
Validation loss = 0.007729801815003157
Validation loss = 0.008283342234790325
Validation loss = 0.010104489512741566
Validation loss = 0.008966010063886642
Validation loss = 0.00808302778750658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01848435401916504
Validation loss = 0.008200295269489288
Validation loss = 0.009378587827086449
Validation loss = 0.009087014012038708
Validation loss = 0.009694581851363182
Validation loss = 0.0106166647747159
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013439134694635868
Validation loss = 0.0075950175523757935
Validation loss = 0.008390531875193119
Validation loss = 0.019943062216043472
Validation loss = 0.00781376101076603
Validation loss = 0.008295521140098572
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0133  |
| Iteration     | 9        |
| MaximumReturn | -0.0111  |
| MinimumReturn | -0.0168  |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01633048988878727
Validation loss = 0.012128316797316074
Validation loss = 0.008071686141192913
Validation loss = 0.007939755916595459
Validation loss = 0.007240592502057552
Validation loss = 0.009320076555013657
Validation loss = 0.01147813443094492
Validation loss = 0.01136939786374569
Validation loss = 0.006810314022004604
Validation loss = 0.008587359450757504
Validation loss = 0.013124777935445309
Validation loss = 0.010334077291190624
Validation loss = 0.008048111572861671
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01063602790236473
Validation loss = 0.008256364613771439
Validation loss = 0.00651843287050724
Validation loss = 0.006323121953755617
Validation loss = 0.00863322801887989
Validation loss = 0.012299559079110622
Validation loss = 0.0066386559046804905
Validation loss = 0.007328649517148733
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013189978897571564
Validation loss = 0.01613975316286087
Validation loss = 0.009942962788045406
Validation loss = 0.00674799270927906
Validation loss = 0.010213139466941357
Validation loss = 0.007842276245355606
Validation loss = 0.012370038777589798
Validation loss = 0.00868931319564581
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021741095930337906
Validation loss = 0.01781688630580902
Validation loss = 0.01231236383318901
Validation loss = 0.009569210931658745
Validation loss = 0.011167846620082855
Validation loss = 0.0096184853464365
Validation loss = 0.008373488672077656
Validation loss = 0.011081960052251816
Validation loss = 0.011815072037279606
Validation loss = 0.011914935894310474
Validation loss = 0.008713343180716038
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00971517339348793
Validation loss = 0.01087197009474039
Validation loss = 0.011204789392650127
Validation loss = 0.007404899224638939
Validation loss = 0.007736729923635721
Validation loss = 0.008772528730332851
Validation loss = 0.010934839956462383
Validation loss = 0.008256036788225174
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00783 |
| Iteration     | 10       |
| MaximumReturn | -0.00576 |
| MinimumReturn | -0.0102  |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01021366473287344
Validation loss = 0.008369488641619682
Validation loss = 0.006836987100541592
Validation loss = 0.009366120211780071
Validation loss = 0.009382138960063457
Validation loss = 0.0096603874117136
Validation loss = 0.007220095489174128
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019916679710149765
Validation loss = 0.011450465768575668
Validation loss = 0.006862561218440533
Validation loss = 0.0066917659714818
Validation loss = 0.006700931582599878
Validation loss = 0.007780680898576975
Validation loss = 0.008728930726647377
Validation loss = 0.00785670056939125
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010580155998468399
Validation loss = 0.007936103269457817
Validation loss = 0.00961608998477459
Validation loss = 0.007634010165929794
Validation loss = 0.007454771548509598
Validation loss = 0.006154075730592012
Validation loss = 0.014680464752018452
Validation loss = 0.009357904084026814
Validation loss = 0.008131826296448708
Validation loss = 0.008176470175385475
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009561611339449883
Validation loss = 0.006878930144011974
Validation loss = 0.00660106772556901
Validation loss = 0.007066930644214153
Validation loss = 0.012137682177126408
Validation loss = 0.02451484277844429
Validation loss = 0.009199988096952438
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020564107224345207
Validation loss = 0.009434742853045464
Validation loss = 0.007302301935851574
Validation loss = 0.008463171310722828
Validation loss = 0.008724674582481384
Validation loss = 0.011286774650216103
Validation loss = 0.00874384120106697
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00883 |
| Iteration     | 11       |
| MaximumReturn | -0.0069  |
| MinimumReturn | -0.0111  |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008768802508711815
Validation loss = 0.009883642196655273
Validation loss = 0.009151574224233627
Validation loss = 0.0070862239226698875
Validation loss = 0.006270915269851685
Validation loss = 0.006544467993080616
Validation loss = 0.00726355891674757
Validation loss = 0.007136321160942316
Validation loss = 0.007191764656454325
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013312074355781078
Validation loss = 0.011302534490823746
Validation loss = 0.011900301091372967
Validation loss = 0.008303973823785782
Validation loss = 0.008774332702159882
Validation loss = 0.006556467153131962
Validation loss = 0.0072763338685035706
Validation loss = 0.007321312092244625
Validation loss = 0.006591161247342825
Validation loss = 0.006694196257740259
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009354393929243088
Validation loss = 0.008877310901880264
Validation loss = 0.012113839387893677
Validation loss = 0.006534588988870382
Validation loss = 0.007304449565708637
Validation loss = 0.006553015671670437
Validation loss = 0.009074801579117775
Validation loss = 0.014237871393561363
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009191565215587616
Validation loss = 0.00925772450864315
Validation loss = 0.006916693411767483
Validation loss = 0.009346181526780128
Validation loss = 0.0164935402572155
Validation loss = 0.013085760176181793
Validation loss = 0.007760375738143921
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00983390398323536
Validation loss = 0.007447294890880585
Validation loss = 0.008537031710147858
Validation loss = 0.006409767083823681
Validation loss = 0.007542324718087912
Validation loss = 0.008645324036478996
Validation loss = 0.007550752256065607
Validation loss = 0.01028267852962017
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00759 |
| Iteration     | 12       |
| MaximumReturn | -0.00515 |
| MinimumReturn | -0.0112  |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008433553390204906
Validation loss = 0.0054400586523115635
Validation loss = 0.00468268571421504
Validation loss = 0.0077305566519498825
Validation loss = 0.0058458116836845875
Validation loss = 0.0055252378806471825
Validation loss = 0.007574218325316906
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008824543096125126
Validation loss = 0.0047963280230760574
Validation loss = 0.0044760070741176605
Validation loss = 0.004699448123574257
Validation loss = 0.004754160065203905
Validation loss = 0.005352577194571495
Validation loss = 0.00502748554572463
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009495451115071774
Validation loss = 0.005380109418183565
Validation loss = 0.005556113552302122
Validation loss = 0.005266841035336256
Validation loss = 0.005014405120164156
Validation loss = 0.006376451347023249
Validation loss = 0.005464227870106697
Validation loss = 0.0077877645380795
Validation loss = 0.0060793012380599976
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006831578444689512
Validation loss = 0.004684775602072477
Validation loss = 0.005658983718603849
Validation loss = 0.009336372837424278
Validation loss = 0.006329623982310295
Validation loss = 0.004818919114768505
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010304155759513378
Validation loss = 0.0047245691530406475
Validation loss = 0.004974892362952232
Validation loss = 0.004872738849371672
Validation loss = 0.005802868399769068
Validation loss = 0.007792049553245306
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0597  |
| Iteration     | 13       |
| MaximumReturn | -0.0159  |
| MinimumReturn | -0.711   |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006113952025771141
Validation loss = 0.004825390409678221
Validation loss = 0.004987519700080156
Validation loss = 0.005848817527294159
Validation loss = 0.004858589731156826
Validation loss = 0.006475614383816719
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007126264274120331
Validation loss = 0.006616448517888784
Validation loss = 0.005187766160815954
Validation loss = 0.005370975937694311
Validation loss = 0.005993325728923082
Validation loss = 0.0053767370991408825
Validation loss = 0.00626682722941041
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008363310247659683
Validation loss = 0.004361562430858612
Validation loss = 0.005527278408408165
Validation loss = 0.007272323127835989
Validation loss = 0.008357800543308258
Validation loss = 0.006595294456928968
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005994525272399187
Validation loss = 0.006713265553116798
Validation loss = 0.006680007558315992
Validation loss = 0.00684612849727273
Validation loss = 0.005397766828536987
Validation loss = 0.008502389304339886
Validation loss = 0.005495092365890741
Validation loss = 0.00468429084867239
Validation loss = 0.004881908651441336
Validation loss = 0.005517212674021721
Validation loss = 0.005815818905830383
Validation loss = 0.005234431475400925
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006152663845568895
Validation loss = 0.004782365169376135
Validation loss = 0.007104725111275911
Validation loss = 0.00500434311106801
Validation loss = 0.004669654183089733
Validation loss = 0.005356337409466505
Validation loss = 0.0047528669238090515
Validation loss = 0.005364178214222193
Validation loss = 0.0067930701188743114
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00311 |
| Iteration     | 14       |
| MaximumReturn | -0.00137 |
| MinimumReturn | -0.00766 |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006297128740698099
Validation loss = 0.005232572089880705
Validation loss = 0.004280662629753351
Validation loss = 0.00565564539283514
Validation loss = 0.006882291752845049
Validation loss = 0.005926050711423159
Validation loss = 0.005146886222064495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006044236943125725
Validation loss = 0.0044636535458266735
Validation loss = 0.004722528625279665
Validation loss = 0.005236460827291012
Validation loss = 0.004393899813294411
Validation loss = 0.004583665635436773
Validation loss = 0.005646723322570324
Validation loss = 0.004364364314824343
Validation loss = 0.004580554086714983
Validation loss = 0.008795572444796562
Validation loss = 0.005134906154125929
Validation loss = 0.010242180898785591
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006281489506363869
Validation loss = 0.004463891498744488
Validation loss = 0.006598126608878374
Validation loss = 0.005772852338850498
Validation loss = 0.004804420284926891
Validation loss = 0.004768973216414452
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006264332216233015
Validation loss = 0.0046430774964392185
Validation loss = 0.00413088221102953
Validation loss = 0.00422657048329711
Validation loss = 0.004833363462239504
Validation loss = 0.007046217564493418
Validation loss = 0.00679128710180521
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009436176158487797
Validation loss = 0.006813196465373039
Validation loss = 0.006356761325150728
Validation loss = 0.00863464642316103
Validation loss = 0.005002523306757212
Validation loss = 0.007550631184130907
Validation loss = 0.0046732560731470585
Validation loss = 0.007481291890144348
Validation loss = 0.005301568191498518
Validation loss = 0.010323915630578995
Validation loss = 0.005066719371825457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -33.3    |
| Iteration     | 15       |
| MaximumReturn | -0.195   |
| MinimumReturn | -64.5    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005518983583897352
Validation loss = 0.004465050995349884
Validation loss = 0.004868156276643276
Validation loss = 0.004665580578148365
Validation loss = 0.00429953308776021
Validation loss = 0.00816388987004757
Validation loss = 0.004576886538416147
Validation loss = 0.004427601583302021
Validation loss = 0.005011227913200855
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011838993988931179
Validation loss = 0.004898423794656992
Validation loss = 0.003971660975366831
Validation loss = 0.0036548797506839037
Validation loss = 0.0042558996938169
Validation loss = 0.0036737045738846064
Validation loss = 0.005576778203248978
Validation loss = 0.004928413778543472
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005874693393707275
Validation loss = 0.0050823502242565155
Validation loss = 0.0046574631705880165
Validation loss = 0.004339170642197132
Validation loss = 0.004313806537538767
Validation loss = 0.004002585541456938
Validation loss = 0.00675364350900054
Validation loss = 0.006105706095695496
Validation loss = 0.005250312387943268
Validation loss = 0.003639946924522519
Validation loss = 0.004036318510770798
Validation loss = 0.003672345308586955
Validation loss = 0.0049521601758897305
Validation loss = 0.005298779811710119
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009127778932452202
Validation loss = 0.009070745669305325
Validation loss = 0.004156991373747587
Validation loss = 0.004040510859340429
Validation loss = 0.0037601750809699297
Validation loss = 0.0037946158554404974
Validation loss = 0.004792585503309965
Validation loss = 0.0042232368141412735
Validation loss = 0.004609377589076757
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008151042275130749
Validation loss = 0.004176991526037455
Validation loss = 0.003927190322428942
Validation loss = 0.004824635107070208
Validation loss = 0.0042427354492247105
Validation loss = 0.004431166686117649
Validation loss = 0.005595716647803783
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00345 |
| Iteration     | 16       |
| MaximumReturn | -0.00221 |
| MinimumReturn | -0.0137  |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004743495956063271
Validation loss = 0.00477127218618989
Validation loss = 0.004088896792382002
Validation loss = 0.004121605306863785
Validation loss = 0.0036419439129531384
Validation loss = 0.00442449189722538
Validation loss = 0.004499932751059532
Validation loss = 0.005242346785962582
Validation loss = 0.005629160441458225
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005621713120490313
Validation loss = 0.0045723021030426025
Validation loss = 0.004500836133956909
Validation loss = 0.004544508643448353
Validation loss = 0.005077843554317951
Validation loss = 0.0040971627458930016
Validation loss = 0.003637993009760976
Validation loss = 0.00398667948320508
Validation loss = 0.003704445669427514
Validation loss = 0.00661909393966198
Validation loss = 0.004663571249693632
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004965447820723057
Validation loss = 0.00548227783292532
Validation loss = 0.006129996385425329
Validation loss = 0.0035297651775181293
Validation loss = 0.0034689742606133223
Validation loss = 0.00474278349429369
Validation loss = 0.004697447642683983
Validation loss = 0.0037459232844412327
Validation loss = 0.00528272520750761
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004869564902037382
Validation loss = 0.005161220207810402
Validation loss = 0.004201950971037149
Validation loss = 0.008104667998850346
Validation loss = 0.004241432063281536
Validation loss = 0.007038400042802095
Validation loss = 0.005952627398073673
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00555148022249341
Validation loss = 0.004420747049152851
Validation loss = 0.00443328358232975
Validation loss = 0.004575992003083229
Validation loss = 0.004263406619429588
Validation loss = 0.009895115159451962
Validation loss = 0.00541076622903347
Validation loss = 0.01208475697785616
Validation loss = 0.004649440757930279
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00219  |
| Iteration     | 17        |
| MaximumReturn | -0.000872 |
| MinimumReturn | -0.00389  |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008312301710247993
Validation loss = 0.00452893041074276
Validation loss = 0.003668068675324321
Validation loss = 0.0038104054983705282
Validation loss = 0.003488306188955903
Validation loss = 0.008308743126690388
Validation loss = 0.0034653320908546448
Validation loss = 0.00523342052474618
Validation loss = 0.003547034692019224
Validation loss = 0.003936208318918943
Validation loss = 0.004299820400774479
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008526305668056011
Validation loss = 0.003521587932482362
Validation loss = 0.003192316507920623
Validation loss = 0.003327148500829935
Validation loss = 0.0032795679289847612
Validation loss = 0.006655565463006496
Validation loss = 0.0040796841494739056
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005293181166052818
Validation loss = 0.005331798922270536
Validation loss = 0.0034921765327453613
Validation loss = 0.003488401649519801
Validation loss = 0.0068643903359770775
Validation loss = 0.003984728362411261
Validation loss = 0.00441367831081152
Validation loss = 0.005639277398586273
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005120099522173405
Validation loss = 0.004905196838080883
Validation loss = 0.0036815949715673923
Validation loss = 0.008625988848507404
Validation loss = 0.003328735241666436
Validation loss = 0.004250881262123585
Validation loss = 0.003244380932301283
Validation loss = 0.003279578872025013
Validation loss = 0.005143993999809027
Validation loss = 0.00604410283267498
Validation loss = 0.0035743857733905315
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005384114105254412
Validation loss = 0.0034006170462816954
Validation loss = 0.004572473466396332
Validation loss = 0.0034777517430484295
Validation loss = 0.003956242930144072
Validation loss = 0.0036518732085824013
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -113     |
| Iteration     | 18       |
| MaximumReturn | -0.336   |
| MinimumReturn | -148     |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01471378281712532
Validation loss = 0.00517582893371582
Validation loss = 0.002616118174046278
Validation loss = 0.0032431965228170156
Validation loss = 0.0034768101759254932
Validation loss = 0.005967719480395317
Validation loss = 0.003628495614975691
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010213751345872879
Validation loss = 0.0041197119280695915
Validation loss = 0.00248563545756042
Validation loss = 0.0026192842051386833
Validation loss = 0.002789044287055731
Validation loss = 0.005494327284395695
Validation loss = 0.002973723690956831
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007873422466218472
Validation loss = 0.003582920879125595
Validation loss = 0.0029198219999670982
Validation loss = 0.0031056341249495745
Validation loss = 0.0026733698323369026
Validation loss = 0.00294678402133286
Validation loss = 0.003945654258131981
Validation loss = 0.005292813293635845
Validation loss = 0.0027847099117934704
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007211257703602314
Validation loss = 0.0038976059295237064
Validation loss = 0.003081667935475707
Validation loss = 0.0030084149911999702
Validation loss = 0.0041187480092048645
Validation loss = 0.002916693454608321
Validation loss = 0.0027402834966778755
Validation loss = 0.0028529269620776176
Validation loss = 0.0027078003622591496
Validation loss = 0.003381461836397648
Validation loss = 0.0027959621511399746
Validation loss = 0.0034644193947315216
Validation loss = 0.0027065956965088844
Validation loss = 0.00390700064599514
Validation loss = 0.003443914232775569
Validation loss = 0.003659521695226431
Validation loss = 0.0026516991201788187
Validation loss = 0.0026645734906196594
Validation loss = 0.0022447369992733
Validation loss = 0.0030471892096102238
Validation loss = 0.0036671021953225136
Validation loss = 0.004126605577766895
Validation loss = 0.003981666173785925
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0071669030003249645
Validation loss = 0.003986518830060959
Validation loss = 0.004258451983332634
Validation loss = 0.003089974168688059
Validation loss = 0.0027778528165072203
Validation loss = 0.003057806985452771
Validation loss = 0.0068186563439667225
Validation loss = 0.0027411184273660183
Validation loss = 0.002722026314586401
Validation loss = 0.0024408691097050905
Validation loss = 0.005197067279368639
Validation loss = 0.0029246946796774864
Validation loss = 0.0028233909979462624
Validation loss = 0.0025941908825188875
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.202   |
| Iteration     | 19       |
| MaximumReturn | -0.154   |
| MinimumReturn | -0.284   |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00670880451798439
Validation loss = 0.002361780032515526
Validation loss = 0.0020874689798802137
Validation loss = 0.002345074201002717
Validation loss = 0.002010443713515997
Validation loss = 0.0025253486819565296
Validation loss = 0.002262705937027931
Validation loss = 0.0018109872471541166
Validation loss = 0.003740833140909672
Validation loss = 0.0022170599550008774
Validation loss = 0.003137598279863596
Validation loss = 0.0020000857766717672
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004893971607089043
Validation loss = 0.002222826238721609
Validation loss = 0.002285291673615575
Validation loss = 0.00220056832768023
Validation loss = 0.00211691134609282
Validation loss = 0.00282281800173223
Validation loss = 0.0033552327658981085
Validation loss = 0.0020007549319416285
Validation loss = 0.0036887105088680983
Validation loss = 0.002430745866149664
Validation loss = 0.002090150723233819
Validation loss = 0.0020749298855662346
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004023807123303413
Validation loss = 0.0024608043022453785
Validation loss = 0.0024359554518014193
Validation loss = 0.002285286784172058
Validation loss = 0.005072828382253647
Validation loss = 0.002350619062781334
Validation loss = 0.002709272550418973
Validation loss = 0.002698233351111412
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004624339286237955
Validation loss = 0.0027218719478696585
Validation loss = 0.003060316201299429
Validation loss = 0.0021006702445447445
Validation loss = 0.0021409194450825453
Validation loss = 0.002388219814747572
Validation loss = 0.0024045445024967194
Validation loss = 0.004703135695308447
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004935235716402531
Validation loss = 0.0026406152173876762
Validation loss = 0.0021258839406073093
Validation loss = 0.0022001592442393303
Validation loss = 0.002166721737012267
Validation loss = 0.0020120907574892044
Validation loss = 0.0018134716665372252
Validation loss = 0.002278074622154236
Validation loss = 0.002284572459757328
Validation loss = 0.0037551845889538527
Validation loss = 0.0035365172661840916
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0927  |
| Iteration     | 20       |
| MaximumReturn | -0.0559  |
| MinimumReturn | -0.145   |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00254400004632771
Validation loss = 0.00224210019223392
Validation loss = 0.002088436158373952
Validation loss = 0.0037787642795592546
Validation loss = 0.00328843598254025
Validation loss = 0.002000079257413745
Validation loss = 0.003082029055804014
Validation loss = 0.004377184901386499
Validation loss = 0.003997872117906809
Validation loss = 0.0025091059505939484
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002573139499872923
Validation loss = 0.0023004512768238783
Validation loss = 0.0020340289920568466
Validation loss = 0.003739899955689907
Validation loss = 0.004168866667896509
Validation loss = 0.003990191034972668
Validation loss = 0.0022632204927504063
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003141651628538966
Validation loss = 0.0026023208629339933
Validation loss = 0.0021156747825443745
Validation loss = 0.003387575037777424
Validation loss = 0.003298994619399309
Validation loss = 0.003207089379429817
Validation loss = 0.002048079390078783
Validation loss = 0.002571551129221916
Validation loss = 0.0021630243863910437
Validation loss = 0.0026581231504678726
Validation loss = 0.002943492028862238
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004306787624955177
Validation loss = 0.002185089746490121
Validation loss = 0.002825777279213071
Validation loss = 0.0029527749866247177
Validation loss = 0.004824775271117687
Validation loss = 0.002004274632781744
Validation loss = 0.002577306004241109
Validation loss = 0.002353677526116371
Validation loss = 0.003678028704598546
Validation loss = 0.002732028253376484
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0032996891532093287
Validation loss = 0.002428708365187049
Validation loss = 0.0025573137681931257
Validation loss = 0.0020285749342292547
Validation loss = 0.0029913457110524178
Validation loss = 0.002570537617430091
Validation loss = 0.002961090300232172
Validation loss = 0.0024335277266800404
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00303 |
| Iteration     | 21       |
| MaximumReturn | -0.00108 |
| MinimumReturn | -0.00607 |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0037337294779717922
Validation loss = 0.002149928594008088
Validation loss = 0.0022678456734865904
Validation loss = 0.0020825646352022886
Validation loss = 0.0018522146856412292
Validation loss = 0.0033323608804494143
Validation loss = 0.0021925687324255705
Validation loss = 0.002984893275424838
Validation loss = 0.003429419593885541
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0023595860693603754
Validation loss = 0.0024135650601238012
Validation loss = 0.002813291270285845
Validation loss = 0.0019921467173844576
Validation loss = 0.003090194659307599
Validation loss = 0.0026778632309287786
Validation loss = 0.0029482298996299505
Validation loss = 0.0022151623852550983
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0030863892752677202
Validation loss = 0.0028263733256608248
Validation loss = 0.0020310059189796448
Validation loss = 0.0030363742262125015
Validation loss = 0.002035668818280101
Validation loss = 0.0029546222649514675
Validation loss = 0.002171493135392666
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024970886297523975
Validation loss = 0.002270444994792342
Validation loss = 0.002217695815488696
Validation loss = 0.002623135456815362
Validation loss = 0.005920777563005686
Validation loss = 0.002106772968545556
Validation loss = 0.002679696073755622
Validation loss = 0.002031502313911915
Validation loss = 0.002557002939283848
Validation loss = 0.001886085607111454
Validation loss = 0.003294658847153187
Validation loss = 0.0022495202720165253
Validation loss = 0.0020018501672893763
Validation loss = 0.002369069494307041
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0026917208451777697
Validation loss = 0.003343049203976989
Validation loss = 0.0020360194612294436
Validation loss = 0.0037740846164524555
Validation loss = 0.0025258096866309643
Validation loss = 0.0025861021131277084
Validation loss = 0.003934046719223261
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0071  |
| Iteration     | 22       |
| MaximumReturn | -0.00188 |
| MinimumReturn | -0.0162  |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002839463297277689
Validation loss = 0.002658229088410735
Validation loss = 0.002637319266796112
Validation loss = 0.0020944776479154825
Validation loss = 0.0018089599907398224
Validation loss = 0.003864298341795802
Validation loss = 0.002642909297719598
Validation loss = 0.0027502435259521008
Validation loss = 0.003658891888335347
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003034221474081278
Validation loss = 0.0028406158089637756
Validation loss = 0.003840250428766012
Validation loss = 0.0018104331102222204
Validation loss = 0.002183951437473297
Validation loss = 0.0033837954979389906
Validation loss = 0.002725837752223015
Validation loss = 0.003303334815427661
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005723945796489716
Validation loss = 0.0029168177861720324
Validation loss = 0.0020717575680464506
Validation loss = 0.002075537107884884
Validation loss = 0.0018189152469858527
Validation loss = 0.0020967121236026287
Validation loss = 0.0020834081806242466
Validation loss = 0.002308318857103586
Validation loss = 0.002396996598690748
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002076847245916724
Validation loss = 0.002943492727354169
Validation loss = 0.0021775993518531322
Validation loss = 0.0029908923897892237
Validation loss = 0.002221079543232918
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0033057592809200287
Validation loss = 0.001949590863659978
Validation loss = 0.0021961787715554237
Validation loss = 0.0025915789883583784
Validation loss = 0.003916529938578606
Validation loss = 0.0022274949587881565
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0644  |
| Iteration     | 23       |
| MaximumReturn | -0.0388  |
| MinimumReturn | -0.0975  |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021054167300462723
Validation loss = 0.0019326753681525588
Validation loss = 0.0023810816928744316
Validation loss = 0.0021326025016605854
Validation loss = 0.0034977581817656755
Validation loss = 0.0019730073399841785
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002688413253054023
Validation loss = 0.0019805198535323143
Validation loss = 0.0018462013686075807
Validation loss = 0.0035097214858978987
Validation loss = 0.0024401487316936255
Validation loss = 0.0025544397067278624
Validation loss = 0.0025501265190541744
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022237079683691263
Validation loss = 0.0020606834441423416
Validation loss = 0.002223299816250801
Validation loss = 0.002654233481734991
Validation loss = 0.0018193507567048073
Validation loss = 0.002816963940858841
Validation loss = 0.0032172291539609432
Validation loss = 0.002059576101601124
Validation loss = 0.004041373264044523
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002052202122285962
Validation loss = 0.0027445207815617323
Validation loss = 0.002784593263641
Validation loss = 0.0019973176531493664
Validation loss = 0.002953342627733946
Validation loss = 0.002770235762000084
Validation loss = 0.0019363269675523043
Validation loss = 0.0041818139143288136
Validation loss = 0.0021207265090197325
Validation loss = 0.0018551949178799987
Validation loss = 0.002663897816091776
Validation loss = 0.0025580436922609806
Validation loss = 0.0019359278958290815
Validation loss = 0.0019957846961915493
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002869158983230591
Validation loss = 0.0019239436369389296
Validation loss = 0.00203817174769938
Validation loss = 0.002156684873625636
Validation loss = 0.0021218261681497097
Validation loss = 0.002697128104045987
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.58    |
| Iteration     | 24       |
| MaximumReturn | -0.00102 |
| MinimumReturn | -29.3    |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002128876280039549
Validation loss = 0.002286583883687854
Validation loss = 0.002480831230059266
Validation loss = 0.002534769242629409
Validation loss = 0.002242710441350937
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027567516081035137
Validation loss = 0.00329882581718266
Validation loss = 0.001787305111065507
Validation loss = 0.0022030372638255358
Validation loss = 0.00202134158462286
Validation loss = 0.002009683521464467
Validation loss = 0.0017148872138932347
Validation loss = 0.0023524295538663864
Validation loss = 0.0020976033993065357
Validation loss = 0.00195935252122581
Validation loss = 0.001984027214348316
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002674625487998128
Validation loss = 0.002557822735980153
Validation loss = 0.0034641094971448183
Validation loss = 0.0019944054074585438
Validation loss = 0.0025549884885549545
Validation loss = 0.002920003840699792
Validation loss = 0.0018553758272901177
Validation loss = 0.0018273533787578344
Validation loss = 0.002154872752726078
Validation loss = 0.0032722169999033213
Validation loss = 0.002313567092642188
Validation loss = 0.0018104476621374488
Validation loss = 0.002795507200062275
Validation loss = 0.0022390959784388542
Validation loss = 0.0018897062400355935
Validation loss = 0.0022067793179303408
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0020659645088016987
Validation loss = 0.0019176144851371646
Validation loss = 0.00417248485609889
Validation loss = 0.001952307764440775
Validation loss = 0.003838652977719903
Validation loss = 0.002335529774427414
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025665387511253357
Validation loss = 0.003593999659642577
Validation loss = 0.002793992403894663
Validation loss = 0.0048127020709216595
Validation loss = 0.00218397518619895
Validation loss = 0.002747621852904558
Validation loss = 0.0018626542296260595
Validation loss = 0.0017408395651727915
Validation loss = 0.002565009519457817
Validation loss = 0.002844081027433276
Validation loss = 0.0019464808283373713
Validation loss = 0.0031339898705482483
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.52    |
| Iteration     | 25       |
| MaximumReturn | -0.0253  |
| MinimumReturn | -35.8    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002044488675892353
Validation loss = 0.0023337630555033684
Validation loss = 0.0033226930536329746
Validation loss = 0.0019102167570963502
Validation loss = 0.002170713385567069
Validation loss = 0.0019387472420930862
Validation loss = 0.0018218179466202855
Validation loss = 0.00304172420874238
Validation loss = 0.0016300417482852936
Validation loss = 0.001999968895688653
Validation loss = 0.002093787305057049
Validation loss = 0.0034143305383622646
Validation loss = 0.0022219857200980186
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002092592418193817
Validation loss = 0.0023314019199460745
Validation loss = 0.0024996965657919645
Validation loss = 0.0021641890052706003
Validation loss = 0.002675471128895879
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002783288713544607
Validation loss = 0.002025294117629528
Validation loss = 0.003078359179198742
Validation loss = 0.002422790043056011
Validation loss = 0.002737007336691022
Validation loss = 0.0019068728433921933
Validation loss = 0.002431352622807026
Validation loss = 0.0021479595452547073
Validation loss = 0.0017371934372931719
Validation loss = 0.003131900215521455
Validation loss = 0.0025168554857373238
Validation loss = 0.0021130398381501436
Validation loss = 0.0020125648006796837
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002139740390703082
Validation loss = 0.0018437034450471401
Validation loss = 0.0020044685807079077
Validation loss = 0.0028027272783219814
Validation loss = 0.0022930868435651064
Validation loss = 0.002434918424114585
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0037048461381345987
Validation loss = 0.002689007669687271
Validation loss = 0.0018497463315725327
Validation loss = 0.0021464729215949774
Validation loss = 0.0038980317767709494
Validation loss = 0.0026123400311917067
Validation loss = 0.0020827404223382473
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.2    |
| Iteration     | 26       |
| MaximumReturn | -0.0445  |
| MinimumReturn | -39.7    |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00280909170396626
Validation loss = 0.0021328229922801256
Validation loss = 0.002446750644594431
Validation loss = 0.001952465157955885
Validation loss = 0.0024927372578531504
Validation loss = 0.001584448735229671
Validation loss = 0.0017322797793895006
Validation loss = 0.002326264511793852
Validation loss = 0.0029094486963003874
Validation loss = 0.0016429454553872347
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002349052345380187
Validation loss = 0.0020321228075772524
Validation loss = 0.0022918637841939926
Validation loss = 0.001980050466954708
Validation loss = 0.001497397548519075
Validation loss = 0.0023482562974095345
Validation loss = 0.00200863741338253
Validation loss = 0.0017579210689291358
Validation loss = 0.0017423272365704179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002405608305707574
Validation loss = 0.0024086462799459696
Validation loss = 0.0018253433518111706
Validation loss = 0.0018324095290154219
Validation loss = 0.0015468925703316927
Validation loss = 0.0016320409486070275
Validation loss = 0.0015642151702195406
Validation loss = 0.0016385286580771208
Validation loss = 0.0021689534187316895
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018930956721305847
Validation loss = 0.001907498575747013
Validation loss = 0.0020581348799169064
Validation loss = 0.0018668774282559752
Validation loss = 0.0022519270423799753
Validation loss = 0.0026135097723454237
Validation loss = 0.0031288503669202328
Validation loss = 0.0019340316066518426
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00218376237899065
Validation loss = 0.0017065282445400953
Validation loss = 0.002219270681962371
Validation loss = 0.0034738355316221714
Validation loss = 0.00207236735150218
Validation loss = 0.0032799963373690844
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00796 |
| Iteration     | 27       |
| MaximumReturn | -0.00202 |
| MinimumReturn | -0.0463  |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0027351723983883858
Validation loss = 0.001924364478327334
Validation loss = 0.0018091275123879313
Validation loss = 0.0019163815304636955
Validation loss = 0.002848297357559204
Validation loss = 0.0016500888159498572
Validation loss = 0.0027688529808074236
Validation loss = 0.0020068835001438856
Validation loss = 0.0021954241674393415
Validation loss = 0.0023704133927822113
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001887118094600737
Validation loss = 0.002909693866968155
Validation loss = 0.0021229274570941925
Validation loss = 0.0021041594445705414
Validation loss = 0.0027176167350262403
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0034974217414855957
Validation loss = 0.00215904857032001
Validation loss = 0.0018668895354494452
Validation loss = 0.0020646555349230766
Validation loss = 0.0020041242241859436
Validation loss = 0.0023014219477772713
Validation loss = 0.002269314369186759
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00206959480419755
Validation loss = 0.0016946714604273438
Validation loss = 0.0025599529035389423
Validation loss = 0.001900975126773119
Validation loss = 0.003595018992200494
Validation loss = 0.001979032764211297
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003031795145943761
Validation loss = 0.0031227211002260447
Validation loss = 0.002049973700195551
Validation loss = 0.0021160284522920847
Validation loss = 0.001970704412087798
Validation loss = 0.002301968866959214
Validation loss = 0.001946225413121283
Validation loss = 0.0022624966222792864
Validation loss = 0.0020433159079402685
Validation loss = 0.003415863960981369
Validation loss = 0.0019186480203643441
Validation loss = 0.0016283796867355704
Validation loss = 0.0018572102999314666
Validation loss = 0.0019424054771661758
Validation loss = 0.002037070458754897
Validation loss = 0.0017177079571411014
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.99    |
| Iteration     | 28       |
| MaximumReturn | -0.0029  |
| MinimumReturn | -31.8    |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021611242555081844
Validation loss = 0.00204095127992332
Validation loss = 0.0018373504281044006
Validation loss = 0.001629986334592104
Validation loss = 0.0019670354668051004
Validation loss = 0.0021123175974935293
Validation loss = 0.002323817927390337
Validation loss = 0.002451011212542653
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002391193527728319
Validation loss = 0.001881814911030233
Validation loss = 0.003991906531155109
Validation loss = 0.002047385321930051
Validation loss = 0.0017516202060505748
Validation loss = 0.002444735961034894
Validation loss = 0.002072058618068695
Validation loss = 0.0028294441290199757
Validation loss = 0.0017778020119294524
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003243428422138095
Validation loss = 0.001696492894552648
Validation loss = 0.0020442295353859663
Validation loss = 0.0016444873763248324
Validation loss = 0.0026278665754944086
Validation loss = 0.00222022901289165
Validation loss = 0.002934904070571065
Validation loss = 0.0015696152113378048
Validation loss = 0.0018329254817217588
Validation loss = 0.0018302681855857372
Validation loss = 0.002161693759262562
Validation loss = 0.0016724104061722755
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002546637086197734
Validation loss = 0.0016856108559295535
Validation loss = 0.003709265496581793
Validation loss = 0.001901645096950233
Validation loss = 0.001594087458215654
Validation loss = 0.0016687656752765179
Validation loss = 0.0017079355893656611
Validation loss = 0.0016507296822965145
Validation loss = 0.002644154941663146
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018382015405222774
Validation loss = 0.0016746349865570664
Validation loss = 0.0015960735036060214
Validation loss = 0.002810236532241106
Validation loss = 0.0019268961623311043
Validation loss = 0.0021028730552643538
Validation loss = 0.001853413414210081
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -9.3      |
| Iteration     | 29        |
| MaximumReturn | -0.000597 |
| MinimumReturn | -63.7     |
| TotalSamples  | 51646     |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018952512182295322
Validation loss = 0.0028104567900300026
Validation loss = 0.0023562703281641006
Validation loss = 0.0025618940126150846
Validation loss = 0.002096240408718586
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028091922868043184
Validation loss = 0.0017767064273357391
Validation loss = 0.002387009095400572
Validation loss = 0.001495109056122601
Validation loss = 0.002188920509070158
Validation loss = 0.0021795344073325396
Validation loss = 0.0018110042437911034
Validation loss = 0.001697942498140037
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015064841136336327
Validation loss = 0.0022495209705084562
Validation loss = 0.0023557625245302916
Validation loss = 0.001699224580079317
Validation loss = 0.002247077180072665
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0030233636498451233
Validation loss = 0.0022728913463652134
Validation loss = 0.0016911141574382782
Validation loss = 0.003027119440957904
Validation loss = 0.0016393312253057957
Validation loss = 0.003634589957073331
Validation loss = 0.0019170093582943082
Validation loss = 0.0018886312609538436
Validation loss = 0.0019820991437882185
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002241438254714012
Validation loss = 0.0024481972213834524
Validation loss = 0.0038968974258750677
Validation loss = 0.0020068283192813396
Validation loss = 0.002344919368624687
Validation loss = 0.0021209330298006535
Validation loss = 0.0019627187866717577
Validation loss = 0.0025105879176408052
Validation loss = 0.0018905163742601871
Validation loss = 0.0017498884117230773
Validation loss = 0.002559404354542494
Validation loss = 0.0016031763516366482
Validation loss = 0.001903385273180902
Validation loss = 0.0037588293198496103
Validation loss = 0.0017925691790878773
Validation loss = 0.002751915715634823
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.3    |
| Iteration     | 30       |
| MaximumReturn | -0.00169 |
| MinimumReturn | -47.1    |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025319266133010387
Validation loss = 0.005450753029435873
Validation loss = 0.0019762825686484575
Validation loss = 0.0017493845662102103
Validation loss = 0.0017486780416220427
Validation loss = 0.0016020003240555525
Validation loss = 0.0019648948218673468
Validation loss = 0.00204276479780674
Validation loss = 0.0016773357056081295
Validation loss = 0.0030827708542346954
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002084197709336877
Validation loss = 0.0034681418910622597
Validation loss = 0.0019247231539338827
Validation loss = 0.0018470778595656157
Validation loss = 0.005420533940196037
Validation loss = 0.002788225421682
Validation loss = 0.002122733276337385
Validation loss = 0.0016729006310924888
Validation loss = 0.0016539458883926272
Validation loss = 0.0023497152142226696
Validation loss = 0.0014972543576732278
Validation loss = 0.0019896947778761387
Validation loss = 0.0026908540166914463
Validation loss = 0.0021468678023666143
Validation loss = 0.002300723921507597
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0032143774442374706
Validation loss = 0.002141000470146537
Validation loss = 0.0017570947529748082
Validation loss = 0.0019707223400473595
Validation loss = 0.0018012081272900105
Validation loss = 0.003530179848894477
Validation loss = 0.0017045089043676853
Validation loss = 0.0021925971377640963
Validation loss = 0.001889147562906146
Validation loss = 0.0022168292198330164
Validation loss = 0.001968703232705593
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0019541701767593622
Validation loss = 0.0018810904584825039
Validation loss = 0.0020030266605317593
Validation loss = 0.0024760179221630096
Validation loss = 0.002915844088420272
Validation loss = 0.002020051470026374
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018576571019366384
Validation loss = 0.002474906388670206
Validation loss = 0.0019474332220852375
Validation loss = 0.001995990751311183
Validation loss = 0.0017865244299173355
Validation loss = 0.001785102766007185
Validation loss = 0.0014770656125620008
Validation loss = 0.0015495746629312634
Validation loss = 0.001975759631022811
Validation loss = 0.001638951594941318
Validation loss = 0.001417535706423223
Validation loss = 0.0019486176315695047
Validation loss = 0.0017230217345058918
Validation loss = 0.0019021148327738047
Validation loss = 0.0018744624685496092
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0822  |
| Iteration     | 31       |
| MaximumReturn | -0.0487  |
| MinimumReturn | -0.296   |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0032732486724853516
Validation loss = 0.0018061487935483456
Validation loss = 0.002111174166202545
Validation loss = 0.0021935354452580214
Validation loss = 0.001470728311687708
Validation loss = 0.002151843626052141
Validation loss = 0.0018151619005948305
Validation loss = 0.002203837502747774
Validation loss = 0.0015942752361297607
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003633430227637291
Validation loss = 0.0022480804473161697
Validation loss = 0.0017014717450365424
Validation loss = 0.005245780106633902
Validation loss = 0.002266942523419857
Validation loss = 0.0029988419264554977
Validation loss = 0.0015335993375629187
Validation loss = 0.001995550701394677
Validation loss = 0.0016857541631907225
Validation loss = 0.001610620878636837
Validation loss = 0.002153711626306176
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022552262526005507
Validation loss = 0.0018713617464527488
Validation loss = 0.0023281429894268513
Validation loss = 0.001588682527653873
Validation loss = 0.0018241883954033256
Validation loss = 0.0019185298588126898
Validation loss = 0.0014178953133523464
Validation loss = 0.0022264334838837385
Validation loss = 0.0018829685868695378
Validation loss = 0.002336512552574277
Validation loss = 0.0015428471378982067
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002428976586088538
Validation loss = 0.0017353157745674253
Validation loss = 0.0017179747810587287
Validation loss = 0.002324588829651475
Validation loss = 0.0020206947810947895
Validation loss = 0.0017776323948055506
Validation loss = 0.0025849153753370047
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020347055979073048
Validation loss = 0.0019011696567758918
Validation loss = 0.002389057306572795
Validation loss = 0.0016228955937549472
Validation loss = 0.002509594429284334
Validation loss = 0.001671336474828422
Validation loss = 0.0024832694325596094
Validation loss = 0.0016982833622023463
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0143  |
| Iteration     | 32       |
| MaximumReturn | -0.00271 |
| MinimumReturn | -0.0269  |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002345283282920718
Validation loss = 0.0028368632774800062
Validation loss = 0.00153203250374645
Validation loss = 0.0022776599507778883
Validation loss = 0.0019535666797310114
Validation loss = 0.0026393162552267313
Validation loss = 0.0021380602847784758
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027548952493816614
Validation loss = 0.0025350081268697977
Validation loss = 0.0018386660376563668
Validation loss = 0.001821337384171784
Validation loss = 0.0028993692249059677
Validation loss = 0.0018964260816574097
Validation loss = 0.001742828288115561
Validation loss = 0.001873145462013781
Validation loss = 0.0017310617258772254
Validation loss = 0.003303298493847251
Validation loss = 0.0016936735482886434
Validation loss = 0.0019858633168041706
Validation loss = 0.0016742524458095431
Validation loss = 0.002104866551235318
Validation loss = 0.00205412320792675
Validation loss = 0.002343274187296629
Validation loss = 0.0017489574383944273
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001961736474186182
Validation loss = 0.002398629439994693
Validation loss = 0.0020112316124141216
Validation loss = 0.0013823732733726501
Validation loss = 0.0026447237469255924
Validation loss = 0.001501634018495679
Validation loss = 0.002321536187082529
Validation loss = 0.002249103272333741
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002339368686079979
Validation loss = 0.003317207098007202
Validation loss = 0.002057792851701379
Validation loss = 0.0031670776661485434
Validation loss = 0.0018228850094601512
Validation loss = 0.0016697758110240102
Validation loss = 0.0017885462148115039
Validation loss = 0.0019051966955885291
Validation loss = 0.002041906351223588
Validation loss = 0.0017700649332255125
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002142857061699033
Validation loss = 0.002234721789136529
Validation loss = 0.0019530373392626643
Validation loss = 0.0018337768269702792
Validation loss = 0.002481193980202079
Validation loss = 0.002060667145997286
Validation loss = 0.0020965312141925097
Validation loss = 0.002096796641126275
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.47    |
| Iteration     | 33       |
| MaximumReturn | -0.0242  |
| MinimumReturn | -22.9    |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018286185804754496
Validation loss = 0.003464424517005682
Validation loss = 0.0022927431855350733
Validation loss = 0.0015556568978354335
Validation loss = 0.00143899442628026
Validation loss = 0.0019929108675569296
Validation loss = 0.002180565847083926
Validation loss = 0.0018045090837404132
Validation loss = 0.0022258460521698
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001949112513102591
Validation loss = 0.002016034908592701
Validation loss = 0.0037054657004773617
Validation loss = 0.001617209636606276
Validation loss = 0.0014109235489740968
Validation loss = 0.0014246548525989056
Validation loss = 0.0017965591978281736
Validation loss = 0.0021716305054724216
Validation loss = 0.001546791521832347
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00190817064139992
Validation loss = 0.0015083237085491419
Validation loss = 0.0026079919189214706
Validation loss = 0.001687513431534171
Validation loss = 0.0017750209663063288
Validation loss = 0.0017287323717027903
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001536414842121303
Validation loss = 0.0023076971992850304
Validation loss = 0.002121311379596591
Validation loss = 0.002141157630831003
Validation loss = 0.001844706479460001
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0037555478047579527
Validation loss = 0.001788777532055974
Validation loss = 0.0019776804838329554
Validation loss = 0.0020424837712198496
Validation loss = 0.0019573962781578302
Validation loss = 0.001804832136258483
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.216   |
| Iteration     | 34       |
| MaximumReturn | -0.00248 |
| MinimumReturn | -4.68    |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015625644009560347
Validation loss = 0.0023408320266753435
Validation loss = 0.0016971933655440807
Validation loss = 0.002055469434708357
Validation loss = 0.002316940575838089
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002344719832763076
Validation loss = 0.0017962467391043901
Validation loss = 0.002205695491284132
Validation loss = 0.002185118617489934
Validation loss = 0.0016507763648405671
Validation loss = 0.003903027391061187
Validation loss = 0.001970096491277218
Validation loss = 0.001409753691405058
Validation loss = 0.001727968454360962
Validation loss = 0.0024107438512146473
Validation loss = 0.0018590783001855016
Validation loss = 0.0016773156821727753
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0020583549048751593
Validation loss = 0.0017797023756429553
Validation loss = 0.0025057538878172636
Validation loss = 0.0016781451413407922
Validation loss = 0.0026179146952927113
Validation loss = 0.001788008725270629
Validation loss = 0.002324822125956416
Validation loss = 0.0016968221170827746
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002324026543647051
Validation loss = 0.001951688900589943
Validation loss = 0.001746505149640143
Validation loss = 0.0019096947507932782
Validation loss = 0.002141435630619526
Validation loss = 0.0020019093062728643
Validation loss = 0.0017328246030956507
Validation loss = 0.0015107119688764215
Validation loss = 0.0013697057729586959
Validation loss = 0.0021739129442721605
Validation loss = 0.0017318265745416284
Validation loss = 0.001908050267957151
Validation loss = 0.0019482399802654982
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016400146996602416
Validation loss = 0.001698017935268581
Validation loss = 0.0016675355145707726
Validation loss = 0.0015396274393424392
Validation loss = 0.0022385732736438513
Validation loss = 0.0018921908922493458
Validation loss = 0.0019265373703092337
Validation loss = 0.00164924340788275
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.356   |
| Iteration     | 35       |
| MaximumReturn | -0.0213  |
| MinimumReturn | -5.99    |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00180765928234905
Validation loss = 0.0014916142681613564
Validation loss = 0.0020408581476658583
Validation loss = 0.001520831952802837
Validation loss = 0.0015739492373540998
Validation loss = 0.0035157587844878435
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002648812485858798
Validation loss = 0.0015644816448912024
Validation loss = 0.0019604333210736513
Validation loss = 0.0021600902546197176
Validation loss = 0.0024675338063389063
Validation loss = 0.0013429926475510001
Validation loss = 0.0018145530484616756
Validation loss = 0.002539970213547349
Validation loss = 0.002010525669902563
Validation loss = 0.001679384964518249
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0021407795138657093
Validation loss = 0.0017374753952026367
Validation loss = 0.00163926905952394
Validation loss = 0.0018684574170038104
Validation loss = 0.0024912802036851645
Validation loss = 0.0015633213333785534
Validation loss = 0.001956347143277526
Validation loss = 0.0016603667754679918
Validation loss = 0.0028334895614534616
Validation loss = 0.0026552623603492975
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0027927651535719633
Validation loss = 0.0019020714098587632
Validation loss = 0.001708870637230575
Validation loss = 0.0014602165902033448
Validation loss = 0.0018409310141578317
Validation loss = 0.002080003498122096
Validation loss = 0.0016128935385495424
Validation loss = 0.0022529957350343466
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020579127594828606
Validation loss = 0.0018204465741291642
Validation loss = 0.00578476395457983
Validation loss = 0.0014932474587112665
Validation loss = 0.0014442760730162263
Validation loss = 0.0017674495466053486
Validation loss = 0.002233408624306321
Validation loss = 0.00262195966206491
Validation loss = 0.0017694089328870177
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.05    |
| Iteration     | 36       |
| MaximumReturn | -0.0197  |
| MinimumReturn | -9.64    |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020876259077340364
Validation loss = 0.0014365633251145482
Validation loss = 0.001578548806719482
Validation loss = 0.0018757920479401946
Validation loss = 0.0015072330133989453
Validation loss = 0.001572351437062025
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017362623475492
Validation loss = 0.0015833540819585323
Validation loss = 0.0015789710450917482
Validation loss = 0.0013741158181801438
Validation loss = 0.0013658046955242753
Validation loss = 0.00280460505746305
Validation loss = 0.0017089943867176771
Validation loss = 0.0021230701822787523
Validation loss = 0.0017183792078867555
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018437848193570971
Validation loss = 0.0016641211695969105
Validation loss = 0.001530592911876738
Validation loss = 0.0015709305880591273
Validation loss = 0.0018321634270250797
Validation loss = 0.0020604198798537254
Validation loss = 0.0018351045437157154
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013803328620269895
Validation loss = 0.0025254085194319487
Validation loss = 0.001672054291702807
Validation loss = 0.0017316892044618726
Validation loss = 0.0022002938203513622
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016301681753247976
Validation loss = 0.0020919342059642076
Validation loss = 0.0016044402727857232
Validation loss = 0.0016319279093295336
Validation loss = 0.0025206897407770157
Validation loss = 0.0021988300140947104
Validation loss = 0.0015522107714787126
Validation loss = 0.0025699830148369074
Validation loss = 0.002156692324206233
Validation loss = 0.002324705244973302
Validation loss = 0.0017358202021569014
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0123  |
| Iteration     | 37       |
| MaximumReturn | -0.00113 |
| MinimumReturn | -0.0278  |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002076723612844944
Validation loss = 0.0015680729411542416
Validation loss = 0.0023528998717665672
Validation loss = 0.0020841651130467653
Validation loss = 0.001500111073255539
Validation loss = 0.002363258507102728
Validation loss = 0.0016211152542382479
Validation loss = 0.001641182927414775
Validation loss = 0.003392944112420082
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017867723945528269
Validation loss = 0.0018046197947114706
Validation loss = 0.00149628147482872
Validation loss = 0.001853215740993619
Validation loss = 0.001601031981408596
Validation loss = 0.0015496634878218174
Validation loss = 0.0019976033363491297
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001528138411231339
Validation loss = 0.0018741940148174763
Validation loss = 0.0021022632718086243
Validation loss = 0.0015436322428286076
Validation loss = 0.001614773878827691
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002241344191133976
Validation loss = 0.0016943010268732905
Validation loss = 0.0016376854619011283
Validation loss = 0.0019156746566295624
Validation loss = 0.0017415019683539867
Validation loss = 0.0018359082750976086
Validation loss = 0.0033931476064026356
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0022291368804872036
Validation loss = 0.0017942243721336126
Validation loss = 0.0014410794246941805
Validation loss = 0.0016603826079517603
Validation loss = 0.0015786077128723264
Validation loss = 0.001984166447073221
Validation loss = 0.0016475837910547853
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.9    |
| Iteration     | 38       |
| MaximumReturn | -0.00392 |
| MinimumReturn | -51.1    |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017619027057662606
Validation loss = 0.0015292727621272206
Validation loss = 0.001654467429034412
Validation loss = 0.0016052937135100365
Validation loss = 0.0026578600518405437
Validation loss = 0.001563186990097165
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0032189965713769197
Validation loss = 0.002421343931928277
Validation loss = 0.001488559995777905
Validation loss = 0.001417952822521329
Validation loss = 0.0014691666001453996
Validation loss = 0.002138183219358325
Validation loss = 0.0016326876357197762
Validation loss = 0.0015719542279839516
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001548315049149096
Validation loss = 0.0021444952581077814
Validation loss = 0.0018552918918430805
Validation loss = 0.001550409127958119
Validation loss = 0.0014799265190958977
Validation loss = 0.0013841091422364116
Validation loss = 0.0014842033851891756
Validation loss = 0.0015278039500117302
Validation loss = 0.0018547915387898684
Validation loss = 0.0023280575405806303
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015799679094925523
Validation loss = 0.0014720306498929858
Validation loss = 0.0014133621007204056
Validation loss = 0.0016239541582763195
Validation loss = 0.0023647076450288296
Validation loss = 0.002908615628257394
Validation loss = 0.001669804914854467
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019180061062797904
Validation loss = 0.001786110340617597
Validation loss = 0.001494457945227623
Validation loss = 0.0015325797721743584
Validation loss = 0.0020569413900375366
Validation loss = 0.0019100015051662922
Validation loss = 0.0013811738463118672
Validation loss = 0.0018190654227510095
Validation loss = 0.0013707479229196906
Validation loss = 0.001879677758552134
Validation loss = 0.0014893356710672379
Validation loss = 0.0017228726064786315
Validation loss = 0.002159268129616976
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00594  |
| Iteration     | 39        |
| MaximumReturn | -0.000695 |
| MinimumReturn | -0.0536   |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001979236723855138
Validation loss = 0.0016716812970116735
Validation loss = 0.002010627184063196
Validation loss = 0.0015819007530808449
Validation loss = 0.0023789708502590656
Validation loss = 0.0017737351590767503
Validation loss = 0.0018680102657526731
Validation loss = 0.0015562064945697784
Validation loss = 0.002212438266724348
Validation loss = 0.0019677719101309776
Validation loss = 0.002224822761490941
Validation loss = 0.0022849785163998604
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002045324305072427
Validation loss = 0.0015393589856103063
Validation loss = 0.0014140697894617915
Validation loss = 0.005758728366345167
Validation loss = 0.0015718695940449834
Validation loss = 0.0013017980381846428
Validation loss = 0.0014783801743760705
Validation loss = 0.0017550480552017689
Validation loss = 0.0013634583447128534
Validation loss = 0.0014508924214169383
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002033958677202463
Validation loss = 0.0016621239483356476
Validation loss = 0.0017650045920163393
Validation loss = 0.0016474240692332387
Validation loss = 0.002135236281901598
Validation loss = 0.001642890740185976
Validation loss = 0.0015467823250219226
Validation loss = 0.0016609634039923549
Validation loss = 0.001856562215834856
Validation loss = 0.0019016638398170471
Validation loss = 0.0015713570173829794
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002736985683441162
Validation loss = 0.0020668860524892807
Validation loss = 0.001677260035648942
Validation loss = 0.001501324586570263
Validation loss = 0.0020648546051234007
Validation loss = 0.0022621836978942156
Validation loss = 0.001863214885815978
Validation loss = 0.0018201819621026516
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017241478199139237
Validation loss = 0.0018813270144164562
Validation loss = 0.0012346055591478944
Validation loss = 0.001337168738245964
Validation loss = 0.0013377098366618156
Validation loss = 0.001708400435745716
Validation loss = 0.0015012724325060844
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.17    |
| Iteration     | 40       |
| MaximumReturn | -0.00103 |
| MinimumReturn | -13.1    |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015198540640994906
Validation loss = 0.002373615512624383
Validation loss = 0.0017159119015559554
Validation loss = 0.0016160421073436737
Validation loss = 0.0013084333622828126
Validation loss = 0.0018111207755282521
Validation loss = 0.0016582636162638664
Validation loss = 0.0014433956239372492
Validation loss = 0.0020741780754178762
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0022667984012514353
Validation loss = 0.0013829419622197747
Validation loss = 0.0020977638196200132
Validation loss = 0.0015003540320321918
Validation loss = 0.0020455755293369293
Validation loss = 0.0015073257964104414
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001727885683067143
Validation loss = 0.0021068586502224207
Validation loss = 0.0016088536940515041
Validation loss = 0.0016664844006299973
Validation loss = 0.0013856582809239626
Validation loss = 0.0018211633432656527
Validation loss = 0.00250519928522408
Validation loss = 0.0013940137578174472
Validation loss = 0.0016701678978279233
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0018871958600357175
Validation loss = 0.0013366545317694545
Validation loss = 0.0015785093419253826
Validation loss = 0.0014368797419592738
Validation loss = 0.001674463739618659
Validation loss = 0.0017203581519424915
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0022161216475069523
Validation loss = 0.0020271248649805784
Validation loss = 0.0015750323655083776
Validation loss = 0.0023548167664557695
Validation loss = 0.0014025126583874226
Validation loss = 0.0015005910536274314
Validation loss = 0.0019035855075344443
Validation loss = 0.0018242135411128402
Validation loss = 0.0019684021826833487
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.19    |
| Iteration     | 41       |
| MaximumReturn | -0.0492  |
| MinimumReturn | -27.7    |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016366895288228989
Validation loss = 0.0013507269322872162
Validation loss = 0.0016755086835473776
Validation loss = 0.0013916860334575176
Validation loss = 0.001764062326401472
Validation loss = 0.0014647423522546887
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016233857022598386
Validation loss = 0.0015172610292211175
Validation loss = 0.0014558321563526988
Validation loss = 0.0036332402378320694
Validation loss = 0.0019942275248467922
Validation loss = 0.0013516925973817706
Validation loss = 0.0013696823734790087
Validation loss = 0.001312572043389082
Validation loss = 0.0014489492168650031
Validation loss = 0.0015382602578029037
Validation loss = 0.002038158243522048
Validation loss = 0.0017827438423410058
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0025864180643111467
Validation loss = 0.0015891294460743666
Validation loss = 0.0015008192276582122
Validation loss = 0.00299176131375134
Validation loss = 0.00163375458214432
Validation loss = 0.0013377844588831067
Validation loss = 0.0013528290437534451
Validation loss = 0.001333191292360425
Validation loss = 0.0018269512802362442
Validation loss = 0.001777722965925932
Validation loss = 0.0014056155923753977
Validation loss = 0.0021708256099373102
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013617341173812747
Validation loss = 0.0013761583250015974
Validation loss = 0.0012590005062520504
Validation loss = 0.001745660207234323
Validation loss = 0.0019277685787528753
Validation loss = 0.0015240587526932359
Validation loss = 0.0021800429094582796
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013149946462363005
Validation loss = 0.0013578844955191016
Validation loss = 0.002154785441234708
Validation loss = 0.0019213747000321746
Validation loss = 0.0015458508860319853
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0216  |
| Iteration     | 42       |
| MaximumReturn | -0.0033  |
| MinimumReturn | -0.0329  |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001627457793802023
Validation loss = 0.002367750508710742
Validation loss = 0.00147831195499748
Validation loss = 0.0019290540367364883
Validation loss = 0.0014172897208482027
Validation loss = 0.001625484088435769
Validation loss = 0.001483907806687057
Validation loss = 0.0019273193320259452
Validation loss = 0.001482632476836443
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001545266597531736
Validation loss = 0.0017704847268760204
Validation loss = 0.0017845077672973275
Validation loss = 0.0036930169444531202
Validation loss = 0.003378547029569745
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016340861329808831
Validation loss = 0.002233548555523157
Validation loss = 0.001556135481223464
Validation loss = 0.0018483352614566684
Validation loss = 0.0016268747858703136
Validation loss = 0.0014000546652823687
Validation loss = 0.001345138531178236
Validation loss = 0.0015463317977264524
Validation loss = 0.004078448284417391
Validation loss = 0.00154805404599756
Validation loss = 0.002008523093536496
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016611418686807156
Validation loss = 0.0015858926344662905
Validation loss = 0.0019139456562697887
Validation loss = 0.0013328557834029198
Validation loss = 0.0019812225364148617
Validation loss = 0.0015313561307266355
Validation loss = 0.001947228447534144
Validation loss = 0.0014520209515467286
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013827161164954305
Validation loss = 0.0018135139252990484
Validation loss = 0.0016361282905563712
Validation loss = 0.002473683562129736
Validation loss = 0.0018313961336389184
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00439 |
| Iteration     | 43       |
| MaximumReturn | -0.00068 |
| MinimumReturn | -0.0185  |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021894401870667934
Validation loss = 0.00168896303512156
Validation loss = 0.0017537176609039307
Validation loss = 0.001710353884845972
Validation loss = 0.0016797948628664017
Validation loss = 0.0016344572650268674
Validation loss = 0.0014697391306981444
Validation loss = 0.0021434982772916555
Validation loss = 0.001540131401270628
Validation loss = 0.0016162384999915957
Validation loss = 0.0019429301610216498
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001660818001255393
Validation loss = 0.0026175840757787228
Validation loss = 0.0013833076227456331
Validation loss = 0.0013288900954648852
Validation loss = 0.0016657011583447456
Validation loss = 0.0021658367477357388
Validation loss = 0.0013072039000689983
Validation loss = 0.001340188318863511
Validation loss = 0.0013178136432543397
Validation loss = 0.0013622920960187912
Validation loss = 0.0014111330965533853
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024771778844296932
Validation loss = 0.0017243627225980163
Validation loss = 0.003073235508054495
Validation loss = 0.001513168797828257
Validation loss = 0.0012914594262838364
Validation loss = 0.0017229017103090882
Validation loss = 0.0017069597961381078
Validation loss = 0.0013645589351654053
Validation loss = 0.0015685436083003879
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001225858461111784
Validation loss = 0.001525571569800377
Validation loss = 0.001605290686711669
Validation loss = 0.0023520253598690033
Validation loss = 0.001826345338486135
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001722975168377161
Validation loss = 0.001767011359333992
Validation loss = 0.001718662679195404
Validation loss = 0.0015270645963028073
Validation loss = 0.002419176045805216
Validation loss = 0.0014821813674643636
Validation loss = 0.002470356645062566
Validation loss = 0.0014505431754514575
Validation loss = 0.0016863157507032156
Validation loss = 0.0018464067252352834
Validation loss = 0.0015279289800673723
Validation loss = 0.00150874734390527
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.64    |
| Iteration     | 44       |
| MaximumReturn | -0.00111 |
| MinimumReturn | -38.1    |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015854887897148728
Validation loss = 0.0016693033976480365
Validation loss = 0.00172766437754035
Validation loss = 0.0016011918196454644
Validation loss = 0.001415548729710281
Validation loss = 0.001808968954719603
Validation loss = 0.002032149350270629
Validation loss = 0.0017340999329462647
Validation loss = 0.002401079284027219
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013988394057378173
Validation loss = 0.001722490880638361
Validation loss = 0.0020860787481069565
Validation loss = 0.0028548978734761477
Validation loss = 0.0014394434401765466
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001503489213064313
Validation loss = 0.0017163337906822562
Validation loss = 0.0014750746777281165
Validation loss = 0.002196473302319646
Validation loss = 0.0016418471932411194
Validation loss = 0.0014273800188675523
Validation loss = 0.0014781487407162786
Validation loss = 0.0017343969084322453
Validation loss = 0.0014283083146438003
Validation loss = 0.0015441648429259658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001350215170532465
Validation loss = 0.001546611892990768
Validation loss = 0.002171996282413602
Validation loss = 0.0013816647697240114
Validation loss = 0.0013625768478959799
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001479270402342081
Validation loss = 0.0023491091560572386
Validation loss = 0.0014802513178437948
Validation loss = 0.0018303235992789268
Validation loss = 0.001960453111678362
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -16.7    |
| Iteration     | 45       |
| MaximumReturn | -0.0798  |
| MinimumReturn | -64.1    |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016424936475232244
Validation loss = 0.0018398454412817955
Validation loss = 0.001389477401971817
Validation loss = 0.0014229739317670465
Validation loss = 0.0017206555930897593
Validation loss = 0.0019724182784557343
Validation loss = 0.0015424673911184072
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001964214025065303
Validation loss = 0.001702183741144836
Validation loss = 0.0015397100942209363
Validation loss = 0.001623439253307879
Validation loss = 0.00125241675414145
Validation loss = 0.0017683901824057102
Validation loss = 0.001500604092143476
Validation loss = 0.002506519202142954
Validation loss = 0.001327401609160006
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0040234168991446495
Validation loss = 0.0016277218237519264
Validation loss = 0.0017843276727944613
Validation loss = 0.0013338314602151513
Validation loss = 0.0018683553207665682
Validation loss = 0.0025325396563857794
Validation loss = 0.0015304750995710492
Validation loss = 0.0014094449579715729
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0020488835871219635
Validation loss = 0.0016203775303438306
Validation loss = 0.001659322064369917
Validation loss = 0.001935634994879365
Validation loss = 0.002108348999172449
Validation loss = 0.001504990505054593
Validation loss = 0.0034614158794283867
Validation loss = 0.0016751649091020226
Validation loss = 0.0015264039393514395
Validation loss = 0.0013826729264110327
Validation loss = 0.0018454742385074496
Validation loss = 0.0015037850243970752
Validation loss = 0.001751275034621358
Validation loss = 0.00139407510869205
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018646145472303033
Validation loss = 0.001368967816233635
Validation loss = 0.0014690940733999014
Validation loss = 0.0018344211857765913
Validation loss = 0.0018782856641337276
Validation loss = 0.0029346891678869724
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.666    |
| Iteration     | 46        |
| MaximumReturn | -0.000642 |
| MinimumReturn | -10.4     |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002025327878072858
Validation loss = 0.0015327940927818418
Validation loss = 0.001880756812170148
Validation loss = 0.0015254393219947815
Validation loss = 0.002197313355281949
Validation loss = 0.0015792504418641329
Validation loss = 0.0019638410303741693
Validation loss = 0.0014007193967700005
Validation loss = 0.0015079909935593605
Validation loss = 0.0016974930185824633
Validation loss = 0.0013320528669282794
Validation loss = 0.0014813931193202734
Validation loss = 0.001515858108177781
Validation loss = 0.0015329899033531547
Validation loss = 0.001762365223839879
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014767650282010436
Validation loss = 0.0015863273292779922
Validation loss = 0.0019204284762963653
Validation loss = 0.0013660406693816185
Validation loss = 0.0016890183323994279
Validation loss = 0.0015621095662936568
Validation loss = 0.0013320092111825943
Validation loss = 0.0019887853413820267
Validation loss = 0.00144105963408947
Validation loss = 0.0013373817782849073
Validation loss = 0.0025672069750726223
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001616267254576087
Validation loss = 0.0016966272378340364
Validation loss = 0.0018955084960907698
Validation loss = 0.0014644123148173094
Validation loss = 0.0015512334648519754
Validation loss = 0.001425653463229537
Validation loss = 0.0020593113731592894
Validation loss = 0.0016164459520950913
Validation loss = 0.0013359562726691365
Validation loss = 0.0013238167157396674
Validation loss = 0.0015095194103196263
Validation loss = 0.0015145621728152037
Validation loss = 0.0032091743778437376
Validation loss = 0.001494367839768529
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002110640052706003
Validation loss = 0.0012124081840738654
Validation loss = 0.0016138184582814574
Validation loss = 0.0023509019520133734
Validation loss = 0.002722491743043065
Validation loss = 0.0015087465289980173
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015773109626024961
Validation loss = 0.0014424612745642662
Validation loss = 0.0016643751878291368
Validation loss = 0.0024519660510122776
Validation loss = 0.001747015630826354
Validation loss = 0.0015676375478506088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.47    |
| Iteration     | 47       |
| MaximumReturn | -0.00104 |
| MinimumReturn | -35.9    |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021695157047361135
Validation loss = 0.001604399411007762
Validation loss = 0.001921550021506846
Validation loss = 0.0019686021842062473
Validation loss = 0.00163205002900213
Validation loss = 0.0014275716384872794
Validation loss = 0.0018445976311340928
Validation loss = 0.0017602771986275911
Validation loss = 0.001578602590598166
Validation loss = 0.0025012122932821512
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001628204481676221
Validation loss = 0.0012952482793480158
Validation loss = 0.00155373546294868
Validation loss = 0.0014413691824302077
Validation loss = 0.0021891649812459946
Validation loss = 0.0013944257516413927
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001859979354776442
Validation loss = 0.001729921088553965
Validation loss = 0.001449314528144896
Validation loss = 0.0015113386325538158
Validation loss = 0.002034970326349139
Validation loss = 0.0016841503093019128
Validation loss = 0.0022509931586682796
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017052225302904844
Validation loss = 0.002275751903653145
Validation loss = 0.0025457139126956463
Validation loss = 0.0018705526599660516
Validation loss = 0.0016942523652687669
Validation loss = 0.0016522326041013002
Validation loss = 0.0018032172229140997
Validation loss = 0.0017509653698652983
Validation loss = 0.002275284379720688
Validation loss = 0.001410428318195045
Validation loss = 0.0019905646331608295
Validation loss = 0.0015267722774297
Validation loss = 0.001794886076822877
Validation loss = 0.0014600654831156135
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016140018124133348
Validation loss = 0.001424926333129406
Validation loss = 0.003355184569954872
Validation loss = 0.0026718408335000277
Validation loss = 0.0015612368006259203
Validation loss = 0.0013021882623434067
Validation loss = 0.001729881390929222
Validation loss = 0.0015184388030320406
Validation loss = 0.0014541642740368843
Validation loss = 0.00211825268343091
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.21    |
| Iteration     | 48       |
| MaximumReturn | -0.0147  |
| MinimumReturn | -39.8    |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0022813077084720135
Validation loss = 0.0015247943811118603
Validation loss = 0.001298576476983726
Validation loss = 0.0015698253409937024
Validation loss = 0.001472492003813386
Validation loss = 0.001354964799247682
Validation loss = 0.0016111757140606642
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017959329998120666
Validation loss = 0.0017195745604112744
Validation loss = 0.0016045469092205167
Validation loss = 0.0014989492483437061
Validation loss = 0.0015952690737321973
Validation loss = 0.002057045930996537
Validation loss = 0.0014665331691503525
Validation loss = 0.001634783111512661
Validation loss = 0.0014926792355254292
Validation loss = 0.0014251148095354438
Validation loss = 0.0024702094960957766
Validation loss = 0.00176970602478832
Validation loss = 0.001467999303713441
Validation loss = 0.001394791528582573
Validation loss = 0.001314183697104454
Validation loss = 0.001505357795394957
Validation loss = 0.0014135140227153897
Validation loss = 0.0016101319342851639
Validation loss = 0.0015013369265943766
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024639246985316277
Validation loss = 0.0015694897156208754
Validation loss = 0.0018176378216594458
Validation loss = 0.0014308833051472902
Validation loss = 0.001315182656981051
Validation loss = 0.00164202437736094
Validation loss = 0.0019121565856039524
Validation loss = 0.0014818760100752115
Validation loss = 0.0015225051902234554
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014284220524132252
Validation loss = 0.0014845142140984535
Validation loss = 0.0014452416216954589
Validation loss = 0.00145933055318892
Validation loss = 0.0019162555690854788
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00162711669690907
Validation loss = 0.002490245969966054
Validation loss = 0.0015048118075355887
Validation loss = 0.0015255025355145335
Validation loss = 0.0016481626080349088
Validation loss = 0.0013216572115197778
Validation loss = 0.0014272904954850674
Validation loss = 0.0014308398822322488
Validation loss = 0.0021306490525603294
Validation loss = 0.0017065532738342881
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -20      |
| Iteration     | 49       |
| MaximumReturn | -0.0355  |
| MinimumReturn | -62.1    |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015104854246601462
Validation loss = 0.0016167153371497989
Validation loss = 0.0014461716637015343
Validation loss = 0.0020054918713867664
Validation loss = 0.001537363976240158
Validation loss = 0.0014619209105148911
Validation loss = 0.0012888461351394653
Validation loss = 0.002509326906874776
Validation loss = 0.0017874760087579489
Validation loss = 0.0016125639667734504
Validation loss = 0.0013806086499243975
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001839558477513492
Validation loss = 0.0013029661495238543
Validation loss = 0.002688400913029909
Validation loss = 0.0014259414747357368
Validation loss = 0.0013294430682435632
Validation loss = 0.0018563657067716122
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001620274386368692
Validation loss = 0.0012884973548352718
Validation loss = 0.0017977593233808875
Validation loss = 0.0013159560039639473
Validation loss = 0.0020525427535176277
Validation loss = 0.0015772695187479258
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0023425330873578787
Validation loss = 0.0016006740042939782
Validation loss = 0.0014116923557594419
Validation loss = 0.001296181813813746
Validation loss = 0.002203368116170168
Validation loss = 0.001757822697982192
Validation loss = 0.0016344769392162561
Validation loss = 0.001769441063515842
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014312140410766006
Validation loss = 0.004066970199346542
Validation loss = 0.0016654530772939324
Validation loss = 0.0015155443688854575
Validation loss = 0.0013522460358217359
Validation loss = 0.0015110215172171593
Validation loss = 0.0016291053034365177
Validation loss = 0.0015785724390298128
Validation loss = 0.0013863968197256327
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -39.3    |
| Iteration     | 50       |
| MaximumReturn | -0.0173  |
| MinimumReturn | -106     |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014393230667337775
Validation loss = 0.0012817374663427472
Validation loss = 0.0015563275665044785
Validation loss = 0.0018064180621877313
Validation loss = 0.001967168413102627
Validation loss = 0.002328942995518446
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015118782175704837
Validation loss = 0.0017558138351887465
Validation loss = 0.002886473201215267
Validation loss = 0.001412134151905775
Validation loss = 0.0013310963986441493
Validation loss = 0.0016366548370569944
Validation loss = 0.0017043071566149592
Validation loss = 0.0017500303220003843
Validation loss = 0.001897185342386365
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014247256331145763
Validation loss = 0.0017574706580489874
Validation loss = 0.001567971776239574
Validation loss = 0.0015129235107451677
Validation loss = 0.0020351207349449396
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016986748669296503
Validation loss = 0.0013173532206565142
Validation loss = 0.0013989957515150309
Validation loss = 0.00139563565608114
Validation loss = 0.0014128435868769884
Validation loss = 0.0014776946045458317
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019707607571035624
Validation loss = 0.00147054938133806
Validation loss = 0.0017220419831573963
Validation loss = 0.0012015535030514002
Validation loss = 0.0012152298586443067
Validation loss = 0.001375503372400999
Validation loss = 0.0018378142267465591
Validation loss = 0.0026412359438836575
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.06    |
| Iteration     | 51       |
| MaximumReturn | -0.00087 |
| MinimumReturn | -36.4    |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018074028193950653
Validation loss = 0.0017669759690761566
Validation loss = 0.0019179345108568668
Validation loss = 0.001496774610131979
Validation loss = 0.0020161448046565056
Validation loss = 0.0016274519730359316
Validation loss = 0.0012479491997510195
Validation loss = 0.0017954966751858592
Validation loss = 0.001325890771113336
Validation loss = 0.0012729632435366511
Validation loss = 0.0015617985045537353
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017983258003368974
Validation loss = 0.0015807084273546934
Validation loss = 0.0014945089351385832
Validation loss = 0.00219210097566247
Validation loss = 0.00160833855625242
Validation loss = 0.0013851459370926023
Validation loss = 0.0017602823209017515
Validation loss = 0.0013157083885744214
Validation loss = 0.0012920511653646827
Validation loss = 0.0015183204086497426
Validation loss = 0.0015461101429536939
Validation loss = 0.0015776968793943524
Validation loss = 0.002703805221244693
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00192639057058841
Validation loss = 0.0016376458806917071
Validation loss = 0.0017310340190306306
Validation loss = 0.0019555038306862116
Validation loss = 0.0017144758021458983
Validation loss = 0.0014545995509251952
Validation loss = 0.001668082200922072
Validation loss = 0.0017131257336586714
Validation loss = 0.001306709717027843
Validation loss = 0.001968878787010908
Validation loss = 0.0014813761226832867
Validation loss = 0.0014738234458491206
Validation loss = 0.0014117482351139188
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001617903239093721
Validation loss = 0.0015247054398059845
Validation loss = 0.0013626203872263432
Validation loss = 0.00156278011854738
Validation loss = 0.0015225396491587162
Validation loss = 0.001687203417532146
Validation loss = 0.0014990842901170254
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00201376318000257
Validation loss = 0.0016790616791695356
Validation loss = 0.0012617884203791618
Validation loss = 0.0016208699671551585
Validation loss = 0.0024127429351210594
Validation loss = 0.001630194135941565
Validation loss = 0.0013925445964559913
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -33.1    |
| Iteration     | 52       |
| MaximumReturn | -0.00185 |
| MinimumReturn | -91.2    |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014678628649562597
Validation loss = 0.0012718411162495613
Validation loss = 0.0012684029061347246
Validation loss = 0.0020736076403409243
Validation loss = 0.0013035648735240102
Validation loss = 0.0014582425355911255
Validation loss = 0.0017494591884315014
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001967394957318902
Validation loss = 0.001822156016714871
Validation loss = 0.0013197099324315786
Validation loss = 0.001495384844020009
Validation loss = 0.0018145041540265083
Validation loss = 0.001793190836906433
Validation loss = 0.001326127559877932
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022858085576444864
Validation loss = 0.002756784437224269
Validation loss = 0.0014740581391379237
Validation loss = 0.001978561980649829
Validation loss = 0.0023280486930161715
Validation loss = 0.0024266699329018593
Validation loss = 0.0016118272906169295
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024089987855404615
Validation loss = 0.0021841120906174183
Validation loss = 0.0015111976535990834
Validation loss = 0.0015681301010772586
Validation loss = 0.0015840864507481456
Validation loss = 0.001484614796936512
Validation loss = 0.001501584891229868
Validation loss = 0.001268913270905614
Validation loss = 0.0013730016071349382
Validation loss = 0.0015254777390509844
Validation loss = 0.0016544401878491044
Validation loss = 0.004013524856418371
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013364237966015935
Validation loss = 0.0013786654453724623
Validation loss = 0.0014096354134380817
Validation loss = 0.00125886092428118
Validation loss = 0.0015589934773743153
Validation loss = 0.001370642683468759
Validation loss = 0.0013975147157907486
Validation loss = 0.001989184645935893
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -10.6     |
| Iteration     | 53        |
| MaximumReturn | -0.000789 |
| MinimumReturn | -106      |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016659001121297479
Validation loss = 0.0017605945467948914
Validation loss = 0.001684102462604642
Validation loss = 0.0018744957633316517
Validation loss = 0.002604189794510603
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014826228143647313
Validation loss = 0.0013489429838955402
Validation loss = 0.0016746242763474584
Validation loss = 0.0015001415740698576
Validation loss = 0.0012648999691009521
Validation loss = 0.0018621094059199095
Validation loss = 0.0020985130686312914
Validation loss = 0.0016872964333742857
Validation loss = 0.0016571417218074203
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015925148036330938
Validation loss = 0.001555378781631589
Validation loss = 0.0016399803571403027
Validation loss = 0.001470749732106924
Validation loss = 0.0014779873890802264
Validation loss = 0.001436131540685892
Validation loss = 0.0022420999594032764
Validation loss = 0.0013521382352337241
Validation loss = 0.002481855684891343
Validation loss = 0.001382671995088458
Validation loss = 0.0014026601566001773
Validation loss = 0.0022685443982481956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013372290413826704
Validation loss = 0.001807769644074142
Validation loss = 0.0020528577733784914
Validation loss = 0.001351818791590631
Validation loss = 0.002193278633058071
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0026687774807214737
Validation loss = 0.0013576589990407228
Validation loss = 0.0015104060294106603
Validation loss = 0.0017325964290648699
Validation loss = 0.0012127073714509606
Validation loss = 0.0014760007616132498
Validation loss = 0.0013340665027499199
Validation loss = 0.001381025300361216
Validation loss = 0.0015153493732213974
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -79      |
| Iteration     | 54       |
| MaximumReturn | -0.175   |
| MinimumReturn | -127     |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018655997700989246
Validation loss = 0.0011190403020009398
Validation loss = 0.0015738523798063397
Validation loss = 0.0014711639378219843
Validation loss = 0.0013873920543119311
Validation loss = 0.0016777184791862965
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016355199040845037
Validation loss = 0.001688084565103054
Validation loss = 0.0013907212996855378
Validation loss = 0.0017932079499587417
Validation loss = 0.0019590554293245077
Validation loss = 0.0013332460075616837
Validation loss = 0.0012842438882216811
Validation loss = 0.0012736328644677997
Validation loss = 0.0013921268982812762
Validation loss = 0.00266079930588603
Validation loss = 0.008312379010021687
Validation loss = 0.0016160225495696068
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016372575191780925
Validation loss = 0.0013067048275843263
Validation loss = 0.001459159073419869
Validation loss = 0.0012583605712279677
Validation loss = 0.0013887311797589064
Validation loss = 0.0018239779165014625
Validation loss = 0.001551455003209412
Validation loss = 0.0016540087526664138
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014573141233995557
Validation loss = 0.0014369107084348798
Validation loss = 0.0013306514592841268
Validation loss = 0.0013200677931308746
Validation loss = 0.0013523967936635017
Validation loss = 0.0013373962137848139
Validation loss = 0.0012813687790185213
Validation loss = 0.0016575687332078815
Validation loss = 0.0018222980434074998
Validation loss = 0.001410472672432661
Validation loss = 0.0016464997315779328
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001438609790056944
Validation loss = 0.0012367104645818472
Validation loss = 0.0011739827459678054
Validation loss = 0.0011942237615585327
Validation loss = 0.0012883743038401008
Validation loss = 0.0027108066715300083
Validation loss = 0.001198516576550901
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -106     |
| Iteration     | 55       |
| MaximumReturn | -30.8    |
| MinimumReturn | -130     |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014115789672359824
Validation loss = 0.001280454220250249
Validation loss = 0.0013798072468489408
Validation loss = 0.0014138058759272099
Validation loss = 0.0013541103107854724
Validation loss = 0.0014801109209656715
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0020818121265619993
Validation loss = 0.0013286375906318426
Validation loss = 0.0013416470028460026
Validation loss = 0.0012474397663027048
Validation loss = 0.001488511567004025
Validation loss = 0.001142056193202734
Validation loss = 0.0013553091557696462
Validation loss = 0.0013251462951302528
Validation loss = 0.0011260544415563345
Validation loss = 0.0012202744837850332
Validation loss = 0.0015684315003454685
Validation loss = 0.0011519333347678185
Validation loss = 0.001410469994880259
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012227225815877318
Validation loss = 0.0015384769067168236
Validation loss = 0.0014025765703991055
Validation loss = 0.001203990075737238
Validation loss = 0.001201049773953855
Validation loss = 0.0011243867920711637
Validation loss = 0.0014724587090313435
Validation loss = 0.0012234121095389128
Validation loss = 0.0014982008142396808
Validation loss = 0.0012113520642742515
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014890610473230481
Validation loss = 0.001288947300054133
Validation loss = 0.001165842404589057
Validation loss = 0.0012577921152114868
Validation loss = 0.0013809794327244163
Validation loss = 0.001844240934588015
Validation loss = 0.001410575583577156
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001386459800414741
Validation loss = 0.0011385787511244416
Validation loss = 0.0011686591897159815
Validation loss = 0.0016461703926324844
Validation loss = 0.0014681469183415174
Validation loss = 0.0015637818723917007
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -20.5     |
| Iteration     | 56        |
| MaximumReturn | -0.000823 |
| MinimumReturn | -143      |
| TotalSamples  | 96628     |
-----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025822650641202927
Validation loss = 0.0014175859978422523
Validation loss = 0.0013221766566857696
Validation loss = 0.0015903693856671453
Validation loss = 0.0012166071683168411
Validation loss = 0.0015501203015446663
Validation loss = 0.0023488905280828476
Validation loss = 0.001177415600977838
Validation loss = 0.001401372835971415
Validation loss = 0.0011797035112977028
Validation loss = 0.001152247772552073
Validation loss = 0.0015639549819752574
Validation loss = 0.0014093999052420259
Validation loss = 0.0012112993281334639
Validation loss = 0.0015135705471038818
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016173981130123138
Validation loss = 0.0014009232399985194
Validation loss = 0.0011892284965142608
Validation loss = 0.001377318985760212
Validation loss = 0.0023197687696665525
Validation loss = 0.0012346819275990129
Validation loss = 0.0019634340424090624
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0030358154326677322
Validation loss = 0.0013369470834732056
Validation loss = 0.001222250866703689
Validation loss = 0.0014609169447794557
Validation loss = 0.001336796791292727
Validation loss = 0.0013250777265056968
Validation loss = 0.0012620805064216256
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012412917567417026
Validation loss = 0.0015865681925788522
Validation loss = 0.0016174124320968986
Validation loss = 0.0015178974717855453
Validation loss = 0.0011231368407607079
Validation loss = 0.0013250564225018024
Validation loss = 0.0012190878624096513
Validation loss = 0.0014038127847015858
Validation loss = 0.0013448860263451934
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016094441525638103
Validation loss = 0.001413978636264801
Validation loss = 0.0014043641276657581
Validation loss = 0.0012508489890024066
Validation loss = 0.0012318784138187766
Validation loss = 0.0015416406095027924
Validation loss = 0.0015358744421973825
Validation loss = 0.0014063803246244788
Validation loss = 0.0014705393696203828
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -94.8    |
| Iteration     | 57       |
| MaximumReturn | -2.57    |
| MinimumReturn | -143     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021296439226716757
Validation loss = 0.0012034268584102392
Validation loss = 0.001990811200812459
Validation loss = 0.0012825686717405915
Validation loss = 0.0012520203599706292
Validation loss = 0.0013445246731862426
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014423310058191419
Validation loss = 0.001395125174894929
Validation loss = 0.0012412184150889516
Validation loss = 0.0013396282447502017
Validation loss = 0.001269619562663138
Validation loss = 0.0013495570747181773
Validation loss = 0.001479140017181635
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013208871241658926
Validation loss = 0.001285443315282464
Validation loss = 0.0011452353792265058
Validation loss = 0.001490664086304605
Validation loss = 0.001137400045990944
Validation loss = 0.0015289779985323548
Validation loss = 0.0014946645824238658
Validation loss = 0.0015189400874078274
Validation loss = 0.0017356027383357286
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013006837107241154
Validation loss = 0.0010940671199932694
Validation loss = 0.0012373701902106404
Validation loss = 0.0011938068782910705
Validation loss = 0.0015560785541310906
Validation loss = 0.0012478542048484087
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0021959212608635426
Validation loss = 0.0012470922665670514
Validation loss = 0.0014497170923277736
Validation loss = 0.001369088888168335
Validation loss = 0.00206747162155807
Validation loss = 0.001172088785097003
Validation loss = 0.0012234431924298406
Validation loss = 0.0012838239781558514
Validation loss = 0.0014557004906237125
Validation loss = 0.0012803771533071995
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -65.8    |
| Iteration     | 58       |
| MaximumReturn | -0.0013  |
| MinimumReturn | -153     |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014344954397529364
Validation loss = 0.0013364115729928017
Validation loss = 0.001394876861013472
Validation loss = 0.0013960866490378976
Validation loss = 0.001451424090191722
Validation loss = 0.0015173909487202764
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011863798135891557
Validation loss = 0.0012116312282159925
Validation loss = 0.0011551791103556752
Validation loss = 0.0014594495296478271
Validation loss = 0.0014950199984014034
Validation loss = 0.002351992065086961
Validation loss = 0.0012136538280174136
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013545011170208454
Validation loss = 0.0011302883503958583
Validation loss = 0.0010916793253272772
Validation loss = 0.0025826268829405308
Validation loss = 0.0012621652567759156
Validation loss = 0.001114844810217619
Validation loss = 0.0010593431070446968
Validation loss = 0.0011341647477820516
Validation loss = 0.001486162655055523
Validation loss = 0.00143554643727839
Validation loss = 0.0011176425032317638
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012102844193577766
Validation loss = 0.001430358155630529
Validation loss = 0.0017350900452584028
Validation loss = 0.0012693164171651006
Validation loss = 0.0012203913647681475
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012855763779953122
Validation loss = 0.00228638993576169
Validation loss = 0.0013156916247680783
Validation loss = 0.0013472257414832711
Validation loss = 0.0012281155213713646
Validation loss = 0.0015737601788714528
Validation loss = 0.0021399513352662325
Validation loss = 0.0013006117660552263
Validation loss = 0.0022508820984512568
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -108     |
| Iteration     | 59       |
| MaximumReturn | -0.229   |
| MinimumReturn | -145     |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014203783357515931
Validation loss = 0.001324108219705522
Validation loss = 0.0011080706026405096
Validation loss = 0.001276975148357451
Validation loss = 0.0010824615601450205
Validation loss = 0.0019148194696754217
Validation loss = 0.0012336679501459002
Validation loss = 0.0013613337650895119
Validation loss = 0.0015842142747715116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016767247579991817
Validation loss = 0.001340737333521247
Validation loss = 0.0011266220826655626
Validation loss = 0.001298686838708818
Validation loss = 0.0011439615627750754
Validation loss = 0.0011049173772335052
Validation loss = 0.001299385679885745
Validation loss = 0.0012768740998581052
Validation loss = 0.0010320317232981324
Validation loss = 0.0016164497938007116
Validation loss = 0.0012500050943344831
Validation loss = 0.0011362283257767558
Validation loss = 0.0014567291364073753
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012454867828637362
Validation loss = 0.0012543482007458806
Validation loss = 0.0011596218682825565
Validation loss = 0.00130854407325387
Validation loss = 0.0017312124837189913
Validation loss = 0.0020998052787035704
Validation loss = 0.0011233810801059008
Validation loss = 0.0012400881387293339
Validation loss = 0.001227696193382144
Validation loss = 0.0011418781941756606
Validation loss = 0.0015242963563650846
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012527513317763805
Validation loss = 0.001207137480378151
Validation loss = 0.0013187580043449998
Validation loss = 0.0011667939834296703
Validation loss = 0.0011685744393616915
Validation loss = 0.001564356847666204
Validation loss = 0.0012265898985788226
Validation loss = 0.0011551105417311192
Validation loss = 0.0014972913777455688
Validation loss = 0.0012058268766850233
Validation loss = 0.0012228251434862614
Validation loss = 0.001353639061562717
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014730116818100214
Validation loss = 0.0015394374495372176
Validation loss = 0.002852541394531727
Validation loss = 0.0013998067006468773
Validation loss = 0.0011882208054885268
Validation loss = 0.0013922323705628514
Validation loss = 0.0010613891063258052
Validation loss = 0.0010496902978047729
Validation loss = 0.0013884922955185175
Validation loss = 0.0013514431193470955
Validation loss = 0.0012438109843060374
Validation loss = 0.001094076200388372
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -104     |
| Iteration     | 60       |
| MaximumReturn | -0.444   |
| MinimumReturn | -155     |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001510494388639927
Validation loss = 0.0012409891933202744
Validation loss = 0.0011807950213551521
Validation loss = 0.0018085390329360962
Validation loss = 0.001312750275246799
Validation loss = 0.00135024543851614
Validation loss = 0.0013470947742462158
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011347135296091437
Validation loss = 0.0012765312567353249
Validation loss = 0.0011513549834489822
Validation loss = 0.0012007555924355984
Validation loss = 0.0011709507089108229
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001086401636712253
Validation loss = 0.001041715033352375
Validation loss = 0.0012849813792854548
Validation loss = 0.0013754820683971047
Validation loss = 0.0012925730552524328
Validation loss = 0.0011489212047308683
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011902233818545938
Validation loss = 0.0013539439532905817
Validation loss = 0.0010405362118035555
Validation loss = 0.0011991659412160516
Validation loss = 0.0015194360166788101
Validation loss = 0.0010078648338094354
Validation loss = 0.0014694153796881437
Validation loss = 0.001229110755957663
Validation loss = 0.0016741437138989568
Validation loss = 0.0012146327644586563
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011743276845663786
Validation loss = 0.0012637028703466058
Validation loss = 0.0013355384580790997
Validation loss = 0.0013758698478341103
Validation loss = 0.0013137584319338202
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -110     |
| Iteration     | 61       |
| MaximumReturn | -29.4    |
| MinimumReturn | -130     |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012235869653522968
Validation loss = 0.0011829603463411331
Validation loss = 0.0010287172626703978
Validation loss = 0.0011614995310083032
Validation loss = 0.0015320363454520702
Validation loss = 0.0010178695665672421
Validation loss = 0.0013413780834525824
Validation loss = 0.0011923178099095821
Validation loss = 0.0013683541910722852
Validation loss = 0.0012027163757011294
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011924202553927898
Validation loss = 0.0011400759685784578
Validation loss = 0.0011296711163595319
Validation loss = 0.0010533155873417854
Validation loss = 0.0012170409318059683
Validation loss = 0.0014457632787525654
Validation loss = 0.0010288158664479852
Validation loss = 0.0012212933506816626
Validation loss = 0.0012732177274301648
Validation loss = 0.0010451802518218756
Validation loss = 0.0013713439693674445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011308216489851475
Validation loss = 0.001190561568364501
Validation loss = 0.0012000371934846044
Validation loss = 0.0011775466846302152
Validation loss = 0.0011197992134839296
Validation loss = 0.0011687905061990023
Validation loss = 0.001263901125639677
Validation loss = 0.0020480554085224867
Validation loss = 0.0010015099542215466
Validation loss = 0.001187397399917245
Validation loss = 0.0011936648515984416
Validation loss = 0.0012337492080405354
Validation loss = 0.0009804654400795698
Validation loss = 0.001056986628100276
Validation loss = 0.0013987583806738257
Validation loss = 0.0010162803810089827
Validation loss = 0.0014969041803851724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012815005611628294
Validation loss = 0.001133882673457265
Validation loss = 0.0010139993391931057
Validation loss = 0.0012876943219453096
Validation loss = 0.0015998908784240484
Validation loss = 0.0011780575150623918
Validation loss = 0.001036045840010047
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015244943788275123
Validation loss = 0.001192477997392416
Validation loss = 0.0012804475845769048
Validation loss = 0.0010397954611107707
Validation loss = 0.0013929895358160138
Validation loss = 0.0010835645953193307
Validation loss = 0.0013998978538438678
Validation loss = 0.0015961435856297612
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -102     |
| Iteration     | 62       |
| MaximumReturn | -32.2    |
| MinimumReturn | -138     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010270726634189487
Validation loss = 0.001076159649528563
Validation loss = 0.0012234343448653817
Validation loss = 0.0011020321398973465
Validation loss = 0.0011985448654741049
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001885677338577807
Validation loss = 0.0013725588796660304
Validation loss = 0.0010297793196514249
Validation loss = 0.0013164575211703777
Validation loss = 0.0012053413083776832
Validation loss = 0.0012878982815891504
Validation loss = 0.000957665906753391
Validation loss = 0.0016084383241832256
Validation loss = 0.0009821049170568585
Validation loss = 0.0010717500699684024
Validation loss = 0.001186095061711967
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009885824983939528
Validation loss = 0.0014846274862065911
Validation loss = 0.0013621656689792871
Validation loss = 0.0011506309965625405
Validation loss = 0.0014261787291616201
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012098920997232199
Validation loss = 0.0011881203390657902
Validation loss = 0.0010292264632880688
Validation loss = 0.0013388895895332098
Validation loss = 0.0011620240984484553
Validation loss = 0.0012451254297047853
Validation loss = 0.0011423761025071144
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011195492697879672
Validation loss = 0.0011413454776629806
Validation loss = 0.0009096935973502696
Validation loss = 0.0012684885878115892
Validation loss = 0.0011779471533372998
Validation loss = 0.0011037037475034595
Validation loss = 0.00119748804718256
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -59.2    |
| Iteration     | 63       |
| MaximumReturn | -0.113   |
| MinimumReturn | -96.3    |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010798412840813398
Validation loss = 0.001269308035261929
Validation loss = 0.0015313230687752366
Validation loss = 0.001482842955738306
Validation loss = 0.001043165335431695
Validation loss = 0.0008925597066991031
Validation loss = 0.001028834143653512
Validation loss = 0.0009338108357042074
Validation loss = 0.0010634013451635838
Validation loss = 0.0011313195573166013
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011035131756216288
Validation loss = 0.0011189107317477465
Validation loss = 0.0009588993852958083
Validation loss = 0.0011196810519322753
Validation loss = 0.0010739295976236463
Validation loss = 0.0013003387721255422
Validation loss = 0.0010994719341397285
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001125984825193882
Validation loss = 0.0011863468680530787
Validation loss = 0.0010333491954952478
Validation loss = 0.0012484588660299778
Validation loss = 0.0011250159004703164
Validation loss = 0.0010071576107293367
Validation loss = 0.0009783991845324636
Validation loss = 0.0010805835481733084
Validation loss = 0.0011932349298149347
Validation loss = 0.0010613572085276246
Validation loss = 0.0013775184052065015
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012852574000135064
Validation loss = 0.001305116224102676
Validation loss = 0.001411155448295176
Validation loss = 0.0011918790405616164
Validation loss = 0.0009865171741694212
Validation loss = 0.0010280860587954521
Validation loss = 0.0013665169244632125
Validation loss = 0.0015636916505172849
Validation loss = 0.0010919860797002912
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010185691062361002
Validation loss = 0.0014213884714990854
Validation loss = 0.0010322165908291936
Validation loss = 0.0010237145470455289
Validation loss = 0.0009412267827428877
Validation loss = 0.0010424958309158683
Validation loss = 0.0009662412339821458
Validation loss = 0.0011122062569484115
Validation loss = 0.001166111440397799
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -27.1     |
| Iteration     | 64        |
| MaximumReturn | -0.000918 |
| MinimumReturn | -106      |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001054125837981701
Validation loss = 0.0015148627571761608
Validation loss = 0.0010282957227900624
Validation loss = 0.0010651479242369533
Validation loss = 0.0009873559465631843
Validation loss = 0.001118498737923801
Validation loss = 0.001641049049794674
Validation loss = 0.0010469775879755616
Validation loss = 0.0009606421808712184
Validation loss = 0.0010401176987215877
Validation loss = 0.0013985841069370508
Validation loss = 0.001224173465743661
Validation loss = 0.0010103622917085886
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009365376899950206
Validation loss = 0.0010196258081123233
Validation loss = 0.0010934002930298448
Validation loss = 0.001143328147009015
Validation loss = 0.001487988280132413
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014196893898770213
Validation loss = 0.0014366236282512546
Validation loss = 0.001060098991729319
Validation loss = 0.0010715071111917496
Validation loss = 0.0009062319877557456
Validation loss = 0.001178260426968336
Validation loss = 0.0011412519961595535
Validation loss = 0.0010463783983141184
Validation loss = 0.000978396157734096
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010792759712785482
Validation loss = 0.0011409209109842777
Validation loss = 0.0011844407999888062
Validation loss = 0.001123185153119266
Validation loss = 0.001142262015491724
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009531944524496794
Validation loss = 0.0011868918081745505
Validation loss = 0.0010057821637019515
Validation loss = 0.0013112269807606936
Validation loss = 0.0011852673487737775
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -68.8    |
| Iteration     | 65       |
| MaximumReturn | -0.0099  |
| MinimumReturn | -118     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009723573457449675
Validation loss = 0.0009621643694117665
Validation loss = 0.0015796562656760216
Validation loss = 0.0009385047596879303
Validation loss = 0.0012776426738128066
Validation loss = 0.0010925608221441507
Validation loss = 0.0008907578885555267
Validation loss = 0.0009997485904023051
Validation loss = 0.0014175259275361896
Validation loss = 0.001362210838124156
Validation loss = 0.001160056795924902
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015323328552767634
Validation loss = 0.0011214507976546884
Validation loss = 0.0009610946872271597
Validation loss = 0.0010118006030097604
Validation loss = 0.0011133948573842645
Validation loss = 0.0010569972218945622
Validation loss = 0.001021319767460227
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000994464848190546
Validation loss = 0.0011485974537208676
Validation loss = 0.00106584164313972
Validation loss = 0.0009094429551623762
Validation loss = 0.0012331263860687613
Validation loss = 0.000902135856449604
Validation loss = 0.001049062586389482
Validation loss = 0.001178222126327455
Validation loss = 0.0010643522255122662
Validation loss = 0.001364648574963212
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010247033787891269
Validation loss = 0.0011738762259483337
Validation loss = 0.0011415141634643078
Validation loss = 0.0012781518744304776
Validation loss = 0.001042346702888608
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00120331603102386
Validation loss = 0.0009458864224143326
Validation loss = 0.0009026313200592995
Validation loss = 0.0009277748176828027
Validation loss = 0.0010609989985823631
Validation loss = 0.0011005958076566458
Validation loss = 0.0008999112178571522
Validation loss = 0.0021048628259450197
Validation loss = 0.0011207814095541835
Validation loss = 0.0009673778549768031
Validation loss = 0.001041762880049646
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.1    |
| Iteration     | 66       |
| MaximumReturn | -0.00159 |
| MinimumReturn | -116     |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009896650444716215
Validation loss = 0.0011111438507214189
Validation loss = 0.0010705518070608377
Validation loss = 0.001526657142676413
Validation loss = 0.0009184471564367414
Validation loss = 0.001081805326975882
Validation loss = 0.0008918500388972461
Validation loss = 0.0008672340773046017
Validation loss = 0.0010368026560172439
Validation loss = 0.0008993325755000114
Validation loss = 0.0008949441835284233
Validation loss = 0.0009784827707335353
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011463655391708016
Validation loss = 0.0009675744222477078
Validation loss = 0.0009841230930760503
Validation loss = 0.0009486298658885062
Validation loss = 0.0010339232394471765
Validation loss = 0.001851484877988696
Validation loss = 0.0009092612308450043
Validation loss = 0.0010886291274800897
Validation loss = 0.0008970924536697567
Validation loss = 0.0009599268087185919
Validation loss = 0.0009326103026978672
Validation loss = 0.0009003617451526225
Validation loss = 0.0009383645956404507
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014222705503925681
Validation loss = 0.0011193918762728572
Validation loss = 0.0010047097457572818
Validation loss = 0.001089157653041184
Validation loss = 0.0009086514473892748
Validation loss = 0.0011559443082660437
Validation loss = 0.0014204320032149553
Validation loss = 0.0009716277127154171
Validation loss = 0.0008689420064911246
Validation loss = 0.0010162399848923087
Validation loss = 0.0010102365631610155
Validation loss = 0.001233249087817967
Validation loss = 0.0010748194763436913
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001146616181358695
Validation loss = 0.0010263872100040317
Validation loss = 0.0009324977290816605
Validation loss = 0.0011019546072930098
Validation loss = 0.0011585239553824067
Validation loss = 0.001242465921677649
Validation loss = 0.001197993871755898
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001497946446761489
Validation loss = 0.001636246801353991
Validation loss = 0.0015355226350948215
Validation loss = 0.0011786610120907426
Validation loss = 0.0009049009531736374
Validation loss = 0.0009803965222090483
Validation loss = 0.0009982710471376777
Validation loss = 0.0016813304973766208
Validation loss = 0.0009570016409270465
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -8.39     |
| Iteration     | 67        |
| MaximumReturn | -0.000655 |
| MinimumReturn | -84.1     |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010487019317224622
Validation loss = 0.0012465805048123002
Validation loss = 0.0012464297469705343
Validation loss = 0.0011628527427092195
Validation loss = 0.0012266782578080893
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009330130997113883
Validation loss = 0.0010678740218281746
Validation loss = 0.001146046444773674
Validation loss = 0.0011140931164845824
Validation loss = 0.0011168086202815175
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008953437209129333
Validation loss = 0.0014618602581322193
Validation loss = 0.0009464919567108154
Validation loss = 0.0010415861615911126
Validation loss = 0.0015508499927818775
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001069123623892665
Validation loss = 0.0010703264269977808
Validation loss = 0.0010880603222176433
Validation loss = 0.001300145871937275
Validation loss = 0.0009805875597521663
Validation loss = 0.00120365503244102
Validation loss = 0.0010167276486754417
Validation loss = 0.0012840770650655031
Validation loss = 0.0010143938707187772
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011028499575331807
Validation loss = 0.0011919463286176324
Validation loss = 0.0009385573212057352
Validation loss = 0.0010165109997615218
Validation loss = 0.0014036749489605427
Validation loss = 0.0010240632109344006
Validation loss = 0.0014345200033858418
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -51.6    |
| Iteration     | 68       |
| MaximumReturn | -0.0282  |
| MinimumReturn | -128     |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001402045483700931
Validation loss = 0.001032393192872405
Validation loss = 0.0010556599590927362
Validation loss = 0.0010100106010213494
Validation loss = 0.0018183680949732661
Validation loss = 0.0010211897315457463
Validation loss = 0.001027590362355113
Validation loss = 0.0010252258507534862
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001167149399407208
Validation loss = 0.0009703792165964842
Validation loss = 0.0009632980218157172
Validation loss = 0.000987833016552031
Validation loss = 0.0010385746136307716
Validation loss = 0.000990843283943832
Validation loss = 0.0010227749589830637
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009187614778056741
Validation loss = 0.0010430836118757725
Validation loss = 0.0010355724953114986
Validation loss = 0.0011566882021725178
Validation loss = 0.0009425427997484803
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009329462191089988
Validation loss = 0.0016851865220814943
Validation loss = 0.0009400998242199421
Validation loss = 0.0009187741088680923
Validation loss = 0.000973886635620147
Validation loss = 0.0009689782164059579
Validation loss = 0.0012747070286422968
Validation loss = 0.0015957826981320977
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010430817492306232
Validation loss = 0.0010794374393299222
Validation loss = 0.0013241763226687908
Validation loss = 0.0021004609297960997
Validation loss = 0.0009195419261232018
Validation loss = 0.0010389735689386725
Validation loss = 0.001127591822296381
Validation loss = 0.0009312154143117368
Validation loss = 0.001107300166040659
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.14     |
| Iteration     | 69        |
| MaximumReturn | -0.000729 |
| MinimumReturn | -26.7     |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010376358404755592
Validation loss = 0.0009283957188017666
Validation loss = 0.0015430839266628027
Validation loss = 0.000985829858109355
Validation loss = 0.0011370958527550101
Validation loss = 0.0012101551983505487
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011120483977720141
Validation loss = 0.0009376996895298362
Validation loss = 0.0009952271357178688
Validation loss = 0.0009557190351188183
Validation loss = 0.0013611448230221868
Validation loss = 0.001098469365388155
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012117009609937668
Validation loss = 0.00092119793407619
Validation loss = 0.0010929908603429794
Validation loss = 0.0010662899585440755
Validation loss = 0.0009962664917111397
Validation loss = 0.0022389530204236507
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010857880115509033
Validation loss = 0.0009235340403392911
Validation loss = 0.001234649564139545
Validation loss = 0.0011031469330191612
Validation loss = 0.000913692289032042
Validation loss = 0.0012318799272179604
Validation loss = 0.0009821002604439855
Validation loss = 0.0010626279981806874
Validation loss = 0.0009309935849159956
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010087847476825118
Validation loss = 0.0018041195580735803
Validation loss = 0.0009550697868689895
Validation loss = 0.0009874104289337993
Validation loss = 0.0008966397144831717
Validation loss = 0.0009527304791845381
Validation loss = 0.0009541480103507638
Validation loss = 0.0011657292488962412
Validation loss = 0.001526589272543788
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -79.1    |
| Iteration     | 70       |
| MaximumReturn | -0.0384  |
| MinimumReturn | -122     |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009998725727200508
Validation loss = 0.001256548217497766
Validation loss = 0.0009422891307622194
Validation loss = 0.00196232576854527
Validation loss = 0.0010133028263226151
Validation loss = 0.000884760229382664
Validation loss = 0.0010512786684557796
Validation loss = 0.0009512577671557665
Validation loss = 0.001839894917793572
Validation loss = 0.001891723950393498
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001026952755637467
Validation loss = 0.0010949330171570182
Validation loss = 0.0013359065633267164
Validation loss = 0.0009147109230980277
Validation loss = 0.0008999143028631806
Validation loss = 0.0009689629659987986
Validation loss = 0.0009585396619513631
Validation loss = 0.001211621449328959
Validation loss = 0.0008249676320701838
Validation loss = 0.0009337360388599336
Validation loss = 0.0011141409631818533
Validation loss = 0.0008848929428495467
Validation loss = 0.0009728875011205673
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00104890379589051
Validation loss = 0.0009143437491729856
Validation loss = 0.0009304739069193602
Validation loss = 0.001007779035717249
Validation loss = 0.0009014234528876841
Validation loss = 0.0011310707777738571
Validation loss = 0.0010711385402828455
Validation loss = 0.001134194782935083
Validation loss = 0.0010167572181671858
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009709148434922099
Validation loss = 0.0009605311206541955
Validation loss = 0.0014612338272854686
Validation loss = 0.001286193379200995
Validation loss = 0.001183870597742498
Validation loss = 0.000888478010892868
Validation loss = 0.0011122036958113313
Validation loss = 0.0010507175466045737
Validation loss = 0.0008551938226446509
Validation loss = 0.0009636179893277586
Validation loss = 0.0010481826029717922
Validation loss = 0.0009621836361475289
Validation loss = 0.001627804827876389
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009866382461041212
Validation loss = 0.0008831584127619863
Validation loss = 0.0009331030305474997
Validation loss = 0.0010571087477728724
Validation loss = 0.0009564465726725757
Validation loss = 0.0012908497592434287
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -7.67     |
| Iteration     | 71        |
| MaximumReturn | -0.000623 |
| MinimumReturn | -74.4     |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009640963398851454
Validation loss = 0.0010531236184760928
Validation loss = 0.0009599722689017653
Validation loss = 0.0016121024964377284
Validation loss = 0.0009819668484851718
Validation loss = 0.0010545949917286634
Validation loss = 0.0013082983205094934
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016842841869220138
Validation loss = 0.001012160675600171
Validation loss = 0.0010426051449030638
Validation loss = 0.0017101396806538105
Validation loss = 0.0009632337023504078
Validation loss = 0.001040919916704297
Validation loss = 0.0010731634683907032
Validation loss = 0.000988041516393423
Validation loss = 0.0012201593490317464
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010482755023986101
Validation loss = 0.0017199896974489093
Validation loss = 0.0010038209147751331
Validation loss = 0.0009044025791808963
Validation loss = 0.000950565212406218
Validation loss = 0.0009954534471035004
Validation loss = 0.0014535720692947507
Validation loss = 0.0008262491901405156
Validation loss = 0.0008892444311641157
Validation loss = 0.0008041817927733064
Validation loss = 0.0011056263465434313
Validation loss = 0.0009000123245641589
Validation loss = 0.0009024885366670787
Validation loss = 0.0010099064093083143
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012051896192133427
Validation loss = 0.0012179628247395158
Validation loss = 0.0009959394810721278
Validation loss = 0.0012223892845213413
Validation loss = 0.0009728431468829513
Validation loss = 0.001499251346103847
Validation loss = 0.001045666984282434
Validation loss = 0.0009058099240064621
Validation loss = 0.001073191175237298
Validation loss = 0.0008868798613548279
Validation loss = 0.0010659325635060668
Validation loss = 0.001120858360081911
Validation loss = 0.0010730194626376033
Validation loss = 0.0013159014051780105
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009733688784763217
Validation loss = 0.001129387179389596
Validation loss = 0.0014254349516704679
Validation loss = 0.0009272333700209856
Validation loss = 0.0008960634586401284
Validation loss = 0.000936707656364888
Validation loss = 0.0008812191663309932
Validation loss = 0.001048585632815957
Validation loss = 0.0010636369697749615
Validation loss = 0.0016780577134341002
Validation loss = 0.0011656747665256262
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -43.5    |
| Iteration     | 72       |
| MaximumReturn | -0.00183 |
| MinimumReturn | -113     |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010322772432118654
Validation loss = 0.0010832346742972732
Validation loss = 0.0012409244664013386
Validation loss = 0.00100715400185436
Validation loss = 0.0012046699412167072
Validation loss = 0.0009877971606329083
Validation loss = 0.000961842539254576
Validation loss = 0.0009904083563014865
Validation loss = 0.0011454642517492175
Validation loss = 0.000937833683565259
Validation loss = 0.0011397574562579393
Validation loss = 0.0009753218619152904
Validation loss = 0.0010184635175392032
Validation loss = 0.0010488730622455478
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010922928340733051
Validation loss = 0.0011373471934348345
Validation loss = 0.0010157681535929441
Validation loss = 0.0011145210592076182
Validation loss = 0.0011488895397633314
Validation loss = 0.0009973926935344934
Validation loss = 0.0009283858234994113
Validation loss = 0.0009307868895120919
Validation loss = 0.0009447943884879351
Validation loss = 0.0009894471149891615
Validation loss = 0.0009102860931307077
Validation loss = 0.00126170355360955
Validation loss = 0.001733353710733354
Validation loss = 0.0010438341414555907
Validation loss = 0.0008738377364352345
Validation loss = 0.001688468037173152
Validation loss = 0.001057369401678443
Validation loss = 0.0008999044657684863
Validation loss = 0.001388206728734076
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009146393858827651
Validation loss = 0.0012559503084048629
Validation loss = 0.0009788165334612131
Validation loss = 0.0009050959488376975
Validation loss = 0.0009491356322541833
Validation loss = 0.0009431225480511785
Validation loss = 0.0011084360303357244
Validation loss = 0.0011397405760362744
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009633781737647951
Validation loss = 0.0008943742141127586
Validation loss = 0.0011658959556370974
Validation loss = 0.0010689494665712118
Validation loss = 0.0009408352780155838
Validation loss = 0.0009591160342097282
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010252137435600162
Validation loss = 0.0016762230079621077
Validation loss = 0.0010686061577871442
Validation loss = 0.0011144031304866076
Validation loss = 0.0008717569871805608
Validation loss = 0.0010368808871135116
Validation loss = 0.001135117607191205
Validation loss = 0.0012764099519699812
Validation loss = 0.0010356995044276118
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -55.6    |
| Iteration     | 73       |
| MaximumReturn | -0.0877  |
| MinimumReturn | -121     |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010106541449204087
Validation loss = 0.0010403442429378629
Validation loss = 0.0011292359558865428
Validation loss = 0.0009675508481450379
Validation loss = 0.0008935487130656838
Validation loss = 0.0012891613878309727
Validation loss = 0.0014055880019441247
Validation loss = 0.0010151368333026767
Validation loss = 0.0008934636134654284
Validation loss = 0.0010320795699954033
Validation loss = 0.0011900566751137376
Validation loss = 0.0008666471694596112
Validation loss = 0.0009539300226606429
Validation loss = 0.0010479262564331293
Validation loss = 0.0011536816600710154
Validation loss = 0.0009874773677438498
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009240764775313437
Validation loss = 0.0009439654531888664
Validation loss = 0.0009783182758837938
Validation loss = 0.0008960848790593445
Validation loss = 0.001769677852280438
Validation loss = 0.0011058206437155604
Validation loss = 0.0009451584191992879
Validation loss = 0.0009935451671481133
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008638662984594703
Validation loss = 0.0009003679733723402
Validation loss = 0.0008917140075936913
Validation loss = 0.0008444316335953772
Validation loss = 0.000869261275511235
Validation loss = 0.0009073993423953652
Validation loss = 0.0010332799283787608
Validation loss = 0.0008063907152973115
Validation loss = 0.0012041832087561488
Validation loss = 0.0012820480624213815
Validation loss = 0.0009169664699584246
Validation loss = 0.0010420859325677156
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009060446172952652
Validation loss = 0.001008526305668056
Validation loss = 0.0011165851028636098
Validation loss = 0.00111559743527323
Validation loss = 0.0008734908187761903
Validation loss = 0.001006293692626059
Validation loss = 0.0009529953240416944
Validation loss = 0.0010550362057983875
Validation loss = 0.0008842695388011634
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009813191136345267
Validation loss = 0.0010166858555749059
Validation loss = 0.0009595446172170341
Validation loss = 0.0012886059703305364
Validation loss = 0.0010485794628039002
Validation loss = 0.0008808876154944301
Validation loss = 0.001082726870663464
Validation loss = 0.0012824652949348092
Validation loss = 0.0009987815283238888
Validation loss = 0.0012693095486611128
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -23.1    |
| Iteration     | 74       |
| MaximumReturn | -0.00148 |
| MinimumReturn | -76.2    |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011381995864212513
Validation loss = 0.001004585763439536
Validation loss = 0.0010143669787794352
Validation loss = 0.00105959118809551
Validation loss = 0.0010285584721714258
Validation loss = 0.0011338340118527412
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008663060725666583
Validation loss = 0.0010785599006339908
Validation loss = 0.0008478023228235543
Validation loss = 0.0011232333490625024
Validation loss = 0.0011009559966623783
Validation loss = 0.0009399278787896037
Validation loss = 0.0008886748109944165
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00120242265984416
Validation loss = 0.001150510972365737
Validation loss = 0.0015891959192231297
Validation loss = 0.0008263057679869235
Validation loss = 0.0010390562238171697
Validation loss = 0.0025765530299395323
Validation loss = 0.0009122010669670999
Validation loss = 0.0012245404068380594
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009132832637988031
Validation loss = 0.0009729075827635825
Validation loss = 0.001094142789952457
Validation loss = 0.0010016466258093715
Validation loss = 0.0009132775012403727
Validation loss = 0.0009559670579619706
Validation loss = 0.0009856770047917962
Validation loss = 0.001023593358695507
Validation loss = 0.0009736778447404504
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011560773709788918
Validation loss = 0.0008439907105639577
Validation loss = 0.0009623573860153556
Validation loss = 0.0010841067414730787
Validation loss = 0.0025847782380878925
Validation loss = 0.0009659647475928068
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -33.2    |
| Iteration     | 75       |
| MaximumReturn | -0.125   |
| MinimumReturn | -59.6    |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001005522906780243
Validation loss = 0.0010880704503506422
Validation loss = 0.00082611502148211
Validation loss = 0.0009426312753930688
Validation loss = 0.0011952556669712067
Validation loss = 0.0009632652509026229
Validation loss = 0.0008776240283623338
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001022480195388198
Validation loss = 0.0008972288924269378
Validation loss = 0.0010433138813823462
Validation loss = 0.0010457304306328297
Validation loss = 0.0008781742653809488
Validation loss = 0.001215085620060563
Validation loss = 0.001405150629580021
Validation loss = 0.0009390603518113494
Validation loss = 0.0009700979571789503
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000905896769836545
Validation loss = 0.001108255935832858
Validation loss = 0.0009892440866678953
Validation loss = 0.0008112319046631455
Validation loss = 0.0008977729012258351
Validation loss = 0.0017530499026179314
Validation loss = 0.0009643605444580317
Validation loss = 0.0009059474687092006
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013720651622861624
Validation loss = 0.0010256273671984673
Validation loss = 0.00106753408908844
Validation loss = 0.0009037948912009597
Validation loss = 0.0010586478747427464
Validation loss = 0.0009634009329602122
Validation loss = 0.0011548309121280909
Validation loss = 0.001132703386247158
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010469583794474602
Validation loss = 0.003978914115577936
Validation loss = 0.0009269921574741602
Validation loss = 0.000965180282946676
Validation loss = 0.000888946233317256
Validation loss = 0.0010346476919949055
Validation loss = 0.0010834410786628723
Validation loss = 0.0010300802532583475
Validation loss = 0.0009044712642207742
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.73     |
| Iteration     | 76        |
| MaximumReturn | -0.000744 |
| MinimumReturn | -67.9     |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009580313926562667
Validation loss = 0.000983944977633655
Validation loss = 0.0010158614022657275
Validation loss = 0.0010462191421538591
Validation loss = 0.0013880119659006596
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008913842029869556
Validation loss = 0.002931380644440651
Validation loss = 0.001077849417924881
Validation loss = 0.0009013770031742752
Validation loss = 0.000918969395570457
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010188546730205417
Validation loss = 0.000970812572631985
Validation loss = 0.0008370153373107314
Validation loss = 0.0008328313706442714
Validation loss = 0.0009536073193885386
Validation loss = 0.0017062058905139565
Validation loss = 0.0009053379180841148
Validation loss = 0.0009168696124106646
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011101171839982271
Validation loss = 0.0009936110582202673
Validation loss = 0.0008679915335960686
Validation loss = 0.0009018292766995728
Validation loss = 0.0009916479466482997
Validation loss = 0.0009171435376629233
Validation loss = 0.0009307362488470972
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009433134109713137
Validation loss = 0.001027580350637436
Validation loss = 0.0009631280554458499
Validation loss = 0.001383321825414896
Validation loss = 0.002344906097277999
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.62     |
| Iteration     | 77        |
| MaximumReturn | -0.000913 |
| MinimumReturn | -64.3     |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009069701191037893
Validation loss = 0.0009895836701616645
Validation loss = 0.001300776144489646
Validation loss = 0.001239681150764227
Validation loss = 0.0009490717202425003
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010250960476696491
Validation loss = 0.0011913327034562826
Validation loss = 0.0014881315873935819
Validation loss = 0.0019151102751493454
Validation loss = 0.0009554475545883179
Validation loss = 0.0009261583909392357
Validation loss = 0.001141208573244512
Validation loss = 0.001275526941753924
Validation loss = 0.0009207349503412843
Validation loss = 0.0009159873588941991
Validation loss = 0.0009983041090890765
Validation loss = 0.0008941479027271271
Validation loss = 0.0008814100292511284
Validation loss = 0.001891127903945744
Validation loss = 0.0011015924392268062
Validation loss = 0.0011563999578356743
Validation loss = 0.0013547168346121907
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001382368616759777
Validation loss = 0.0009884247556328773
Validation loss = 0.0008105965098366141
Validation loss = 0.0009079051669687033
Validation loss = 0.001029425417073071
Validation loss = 0.0009333529742434621
Validation loss = 0.0010843233903869987
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000978033640421927
Validation loss = 0.0009557307348586619
Validation loss = 0.0010958859929814935
Validation loss = 0.001038380665704608
Validation loss = 0.0010845636716112494
Validation loss = 0.0008845747797749937
Validation loss = 0.0011817446211352944
Validation loss = 0.000986449304036796
Validation loss = 0.0009109452948905528
Validation loss = 0.0010649485047906637
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011736927554011345
Validation loss = 0.0008734881412237883
Validation loss = 0.0010072010336443782
Validation loss = 0.0008984433952718973
Validation loss = 0.0011211951496079564
Validation loss = 0.0010582495015114546
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.33    |
| Iteration     | 78       |
| MaximumReturn | -0.00068 |
| MinimumReturn | -59.4    |
| TotalSamples  | 133280   |
----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009049708023667336
Validation loss = 0.0008287439122796059
Validation loss = 0.0011603485327214003
Validation loss = 0.0013333415845409036
Validation loss = 0.0010196014773100615
Validation loss = 0.0013010302791371942
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012014333624392748
Validation loss = 0.0009072845568880439
Validation loss = 0.0010199198732152581
Validation loss = 0.0009738777880556881
Validation loss = 0.001112571801058948
Validation loss = 0.0009591087000444531
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010640437249094248
Validation loss = 0.001035858760587871
Validation loss = 0.0007666142773814499
Validation loss = 0.0009493493707850575
Validation loss = 0.000837870582472533
Validation loss = 0.0014110682532191277
Validation loss = 0.0008823323296383023
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010393799748271704
Validation loss = 0.0010769448708742857
Validation loss = 0.000962638936471194
Validation loss = 0.0011597826378419995
Validation loss = 0.0009923380566760898
Validation loss = 0.0010332081001251936
Validation loss = 0.0010750828078016639
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009708117577247322
Validation loss = 0.0010405470384284854
Validation loss = 0.0009194111917167902
Validation loss = 0.001156940357759595
Validation loss = 0.0008114541415125132
Validation loss = 0.001213607145473361
Validation loss = 0.000976724550127983
Validation loss = 0.0010671369964256883
Validation loss = 0.0009877517586573958
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.12     |
| Iteration     | 79        |
| MaximumReturn | -0.000583 |
| MinimumReturn | -87.8     |
| TotalSamples  | 134946    |
-----------------------------
