Logging to experiments/invertedPendulum/test-exp-dir/test-exp_seed3214
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.545070230960846
Validation loss = 0.27752962708473206
Validation loss = 0.24451015889644623
Validation loss = 0.2268267273902893
Validation loss = 0.19908654689788818
Validation loss = 0.18794678151607513
Validation loss = 0.18944254517555237
Validation loss = 0.16624465584754944
Validation loss = 0.1681610643863678
Validation loss = 0.15400917828083038
Validation loss = 0.14137588441371918
Validation loss = 0.13866782188415527
Validation loss = 0.1328066885471344
Validation loss = 0.1296851485967636
Validation loss = 0.1377912312746048
Validation loss = 0.1201125830411911
Validation loss = 0.1271725744009018
Validation loss = 0.11931151896715164
Validation loss = 0.11276812851428986
Validation loss = 0.11492164433002472
Validation loss = 0.11306541413068771
Validation loss = 0.09467332065105438
Validation loss = 0.09168916940689087
Validation loss = 0.08103393763303757
Validation loss = 0.12405259162187576
Validation loss = 0.09582231938838959
Validation loss = 0.08464784175157547
Validation loss = 0.08906325697898865
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5582205653190613
Validation loss = 0.28682270646095276
Validation loss = 0.25000670552253723
Validation loss = 0.21816091239452362
Validation loss = 0.19960911571979523
Validation loss = 0.19207148253917694
Validation loss = 0.18457037210464478
Validation loss = 0.17431461811065674
Validation loss = 0.16391371190547943
Validation loss = 0.15816883742809296
Validation loss = 0.1631789207458496
Validation loss = 0.1465204656124115
Validation loss = 0.14886021614074707
Validation loss = 0.16240204870700836
Validation loss = 0.14134271442890167
Validation loss = 0.134528249502182
Validation loss = 0.133148655295372
Validation loss = 0.12514738738536835
Validation loss = 0.12864011526107788
Validation loss = 0.12740154564380646
Validation loss = 0.10875485092401505
Validation loss = 0.11051841825246811
Validation loss = 0.09860531240701675
Validation loss = 0.10178148746490479
Validation loss = 0.09789637476205826
Validation loss = 0.08597398549318314
Validation loss = 0.08521881699562073
Validation loss = 0.0804930031299591
Validation loss = 0.08111543208360672
Validation loss = 0.07813940942287445
Validation loss = 0.07724518328905106
Validation loss = 0.07107582688331604
Validation loss = 0.07976911962032318
Validation loss = 0.07454387843608856
Validation loss = 0.07293693721294403
Validation loss = 0.06773296743631363
Validation loss = 0.07731034606695175
Validation loss = 0.0697111263871193
Validation loss = 0.07551442831754684
Validation loss = 0.07692503184080124
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5405267477035522
Validation loss = 0.2774257957935333
Validation loss = 0.24487455189228058
Validation loss = 0.2218768447637558
Validation loss = 0.1957833468914032
Validation loss = 0.18736211955547333
Validation loss = 0.19242478907108307
Validation loss = 0.16471420228481293
Validation loss = 0.16106343269348145
Validation loss = 0.1563625931739807
Validation loss = 0.145537331700325
Validation loss = 0.14283998310565948
Validation loss = 0.1497306227684021
Validation loss = 0.13194681704044342
Validation loss = 0.14167185127735138
Validation loss = 0.12493214011192322
Validation loss = 0.1265934705734253
Validation loss = 0.11455895751714706
Validation loss = 0.11082489043474197
Validation loss = 0.10441508889198303
Validation loss = 0.1271132379770279
Validation loss = 0.0936124250292778
Validation loss = 0.09956834465265274
Validation loss = 0.09007396548986435
Validation loss = 0.08539605140686035
Validation loss = 0.09392088651657104
Validation loss = 0.0737217366695404
Validation loss = 0.07536369562149048
Validation loss = 0.08937366306781769
Validation loss = 0.07994705438613892
Validation loss = 0.07592250406742096
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5617355108261108
Validation loss = 0.29093775153160095
Validation loss = 0.26644960045814514
Validation loss = 0.21917004883289337
Validation loss = 0.21746890246868134
Validation loss = 0.20117956399917603
Validation loss = 0.18511457741260529
Validation loss = 0.178323432803154
Validation loss = 0.17895467579364777
Validation loss = 0.16435955464839935
Validation loss = 0.1560099571943283
Validation loss = 0.1496679186820984
Validation loss = 0.1445688009262085
Validation loss = 0.13873730599880219
Validation loss = 0.13447362184524536
Validation loss = 0.1388079822063446
Validation loss = 0.14826600253582
Validation loss = 0.13122983276844025
Validation loss = 0.13326439261436462
Validation loss = 0.12986326217651367
Validation loss = 0.10952707380056381
Validation loss = 0.10755471885204315
Validation loss = 0.11095405369997025
Validation loss = 0.10308704525232315
Validation loss = 0.10133562982082367
Validation loss = 0.09300994127988815
Validation loss = 0.09351855516433716
Validation loss = 0.08841729909181595
Validation loss = 0.08187702298164368
Validation loss = 0.08466482162475586
Validation loss = 0.0801548883318901
Validation loss = 0.07450448721647263
Validation loss = 0.07914987951517105
Validation loss = 0.08031824231147766
Validation loss = 0.07644008100032806
Validation loss = 0.07064367085695267
Validation loss = 0.07102152705192566
Validation loss = 0.07052431255578995
Validation loss = 0.07415177673101425
Validation loss = 0.06634961068630219
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5721513628959656
Validation loss = 0.2656064033508301
Validation loss = 0.25076809525489807
Validation loss = 0.22418564558029175
Validation loss = 0.20156745612621307
Validation loss = 0.1858818531036377
Validation loss = 0.18422813713550568
Validation loss = 0.1665477305650711
Validation loss = 0.16726164519786835
Validation loss = 0.1615244299173355
Validation loss = 0.15491807460784912
Validation loss = 0.15061616897583008
Validation loss = 0.14536398649215698
Validation loss = 0.13864892721176147
Validation loss = 0.13883386552333832
Validation loss = 0.12423636764287949
Validation loss = 0.12913499772548676
Validation loss = 0.1281721591949463
Validation loss = 0.11347872763872147
Validation loss = 0.11031470447778702
Validation loss = 0.10774291306734085
Validation loss = 0.10848553478717804
Validation loss = 0.09882146120071411
Validation loss = 0.10265178978443146
Validation loss = 0.10322762280702591
Validation loss = 0.0959831103682518
Validation loss = 0.08638159930706024
Validation loss = 0.08271872252225876
Validation loss = 0.09333246946334839
Validation loss = 0.07215902954339981
Validation loss = 0.07123943418264389
Validation loss = 0.07654917240142822
Validation loss = 0.07952382415533066
Validation loss = 0.07785946130752563
Validation loss = 0.07132234424352646
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.02    |
| Iteration     | 0        |
| MaximumReturn | -0.0108  |
| MinimumReturn | -0.0356  |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2576027512550354
Validation loss = 0.11391665041446686
Validation loss = 0.09505520761013031
Validation loss = 0.08169323951005936
Validation loss = 0.0686909556388855
Validation loss = 0.07830975949764252
Validation loss = 0.0706455186009407
Validation loss = 0.06740447133779526
Validation loss = 0.06055848300457001
Validation loss = 0.06071872636675835
Validation loss = 0.05624496191740036
Validation loss = 0.058719560503959656
Validation loss = 0.06420845538377762
Validation loss = 0.07060776650905609
Validation loss = 0.05586693808436394
Validation loss = 0.067892886698246
Validation loss = 0.061754826456308365
Validation loss = 0.056271325796842575
Validation loss = 0.055453117936849594
Validation loss = 0.06559298187494278
Validation loss = 0.05971096083521843
Validation loss = 0.0624106340110302
Validation loss = 0.054208818823099136
Validation loss = 0.053378086537122726
Validation loss = 0.05735275149345398
Validation loss = 0.05664392188191414
Validation loss = 0.054586201906204224
Validation loss = 0.049212243407964706
Validation loss = 0.050844401121139526
Validation loss = 0.06022116169333458
Validation loss = 0.05644861236214638
Validation loss = 0.05185870826244354
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16623225808143616
Validation loss = 0.11049230396747589
Validation loss = 0.0838581994175911
Validation loss = 0.07569438964128494
Validation loss = 0.07035722583532333
Validation loss = 0.06563448905944824
Validation loss = 0.06299374997615814
Validation loss = 0.061248522251844406
Validation loss = 0.06788447499275208
Validation loss = 0.05697637051343918
Validation loss = 0.060397181659936905
Validation loss = 0.059662844985723495
Validation loss = 0.06512629240751266
Validation loss = 0.05537116900086403
Validation loss = 0.05618508532643318
Validation loss = 0.05389587581157684
Validation loss = 0.06906288117170334
Validation loss = 0.06000421941280365
Validation loss = 0.05334128066897392
Validation loss = 0.05205550789833069
Validation loss = 0.05047708749771118
Validation loss = 0.05025380849838257
Validation loss = 0.05019811540842056
Validation loss = 0.05209625884890556
Validation loss = 0.056677158921957016
Validation loss = 0.052347537130117416
Validation loss = 0.04981476068496704
Validation loss = 0.04828830808401108
Validation loss = 0.04592746123671532
Validation loss = 0.049363136291503906
Validation loss = 0.049410153180360794
Validation loss = 0.052621468901634216
Validation loss = 0.05444316193461418
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2207978069782257
Validation loss = 0.13170874118804932
Validation loss = 0.10052715986967087
Validation loss = 0.0896320641040802
Validation loss = 0.08573810756206512
Validation loss = 0.07397720217704773
Validation loss = 0.07293956726789474
Validation loss = 0.06393478065729141
Validation loss = 0.06101119518280029
Validation loss = 0.06250395625829697
Validation loss = 0.056840457022190094
Validation loss = 0.0652441754937172
Validation loss = 0.0683729350566864
Validation loss = 0.06049429252743721
Validation loss = 0.06148168444633484
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.21592269837856293
Validation loss = 0.1199093833565712
Validation loss = 0.0916503518819809
Validation loss = 0.07729499042034149
Validation loss = 0.07407350838184357
Validation loss = 0.08500061929225922
Validation loss = 0.06482014805078506
Validation loss = 0.06591533869504929
Validation loss = 0.0636761263012886
Validation loss = 0.06253578513860703
Validation loss = 0.05742194131016731
Validation loss = 0.05703544244170189
Validation loss = 0.05480862781405449
Validation loss = 0.05450962856411934
Validation loss = 0.055250465869903564
Validation loss = 0.05826260522007942
Validation loss = 0.05747956782579422
Validation loss = 0.07735764235258102
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2330295741558075
Validation loss = 0.12026602774858475
Validation loss = 0.09432350099086761
Validation loss = 0.08776771277189255
Validation loss = 0.07431963831186295
Validation loss = 0.06848736852407455
Validation loss = 0.07009317725896835
Validation loss = 0.06401377171278
Validation loss = 0.06789801269769669
Validation loss = 0.06741028279066086
Validation loss = 0.05728829279541969
Validation loss = 0.05232470855116844
Validation loss = 0.05396229773759842
Validation loss = 0.054832249879837036
Validation loss = 0.05252118036150932
Validation loss = 0.05192095786333084
Validation loss = 0.05688715726137161
Validation loss = 0.05148068070411682
Validation loss = 0.051271967589855194
Validation loss = 0.052021678537130356
Validation loss = 0.05976836383342743
Validation loss = 0.056645702570676804
Validation loss = 0.057380035519599915
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00161 |
| Iteration     | 1        |
| MaximumReturn | -0.00124 |
| MinimumReturn | -0.00217 |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0657949149608612
Validation loss = 0.05312511324882507
Validation loss = 0.05207546055316925
Validation loss = 0.04191525653004646
Validation loss = 0.05051608011126518
Validation loss = 0.03954128548502922
Validation loss = 0.0398034043610096
Validation loss = 0.03974704444408417
Validation loss = 0.03587321192026138
Validation loss = 0.03712137043476105
Validation loss = 0.03888067230582237
Validation loss = 0.040242016315460205
Validation loss = 0.03662654384970665
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0897451788187027
Validation loss = 0.08072268962860107
Validation loss = 0.06286831200122833
Validation loss = 0.053509995341300964
Validation loss = 0.04729141294956207
Validation loss = 0.04623851180076599
Validation loss = 0.04258858039975166
Validation loss = 0.03767862170934677
Validation loss = 0.03710326924920082
Validation loss = 0.04319027066230774
Validation loss = 0.039177652448415756
Validation loss = 0.04104067385196686
Validation loss = 0.04349183291196823
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08073624968528748
Validation loss = 0.061764538288116455
Validation loss = 0.052819229662418365
Validation loss = 0.05250342935323715
Validation loss = 0.05461704730987549
Validation loss = 0.05388510227203369
Validation loss = 0.05421359837055206
Validation loss = 0.050951987504959106
Validation loss = 0.05729367583990097
Validation loss = 0.04924102872610092
Validation loss = 0.045947328209877014
Validation loss = 0.04768773540854454
Validation loss = 0.04942945018410683
Validation loss = 0.044055648148059845
Validation loss = 0.04370739310979843
Validation loss = 0.049367062747478485
Validation loss = 0.0426049567759037
Validation loss = 0.0427742637693882
Validation loss = 0.04086928069591522
Validation loss = 0.04380226135253906
Validation loss = 0.04240363836288452
Validation loss = 0.03992656245827675
Validation loss = 0.04370685666799545
Validation loss = 0.04238373041152954
Validation loss = 0.043621666729450226
Validation loss = 0.039863765239715576
Validation loss = 0.0420844629406929
Validation loss = 0.039555422961711884
Validation loss = 0.03544211387634277
Validation loss = 0.04002063721418381
Validation loss = 0.04301130026578903
Validation loss = 0.03864913061261177
Validation loss = 0.03521719574928284
Validation loss = 0.0366223081946373
Validation loss = 0.0391402393579483
Validation loss = 0.0433766171336174
Validation loss = 0.03607410565018654
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06989022344350815
Validation loss = 0.05366075038909912
Validation loss = 0.0494081936776638
Validation loss = 0.05116124078631401
Validation loss = 0.049939271062612534
Validation loss = 0.04431924596428871
Validation loss = 0.04457017779350281
Validation loss = 0.047333087772130966
Validation loss = 0.04578322917222977
Validation loss = 0.04494456574320793
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1001807302236557
Validation loss = 0.052477702498435974
Validation loss = 0.047456562519073486
Validation loss = 0.04480694979429245
Validation loss = 0.04208342730998993
Validation loss = 0.04803641140460968
Validation loss = 0.040699560195207596
Validation loss = 0.040071841329336166
Validation loss = 0.040787432342767715
Validation loss = 0.04146706685423851
Validation loss = 0.038673125207424164
Validation loss = 0.04024418815970421
Validation loss = 0.03943142667412758
Validation loss = 0.0395943783223629
Validation loss = 0.04526829719543457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000799 |
| Iteration     | 2         |
| MaximumReturn | -0.000695 |
| MinimumReturn | -0.000969 |
| TotalSamples  | 6664      |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.043635983020067215
Validation loss = 0.0294425617903471
Validation loss = 0.027332693338394165
Validation loss = 0.027125224471092224
Validation loss = 0.03245389088988304
Validation loss = 0.030956802889704704
Validation loss = 0.028664978221058846
Validation loss = 0.025652917101979256
Validation loss = 0.024650083854794502
Validation loss = 0.0241513904184103
Validation loss = 0.025366144254803658
Validation loss = 0.024281740188598633
Validation loss = 0.024293385446071625
Validation loss = 0.027911929413676262
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03635542094707489
Validation loss = 0.03617769107222557
Validation loss = 0.0308900848031044
Validation loss = 0.030581215396523476
Validation loss = 0.02908133901655674
Validation loss = 0.029246049001812935
Validation loss = 0.0291497353464365
Validation loss = 0.026092492043972015
Validation loss = 0.02682655304670334
Validation loss = 0.028063317760825157
Validation loss = 0.028049806132912636
Validation loss = 0.02501109056174755
Validation loss = 0.02846486121416092
Validation loss = 0.02753177285194397
Validation loss = 0.031082259491086006
Validation loss = 0.03301684185862541
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04106855019927025
Validation loss = 0.032953012734651566
Validation loss = 0.02997841127216816
Validation loss = 0.029664935544133186
Validation loss = 0.026993609964847565
Validation loss = 0.030780352652072906
Validation loss = 0.030304329469799995
Validation loss = 0.030040016397833824
Validation loss = 0.026885690167546272
Validation loss = 0.0261224377900362
Validation loss = 0.02787422575056553
Validation loss = 0.027528339996933937
Validation loss = 0.026066437363624573
Validation loss = 0.03396910801529884
Validation loss = 0.0263201966881752
Validation loss = 0.0307405274361372
Validation loss = 0.022661499679088593
Validation loss = 0.026814231649041176
Validation loss = 0.027198443189263344
Validation loss = 0.022884519770741463
Validation loss = 0.025225333869457245
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04379642382264137
Validation loss = 0.039396826177835464
Validation loss = 0.03671291843056679
Validation loss = 0.03493143990635872
Validation loss = 0.03232401981949806
Validation loss = 0.03182477131485939
Validation loss = 0.035186100751161575
Validation loss = 0.04475131258368492
Validation loss = 0.04006095975637436
Validation loss = 0.03486037999391556
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.050999715924263
Validation loss = 0.03719149902462959
Validation loss = 0.02754257619380951
Validation loss = 0.02876829169690609
Validation loss = 0.02900686115026474
Validation loss = 0.03124636597931385
Validation loss = 0.02597125433385372
Validation loss = 0.03159520775079727
Validation loss = 0.029012715443968773
Validation loss = 0.02613157592713833
Validation loss = 0.02823072113096714
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0011   |
| Iteration     | 3         |
| MaximumReturn | -0.000508 |
| MinimumReturn | -0.00301  |
| TotalSamples  | 8330      |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03164718300104141
Validation loss = 0.014444919303059578
Validation loss = 0.011371176689863205
Validation loss = 0.011012252420186996
Validation loss = 0.011258560232818127
Validation loss = 0.011273371055722237
Validation loss = 0.01119666825979948
Validation loss = 0.011105692014098167
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03661111742258072
Validation loss = 0.015032160095870495
Validation loss = 0.012778114527463913
Validation loss = 0.01254533976316452
Validation loss = 0.012337970547378063
Validation loss = 0.01212202850729227
Validation loss = 0.012836700305342674
Validation loss = 0.012119127437472343
Validation loss = 0.0137006975710392
Validation loss = 0.011915523558855057
Validation loss = 0.011261029168963432
Validation loss = 0.011928579770028591
Validation loss = 0.011362350545823574
Validation loss = 0.011274975724518299
Validation loss = 0.011195594444870949
Validation loss = 0.01126930397003889
Validation loss = 0.011035075411200523
Validation loss = 0.012501946650445461
Validation loss = 0.011239083483815193
Validation loss = 0.010955406352877617
Validation loss = 0.01040311623364687
Validation loss = 0.011528403498232365
Validation loss = 0.011129208840429783
Validation loss = 0.01190662570297718
Validation loss = 0.012253456749022007
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03227671608328819
Validation loss = 0.022032536566257477
Validation loss = 0.016785111278295517
Validation loss = 0.012578414753079414
Validation loss = 0.014163048006594181
Validation loss = 0.01254876609891653
Validation loss = 0.010639403946697712
Validation loss = 0.01057205069810152
Validation loss = 0.011233082041144371
Validation loss = 0.010447611100971699
Validation loss = 0.010457190684974194
Validation loss = 0.009997176937758923
Validation loss = 0.009630979970097542
Validation loss = 0.010280598886311054
Validation loss = 0.010459799319505692
Validation loss = 0.009375745430588722
Validation loss = 0.010096103884279728
Validation loss = 0.0102249039337039
Validation loss = 0.010074025951325893
Validation loss = 0.012006503529846668
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.040263161063194275
Validation loss = 0.015622743405401707
Validation loss = 0.014719784259796143
Validation loss = 0.015022515319287777
Validation loss = 0.014409959316253662
Validation loss = 0.015008337795734406
Validation loss = 0.014498426578938961
Validation loss = 0.014456619508564472
Validation loss = 0.013392209075391293
Validation loss = 0.013602472841739655
Validation loss = 0.015395870432257652
Validation loss = 0.015447950921952724
Validation loss = 0.013438391499221325
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.030716847628355026
Validation loss = 0.017779052257537842
Validation loss = 0.014062630943953991
Validation loss = 0.012955586425960064
Validation loss = 0.012374743819236755
Validation loss = 0.012363421730697155
Validation loss = 0.012670032680034637
Validation loss = 0.011503158137202263
Validation loss = 0.012495957314968109
Validation loss = 0.015078626573085785
Validation loss = 0.011807436123490334
Validation loss = 0.01278117299079895
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00169 |
| Iteration     | 4        |
| MaximumReturn | -0.00071 |
| MinimumReturn | -0.00859 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01761614717543125
Validation loss = 0.009371127001941204
Validation loss = 0.009829544462263584
Validation loss = 0.009032981470227242
Validation loss = 0.010666925460100174
Validation loss = 0.010774029418826103
Validation loss = 0.008978665806353092
Validation loss = 0.009240750223398209
Validation loss = 0.008797047659754753
Validation loss = 0.009238770231604576
Validation loss = 0.009071903303265572
Validation loss = 0.01178711373358965
Validation loss = 0.009375855326652527
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01573328673839569
Validation loss = 0.00920629408210516
Validation loss = 0.008924329653382301
Validation loss = 0.011168451979756355
Validation loss = 0.00975447241216898
Validation loss = 0.009474349208176136
Validation loss = 0.009000997059047222
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013798808678984642
Validation loss = 0.008879998698830605
Validation loss = 0.008072515949606895
Validation loss = 0.007678783033043146
Validation loss = 0.007264662533998489
Validation loss = 0.008318878710269928
Validation loss = 0.00757342716678977
Validation loss = 0.008620968088507652
Validation loss = 0.007553101517260075
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.016352837905287743
Validation loss = 0.012989571318030357
Validation loss = 0.01435776986181736
Validation loss = 0.012368937954306602
Validation loss = 0.010455605573952198
Validation loss = 0.01093942392617464
Validation loss = 0.010756426490843296
Validation loss = 0.010121227242052555
Validation loss = 0.010593299753963947
Validation loss = 0.011660792864859104
Validation loss = 0.011045419611036777
Validation loss = 0.010542303323745728
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015947120264172554
Validation loss = 0.008923620916903019
Validation loss = 0.008839850313961506
Validation loss = 0.008869114331901073
Validation loss = 0.009248181246221066
Validation loss = 0.00901287142187357
Validation loss = 0.012864643707871437
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0013   |
| Iteration     | 5         |
| MaximumReturn | -0.000639 |
| MinimumReturn | -0.00729  |
| TotalSamples  | 11662     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01577947661280632
Validation loss = 0.012144381180405617
Validation loss = 0.01019140426069498
Validation loss = 0.011144832707941532
Validation loss = 0.009601839818060398
Validation loss = 0.00968442764133215
Validation loss = 0.013433115556836128
Validation loss = 0.01312969159334898
Validation loss = 0.011687996797263622
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012846151366829872
Validation loss = 0.011593376286327839
Validation loss = 0.011302202008664608
Validation loss = 0.017311349511146545
Validation loss = 0.012281163595616817
Validation loss = 0.010034923441708088
Validation loss = 0.01315725315362215
Validation loss = 0.01077442429959774
Validation loss = 0.011235786601901054
Validation loss = 0.010268235579133034
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012763412669301033
Validation loss = 0.01309178490191698
Validation loss = 0.009126095101237297
Validation loss = 0.009046988561749458
Validation loss = 0.011787684634327888
Validation loss = 0.013504100032150745
Validation loss = 0.009068724699318409
Validation loss = 0.008819795213639736
Validation loss = 0.01038601715117693
Validation loss = 0.00840597040951252
Validation loss = 0.009676515124738216
Validation loss = 0.011761577799916267
Validation loss = 0.00979556329548359
Validation loss = 0.010810472071170807
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013852770440280437
Validation loss = 0.013452507555484772
Validation loss = 0.01219051145017147
Validation loss = 0.012212995439767838
Validation loss = 0.013308800756931305
Validation loss = 0.013462292961776257
Validation loss = 0.01310083270072937
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020249728113412857
Validation loss = 0.012910192832350731
Validation loss = 0.010596020147204399
Validation loss = 0.009977299720048904
Validation loss = 0.011097734794020653
Validation loss = 0.012797917239367962
Validation loss = 0.011118917725980282
Validation loss = 0.01048771757632494
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00446  |
| Iteration     | 6         |
| MaximumReturn | -0.000578 |
| MinimumReturn | -0.0231   |
| TotalSamples  | 13328     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01768920198082924
Validation loss = 0.010742624290287495
Validation loss = 0.007473097648471594
Validation loss = 0.01132822036743164
Validation loss = 0.008833144791424274
Validation loss = 0.010397662408649921
Validation loss = 0.009539788588881493
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012578445486724377
Validation loss = 0.012088737450540066
Validation loss = 0.008040053769946098
Validation loss = 0.008426127023994923
Validation loss = 0.008295503444969654
Validation loss = 0.007740330416709185
Validation loss = 0.007786934729665518
Validation loss = 0.009426230564713478
Validation loss = 0.007448872085660696
Validation loss = 0.007079432252794504
Validation loss = 0.011549226008355618
Validation loss = 0.010085019282996655
Validation loss = 0.007975930348038673
Validation loss = 0.008329382166266441
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01596585288643837
Validation loss = 0.007599208038300276
Validation loss = 0.007669089362025261
Validation loss = 0.008501497097313404
Validation loss = 0.009385627694427967
Validation loss = 0.007946593686938286
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009951051324605942
Validation loss = 0.009467583149671555
Validation loss = 0.009260467253625393
Validation loss = 0.013122250325977802
Validation loss = 0.008592489175498486
Validation loss = 0.009651875123381615
Validation loss = 0.011359396390616894
Validation loss = 0.010541897267103195
Validation loss = 0.01095518097281456
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013098699040710926
Validation loss = 0.010050104930996895
Validation loss = 0.011141464114189148
Validation loss = 0.009731157682836056
Validation loss = 0.008554683066904545
Validation loss = 0.008574321866035461
Validation loss = 0.008170285262167454
Validation loss = 0.007374615874141455
Validation loss = 0.008067105896770954
Validation loss = 0.009272611699998379
Validation loss = 0.00934684183448553
Validation loss = 0.008822092786431313
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00488  |
| Iteration     | 7         |
| MaximumReturn | -0.000641 |
| MinimumReturn | -0.0211   |
| TotalSamples  | 14994     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014818216674029827
Validation loss = 0.006302270106971264
Validation loss = 0.006400131154805422
Validation loss = 0.007096514571458101
Validation loss = 0.005791653413325548
Validation loss = 0.00801887921988964
Validation loss = 0.007326588500291109
Validation loss = 0.006634578574448824
Validation loss = 0.006904506590217352
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01557080913335085
Validation loss = 0.007993615232408047
Validation loss = 0.006543890573084354
Validation loss = 0.006628064904361963
Validation loss = 0.005700741894543171
Validation loss = 0.005702634807676077
Validation loss = 0.005734719336032867
Validation loss = 0.011611035093665123
Validation loss = 0.008238708600401878
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013477523811161518
Validation loss = 0.007004891522228718
Validation loss = 0.006026750896126032
Validation loss = 0.006395353935658932
Validation loss = 0.0056203375570476055
Validation loss = 0.009571099653840065
Validation loss = 0.007308696862310171
Validation loss = 0.0073804305866360664
Validation loss = 0.005748603958636522
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013733350671827793
Validation loss = 0.008028430864214897
Validation loss = 0.006996277719736099
Validation loss = 0.007321301382035017
Validation loss = 0.006900652311742306
Validation loss = 0.008225386962294579
Validation loss = 0.007637717295438051
Validation loss = 0.006917743943631649
Validation loss = 0.006859090179204941
Validation loss = 0.00752628967165947
Validation loss = 0.007796092890202999
Validation loss = 0.006920966785401106
Validation loss = 0.006342089734971523
Validation loss = 0.0073094600811600685
Validation loss = 0.008285627700388432
Validation loss = 0.008511324413120747
Validation loss = 0.006360815372318029
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012852412648499012
Validation loss = 0.008304813876748085
Validation loss = 0.007318039890378714
Validation loss = 0.00849998276680708
Validation loss = 0.007430442608892918
Validation loss = 0.007966854609549046
Validation loss = 0.008645794354379177
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00275  |
| Iteration     | 8         |
| MaximumReturn | -0.000556 |
| MinimumReturn | -0.0116   |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007681576535105705
Validation loss = 0.0076969764195382595
Validation loss = 0.006544299889355898
Validation loss = 0.005852680653333664
Validation loss = 0.006730457302182913
Validation loss = 0.007302931044250727
Validation loss = 0.011479109525680542
Validation loss = 0.006122487597167492
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006534588988870382
Validation loss = 0.005896562710404396
Validation loss = 0.006663304753601551
Validation loss = 0.0064631178975105286
Validation loss = 0.006406446918845177
Validation loss = 0.005987541750073433
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005635788664221764
Validation loss = 0.006305322982370853
Validation loss = 0.006436164025217295
Validation loss = 0.007044326048344374
Validation loss = 0.007677287794649601
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007977151311933994
Validation loss = 0.007827521301805973
Validation loss = 0.006607107352465391
Validation loss = 0.01063494011759758
Validation loss = 0.0065686325542628765
Validation loss = 0.007526691071689129
Validation loss = 0.006953068543225527
Validation loss = 0.005689363460987806
Validation loss = 0.005650111474096775
Validation loss = 0.005864911712706089
Validation loss = 0.005742526613175869
Validation loss = 0.007553985808044672
Validation loss = 0.005833696573972702
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011254432611167431
Validation loss = 0.0063993376679718494
Validation loss = 0.0069680181331932545
Validation loss = 0.007858612574636936
Validation loss = 0.009072907269001007
Validation loss = 0.007685292977839708
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00326  |
| Iteration     | 9         |
| MaximumReturn | -0.000605 |
| MinimumReturn | -0.0141   |
| TotalSamples  | 18326     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010016628541052341
Validation loss = 0.007904047146439552
Validation loss = 0.005452299956232309
Validation loss = 0.005730969365686178
Validation loss = 0.008183491416275501
Validation loss = 0.005905294790863991
Validation loss = 0.006389989983290434
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010536236688494682
Validation loss = 0.005059594288468361
Validation loss = 0.005464509129524231
Validation loss = 0.007728769443929195
Validation loss = 0.006706490647047758
Validation loss = 0.008952813223004341
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00721357436850667
Validation loss = 0.007539775688201189
Validation loss = 0.006596406456083059
Validation loss = 0.005465980619192123
Validation loss = 0.00536397285759449
Validation loss = 0.004968905821442604
Validation loss = 0.008049595169723034
Validation loss = 0.004891488701105118
Validation loss = 0.004827403929084539
Validation loss = 0.004686866886913776
Validation loss = 0.005746437702327967
Validation loss = 0.004549888893961906
Validation loss = 0.005605374462902546
Validation loss = 0.015812987461686134
Validation loss = 0.00476029422134161
Validation loss = 0.005746126174926758
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008643473498523235
Validation loss = 0.008096547797322273
Validation loss = 0.00860103964805603
Validation loss = 0.006053663790225983
Validation loss = 0.00674476008862257
Validation loss = 0.00583423487842083
Validation loss = 0.006205055397003889
Validation loss = 0.0069993287324905396
Validation loss = 0.004719413351267576
Validation loss = 0.0067380755208432674
Validation loss = 0.0060103898867964745
Validation loss = 0.004604530520737171
Validation loss = 0.004670610185712576
Validation loss = 0.005504982080310583
Validation loss = 0.0054541705176234245
Validation loss = 0.005338366609066725
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008738111704587936
Validation loss = 0.00672116270288825
Validation loss = 0.005772402044385672
Validation loss = 0.0060114008374512196
Validation loss = 0.006996508222073317
Validation loss = 0.0055670165456831455
Validation loss = 0.010256122797727585
Validation loss = 0.006658842787146568
Validation loss = 0.00939555000513792
Validation loss = 0.005681027192622423
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0531   |
| Iteration     | 10        |
| MaximumReturn | -0.000572 |
| MinimumReturn | -1.16     |
| TotalSamples  | 19992     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015961872413754463
Validation loss = 0.008564196527004242
Validation loss = 0.0064608631655573845
Validation loss = 0.006481628865003586
Validation loss = 0.006104957312345505
Validation loss = 0.006097068078815937
Validation loss = 0.004877191968262196
Validation loss = 0.0063043213449418545
Validation loss = 0.008642486296594143
Validation loss = 0.005623873323202133
Validation loss = 0.004334558732807636
Validation loss = 0.005574790760874748
Validation loss = 0.004795297980308533
Validation loss = 0.005500990431755781
Validation loss = 0.004602421540766954
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010082520544528961
Validation loss = 0.00874737836420536
Validation loss = 0.00490969605743885
Validation loss = 0.005788064561784267
Validation loss = 0.0043272036127746105
Validation loss = 0.0065000662580132484
Validation loss = 0.005201614461839199
Validation loss = 0.005275469273328781
Validation loss = 0.004202965181320906
Validation loss = 0.005706914700567722
Validation loss = 0.006451004650443792
Validation loss = 0.0047117541544139385
Validation loss = 0.005873849615454674
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012885337695479393
Validation loss = 0.007035445421934128
Validation loss = 0.00613566255196929
Validation loss = 0.004598586820065975
Validation loss = 0.004511701408773661
Validation loss = 0.005026467144489288
Validation loss = 0.004868707619607449
Validation loss = 0.005433849990367889
Validation loss = 0.007290647830814123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010224109515547752
Validation loss = 0.006229420192539692
Validation loss = 0.011952453292906284
Validation loss = 0.005575402639806271
Validation loss = 0.006103510968387127
Validation loss = 0.008226489648222923
Validation loss = 0.006857878062874079
Validation loss = 0.005937323439866304
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010629991069436073
Validation loss = 0.00798751600086689
Validation loss = 0.008082488551735878
Validation loss = 0.007337505463510752
Validation loss = 0.007114998996257782
Validation loss = 0.006026095245033503
Validation loss = 0.005678833462297916
Validation loss = 0.00863186176866293
Validation loss = 0.004840116016566753
Validation loss = 0.007142791990190744
Validation loss = 0.005533530842512846
Validation loss = 0.0057409596629440784
Validation loss = 0.007444937713444233
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0463  |
| Iteration     | 11       |
| MaximumReturn | -0.00358 |
| MinimumReturn | -0.123   |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008389019407331944
Validation loss = 0.0058741820976138115
Validation loss = 0.005353234708309174
Validation loss = 0.004954494070261717
Validation loss = 0.005123462527990341
Validation loss = 0.005963021889328957
Validation loss = 0.005496487952768803
Validation loss = 0.0053520784713327885
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006911478936672211
Validation loss = 0.004463712684810162
Validation loss = 0.004551836755126715
Validation loss = 0.004884669557213783
Validation loss = 0.005055914632976055
Validation loss = 0.004530312027782202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007991516962647438
Validation loss = 0.005565819330513477
Validation loss = 0.00516474200412631
Validation loss = 0.006952925119549036
Validation loss = 0.005429045297205448
Validation loss = 0.004581276327371597
Validation loss = 0.0056604077108204365
Validation loss = 0.005256093107163906
Validation loss = 0.005150997079908848
Validation loss = 0.0046684229746460915
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007411745842546225
Validation loss = 0.005549776367843151
Validation loss = 0.00549299968406558
Validation loss = 0.0053989337757229805
Validation loss = 0.004326996393501759
Validation loss = 0.006023854948580265
Validation loss = 0.004941132850944996
Validation loss = 0.004817116539925337
Validation loss = 0.007170385681092739
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01101553812623024
Validation loss = 0.0048943813890218735
Validation loss = 0.006600613705813885
Validation loss = 0.0061530182138085365
Validation loss = 0.005156018305569887
Validation loss = 0.005055188201367855
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00836  |
| Iteration     | 12        |
| MaximumReturn | -0.000886 |
| MinimumReturn | -0.0303   |
| TotalSamples  | 23324     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00675478158518672
Validation loss = 0.005853261332958937
Validation loss = 0.006496846675872803
Validation loss = 0.00488140108063817
Validation loss = 0.005420471541583538
Validation loss = 0.0054788896813988686
Validation loss = 0.00597787369042635
Validation loss = 0.005570085719227791
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009310872294008732
Validation loss = 0.004912329837679863
Validation loss = 0.004663198720663786
Validation loss = 0.005259813275188208
Validation loss = 0.004504617769271135
Validation loss = 0.0037420354783535004
Validation loss = 0.004102456849068403
Validation loss = 0.004715642426162958
Validation loss = 0.004381954204291105
Validation loss = 0.0034524835646152496
Validation loss = 0.0037736373487859964
Validation loss = 0.005442196503281593
Validation loss = 0.005552911665290594
Validation loss = 0.004213421139866114
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005491963587701321
Validation loss = 0.005468278657644987
Validation loss = 0.006881794426590204
Validation loss = 0.005663468036800623
Validation loss = 0.007645623758435249
Validation loss = 0.007641063537448645
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007447275333106518
Validation loss = 0.005869422573596239
Validation loss = 0.004401879385113716
Validation loss = 0.006281984969973564
Validation loss = 0.005892735905945301
Validation loss = 0.00571947218850255
Validation loss = 0.004341475665569305
Validation loss = 0.004436768125742674
Validation loss = 0.0055250502191483974
Validation loss = 0.005913859698921442
Validation loss = 0.004441156517714262
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0051477146334946156
Validation loss = 0.004701366648077965
Validation loss = 0.006239591632038355
Validation loss = 0.005436554551124573
Validation loss = 0.004874785430729389
Validation loss = 0.0056293620727956295
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0117  |
| Iteration     | 13       |
| MaximumReturn | -0.00104 |
| MinimumReturn | -0.0702  |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006902249995619059
Validation loss = 0.005811688024550676
Validation loss = 0.007814238779246807
Validation loss = 0.0043639508076012135
Validation loss = 0.00451393099501729
Validation loss = 0.006064942106604576
Validation loss = 0.0056785778142511845
Validation loss = 0.004670428112149239
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005583152174949646
Validation loss = 0.005133287515491247
Validation loss = 0.006827278062701225
Validation loss = 0.003776158206164837
Validation loss = 0.0038958897348493338
Validation loss = 0.003167848102748394
Validation loss = 0.0037203908432275057
Validation loss = 0.005277620628476143
Validation loss = 0.0053964643739163876
Validation loss = 0.004029553849250078
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006382280495017767
Validation loss = 0.005100152920931578
Validation loss = 0.006847562734037638
Validation loss = 0.004361496772617102
Validation loss = 0.00434538209810853
Validation loss = 0.005425911396741867
Validation loss = 0.006335515063256025
Validation loss = 0.0049363551661372185
Validation loss = 0.004423289094120264
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004978697746992111
Validation loss = 0.005204798188060522
Validation loss = 0.004429200664162636
Validation loss = 0.004777007270604372
Validation loss = 0.005717096850275993
Validation loss = 0.004733748268336058
Validation loss = 0.006583144888281822
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005163471680134535
Validation loss = 0.004837806336581707
Validation loss = 0.007249713409692049
Validation loss = 0.00540916109457612
Validation loss = 0.005145461298525333
Validation loss = 0.004375968128442764
Validation loss = 0.004925586748868227
Validation loss = 0.005764906760305166
Validation loss = 0.005954008083790541
Validation loss = 0.004824700299650431
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.229   |
| Iteration     | 14       |
| MaximumReturn | -0.0208  |
| MinimumReturn | -0.494   |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006059503182768822
Validation loss = 0.00391050148755312
Validation loss = 0.00412300368770957
Validation loss = 0.003925794269889593
Validation loss = 0.003956637345254421
Validation loss = 0.004602018278092146
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005990324076265097
Validation loss = 0.0029500701930373907
Validation loss = 0.00397249823436141
Validation loss = 0.005171383265405893
Validation loss = 0.0037624218966811895
Validation loss = 0.003030831227079034
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003998016472905874
Validation loss = 0.0037964479997754097
Validation loss = 0.0037032305262982845
Validation loss = 0.003971123602241278
Validation loss = 0.0052918377332389355
Validation loss = 0.003644041484221816
Validation loss = 0.0035575604997575283
Validation loss = 0.005062646232545376
Validation loss = 0.005263581871986389
Validation loss = 0.0037881035823374987
Validation loss = 0.004298880230635405
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01021185889840126
Validation loss = 0.004044980742037296
Validation loss = 0.00391258904710412
Validation loss = 0.003456885926425457
Validation loss = 0.005173986777663231
Validation loss = 0.003379537258297205
Validation loss = 0.0035619495902210474
Validation loss = 0.002949521178379655
Validation loss = 0.003906911239027977
Validation loss = 0.0035896783228963614
Validation loss = 0.004711147863417864
Validation loss = 0.0035803834907710552
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007524735294282436
Validation loss = 0.005154027603566647
Validation loss = 0.0051897261291742325
Validation loss = 0.004622564651072025
Validation loss = 0.0045181335881352425
Validation loss = 0.003821281949058175
Validation loss = 0.004864841233938932
Validation loss = 0.005861073732376099
Validation loss = 0.0039576576091349125
Validation loss = 0.004151231609284878
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0438  |
| Iteration     | 15       |
| MaximumReturn | -0.00184 |
| MinimumReturn | -0.194   |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014176136814057827
Validation loss = 0.004976225551217794
Validation loss = 0.004044129978865385
Validation loss = 0.003744683926925063
Validation loss = 0.003895496716722846
Validation loss = 0.0038151114713400602
Validation loss = 0.0037733770441263914
Validation loss = 0.0044293515384197235
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00945967622101307
Validation loss = 0.004034124780446291
Validation loss = 0.004593766760081053
Validation loss = 0.0056007495149970055
Validation loss = 0.0031715307850390673
Validation loss = 0.004634848330169916
Validation loss = 0.0031997656915336847
Validation loss = 0.0037368887569755316
Validation loss = 0.0030723121017217636
Validation loss = 0.003596601542085409
Validation loss = 0.004601638298481703
Validation loss = 0.005090436898171902
Validation loss = 0.0035716837737709284
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009692707099020481
Validation loss = 0.0038171005435287952
Validation loss = 0.003192555857822299
Validation loss = 0.0038972164038568735
Validation loss = 0.0036198091693222523
Validation loss = 0.004067358560860157
Validation loss = 0.00449713971465826
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004683874547481537
Validation loss = 0.004952377639710903
Validation loss = 0.0034606941044330597
Validation loss = 0.003571718465536833
Validation loss = 0.0035693615209311247
Validation loss = 0.005816182587295771
Validation loss = 0.0036488971672952175
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003719229716807604
Validation loss = 0.004782762378454208
Validation loss = 0.00529106380417943
Validation loss = 0.006825227290391922
Validation loss = 0.0037912267725914717
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.981    |
| Iteration     | 16        |
| MaximumReturn | -0.000556 |
| MinimumReturn | -13.3     |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006663256790488958
Validation loss = 0.004375043325126171
Validation loss = 0.004113900475203991
Validation loss = 0.004598197992891073
Validation loss = 0.003706602146849036
Validation loss = 0.003599022515118122
Validation loss = 0.006426422391086817
Validation loss = 0.003368003061041236
Validation loss = 0.004675976932048798
Validation loss = 0.0036811018362641335
Validation loss = 0.004970615729689598
Validation loss = 0.004289988894015551
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004005672875791788
Validation loss = 0.005308233667165041
Validation loss = 0.00463648047298193
Validation loss = 0.003492797492071986
Validation loss = 0.003451881231740117
Validation loss = 0.0031070278491824865
Validation loss = 0.0040303501300513744
Validation loss = 0.004452034831047058
Validation loss = 0.0034005946945399046
Validation loss = 0.0032525709830224514
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005493191070854664
Validation loss = 0.004191035404801369
Validation loss = 0.003965507727116346
Validation loss = 0.0066718109883368015
Validation loss = 0.005209736526012421
Validation loss = 0.004694422706961632
Validation loss = 0.004771372769027948
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00381747935898602
Validation loss = 0.0035470754373818636
Validation loss = 0.004525685217231512
Validation loss = 0.0038804959040135145
Validation loss = 0.0035867367405444384
Validation loss = 0.0045132264494895935
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005401973146945238
Validation loss = 0.00622412096709013
Validation loss = 0.0041404166258871555
Validation loss = 0.004446771927177906
Validation loss = 0.003814344760030508
Validation loss = 0.004781938157975674
Validation loss = 0.004608733579516411
Validation loss = 0.004163268953561783
Validation loss = 0.004175898153334856
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.8    |
| Iteration     | 17       |
| MaximumReturn | -0.0121  |
| MinimumReturn | -54.8    |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004621550440788269
Validation loss = 0.0029623385053128004
Validation loss = 0.0030054403468966484
Validation loss = 0.0027572023682296276
Validation loss = 0.0027210351545363665
Validation loss = 0.0023034776095300913
Validation loss = 0.0024508433416485786
Validation loss = 0.00257198722101748
Validation loss = 0.0026907408609986305
Validation loss = 0.0034672250039875507
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005419753026217222
Validation loss = 0.003909558989107609
Validation loss = 0.004160755313932896
Validation loss = 0.0026163309812545776
Validation loss = 0.002987327752634883
Validation loss = 0.002262290334329009
Validation loss = 0.0029505975544452667
Validation loss = 0.003602834651246667
Validation loss = 0.0020556726958602667
Validation loss = 0.00358163146302104
Validation loss = 0.0032199728302657604
Validation loss = 0.0025578236673027277
Validation loss = 0.0021946618799120188
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00448658037930727
Validation loss = 0.0031455408316105604
Validation loss = 0.0034592768643051386
Validation loss = 0.0036249251570552588
Validation loss = 0.0031081722117960453
Validation loss = 0.0035630231723189354
Validation loss = 0.00300941476598382
Validation loss = 0.0021693981252610683
Validation loss = 0.002393078990280628
Validation loss = 0.0028541209176182747
Validation loss = 0.0029354621656239033
Validation loss = 0.0028899763710796833
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003374980529770255
Validation loss = 0.003132661571726203
Validation loss = 0.002873671241104603
Validation loss = 0.0026960258837789297
Validation loss = 0.0033161852043122053
Validation loss = 0.0028353258967399597
Validation loss = 0.0026845396496355534
Validation loss = 0.003570002503693104
Validation loss = 0.0036104819737374783
Validation loss = 0.00582808256149292
Validation loss = 0.0031147643458098173
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0054723164066672325
Validation loss = 0.004330596886575222
Validation loss = 0.003002633573487401
Validation loss = 0.005208516959100962
Validation loss = 0.003982000518590212
Validation loss = 0.0025067946407943964
Validation loss = 0.0030080147553235292
Validation loss = 0.0028616522904485464
Validation loss = 0.0028311728965491056
Validation loss = 0.0028138691559433937
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0886  |
| Iteration     | 18       |
| MaximumReturn | -0.00796 |
| MinimumReturn | -1.45    |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005295953713357449
Validation loss = 0.004317214246839285
Validation loss = 0.003330392763018608
Validation loss = 0.003807445988059044
Validation loss = 0.002756389556452632
Validation loss = 0.005619686096906662
Validation loss = 0.0030828220769762993
Validation loss = 0.002671654801815748
Validation loss = 0.005863955244421959
Validation loss = 0.003353703301399946
Validation loss = 0.0028705247677862644
Validation loss = 0.003494933247566223
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006016372703015804
Validation loss = 0.003121841000393033
Validation loss = 0.004443825222551823
Validation loss = 0.0027062988374382257
Validation loss = 0.0035370574332773685
Validation loss = 0.0027038808912038803
Validation loss = 0.0025836576241999865
Validation loss = 0.002542844507843256
Validation loss = 0.004589727148413658
Validation loss = 0.002889098599553108
Validation loss = 0.0022621385287493467
Validation loss = 0.003833952359855175
Validation loss = 0.0022968582343310118
Validation loss = 0.0024772342294454575
Validation loss = 0.004205622710287571
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00436588516458869
Validation loss = 0.0046261977404356
Validation loss = 0.006025292444974184
Validation loss = 0.0030609541572630405
Validation loss = 0.0034704040735960007
Validation loss = 0.0035781452897936106
Validation loss = 0.0036632660776376724
Validation loss = 0.004006359726190567
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004529022611677647
Validation loss = 0.004050745628774166
Validation loss = 0.003643530420958996
Validation loss = 0.003241401631385088
Validation loss = 0.0031968671828508377
Validation loss = 0.0030140476301312447
Validation loss = 0.0026427689008414745
Validation loss = 0.003220696235075593
Validation loss = 0.003191834781318903
Validation loss = 0.003161350730806589
Validation loss = 0.0027078858111053705
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004964831285178661
Validation loss = 0.005464982707053423
Validation loss = 0.002992881927639246
Validation loss = 0.0033124370966106653
Validation loss = 0.004909409210085869
Validation loss = 0.0031341947615146637
Validation loss = 0.0026504606939852238
Validation loss = 0.002718658186495304
Validation loss = 0.004752641543745995
Validation loss = 0.002931063063442707
Validation loss = 0.003422674722969532
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.2    |
| Iteration     | 19       |
| MaximumReturn | -0.0205  |
| MinimumReturn | -77.9    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0074220807291567326
Validation loss = 0.00477400841191411
Validation loss = 0.0037472485564649105
Validation loss = 0.004061829298734665
Validation loss = 0.003857789561152458
Validation loss = 0.0031553718727082014
Validation loss = 0.003354365238919854
Validation loss = 0.003992482088506222
Validation loss = 0.0034259408712387085
Validation loss = 0.0033515288960188627
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010605101473629475
Validation loss = 0.004702917765825987
Validation loss = 0.0034953299909830093
Validation loss = 0.0036876872181892395
Validation loss = 0.003969315439462662
Validation loss = 0.0034199838992208242
Validation loss = 0.00319023453630507
Validation loss = 0.0031792903319001198
Validation loss = 0.002769181504845619
Validation loss = 0.0030467102769762278
Validation loss = 0.0027867306489497423
Validation loss = 0.002304068300873041
Validation loss = 0.002766178920865059
Validation loss = 0.00392680149525404
Validation loss = 0.002977001713588834
Validation loss = 0.0040550557896494865
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008056944236159325
Validation loss = 0.004138212651014328
Validation loss = 0.003360991133376956
Validation loss = 0.002980237826704979
Validation loss = 0.004028297495096922
Validation loss = 0.0033555864356458187
Validation loss = 0.003375525353476405
Validation loss = 0.003418697975575924
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01079398114234209
Validation loss = 0.004601757973432541
Validation loss = 0.0049359542317688465
Validation loss = 0.003310861997306347
Validation loss = 0.003511827439069748
Validation loss = 0.0031815252732485533
Validation loss = 0.003973296377807856
Validation loss = 0.003140695858746767
Validation loss = 0.006115320138633251
Validation loss = 0.0027474856469780207
Validation loss = 0.003242925740778446
Validation loss = 0.002991868183016777
Validation loss = 0.0026921394746750593
Validation loss = 0.0032539127860218287
Validation loss = 0.0030896172393113375
Validation loss = 0.002632608637213707
Validation loss = 0.0027539886068552732
Validation loss = 0.0033993804827332497
Validation loss = 0.0024651011917740107
Validation loss = 0.0028174291364848614
Validation loss = 0.003323145443573594
Validation loss = 0.0023401568178087473
Validation loss = 0.0031260852701961994
Validation loss = 0.0026003937236964703
Validation loss = 0.003114305203780532
Validation loss = 0.0027840081602334976
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0099679846316576
Validation loss = 0.004351508338004351
Validation loss = 0.003399578621610999
Validation loss = 0.004038401413708925
Validation loss = 0.004136411473155022
Validation loss = 0.003212992800399661
Validation loss = 0.003252398455515504
Validation loss = 0.003362318966537714
Validation loss = 0.0025981201324611902
Validation loss = 0.003165275789797306
Validation loss = 0.003530106507241726
Validation loss = 0.004127135965973139
Validation loss = 0.00523723941296339
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -81.8    |
| Iteration     | 20       |
| MaximumReturn | -34.4    |
| MinimumReturn | -128     |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0069917854852974415
Validation loss = 0.003303313860669732
Validation loss = 0.0025523859076201916
Validation loss = 0.0031884461641311646
Validation loss = 0.0022350167855620384
Validation loss = 0.002195717766880989
Validation loss = 0.002451193518936634
Validation loss = 0.0029124943539500237
Validation loss = 0.0018060909351333976
Validation loss = 0.0024302098900079727
Validation loss = 0.0024141641333699226
Validation loss = 0.001672134967520833
Validation loss = 0.002116621471941471
Validation loss = 0.001743460656143725
Validation loss = 0.0038801650516688824
Validation loss = 0.0016554207541048527
Validation loss = 0.00375740067102015
Validation loss = 0.0032377231400460005
Validation loss = 0.0030085183680057526
Validation loss = 0.0016359693836420774
Validation loss = 0.0029123546555638313
Validation loss = 0.0025122202932834625
Validation loss = 0.002002144232392311
Validation loss = 0.0017991316271945834
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007817472331225872
Validation loss = 0.002153904177248478
Validation loss = 0.0017153540393337607
Validation loss = 0.002178200287744403
Validation loss = 0.0022154583130031824
Validation loss = 0.001998641761019826
Validation loss = 0.0016412673285230994
Validation loss = 0.0016082047950476408
Validation loss = 0.001896968693472445
Validation loss = 0.002449311316013336
Validation loss = 0.0017796092433854938
Validation loss = 0.002005139598622918
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0076933628879487514
Validation loss = 0.003703658003360033
Validation loss = 0.0024039573036134243
Validation loss = 0.002669918118044734
Validation loss = 0.00316948932595551
Validation loss = 0.001828481792472303
Validation loss = 0.002412749221548438
Validation loss = 0.0018891985528171062
Validation loss = 0.0017503048293292522
Validation loss = 0.002409631386399269
Validation loss = 0.0019295215606689453
Validation loss = 0.0032711587846279144
Validation loss = 0.0031734146177768707
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008097359910607338
Validation loss = 0.002192446030676365
Validation loss = 0.0019437088631093502
Validation loss = 0.002103472827002406
Validation loss = 0.0014942233683541417
Validation loss = 0.0016160216182470322
Validation loss = 0.0018612249987199903
Validation loss = 0.0018791601760312915
Validation loss = 0.00128990039229393
Validation loss = 0.0014716058503836393
Validation loss = 0.0022527994588017464
Validation loss = 0.001548589556477964
Validation loss = 0.001432394958101213
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00895159225910902
Validation loss = 0.002366193337365985
Validation loss = 0.0025764713063836098
Validation loss = 0.003895420115441084
Validation loss = 0.002070764545351267
Validation loss = 0.0032409997656941414
Validation loss = 0.0030387449078261852
Validation loss = 0.002985205966979265
Validation loss = 0.0020643838215619326
Validation loss = 0.0017411684384569526
Validation loss = 0.0018390963086858392
Validation loss = 0.0014780227793380618
Validation loss = 0.0018726640846580267
Validation loss = 0.0032711680978536606
Validation loss = 0.0016622468829154968
Validation loss = 0.0020865907426923513
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -115     |
| Iteration     | 21       |
| MaximumReturn | -80.8    |
| MinimumReturn | -141     |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01065877079963684
Validation loss = 0.004499702248722315
Validation loss = 0.002323110355064273
Validation loss = 0.0022089763078838587
Validation loss = 0.0016271607019007206
Validation loss = 0.001879410701803863
Validation loss = 0.001988672884181142
Validation loss = 0.0016808959189802408
Validation loss = 0.0015934855910018086
Validation loss = 0.0017376011237502098
Validation loss = 0.0021786149591207504
Validation loss = 0.0016362646128982306
Validation loss = 0.0018088547512888908
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009416857734322548
Validation loss = 0.0017933029448613524
Validation loss = 0.0015995731810107827
Validation loss = 0.0018559192540124059
Validation loss = 0.003081125905737281
Validation loss = 0.0015277450438588858
Validation loss = 0.001540084253065288
Validation loss = 0.0019181708339601755
Validation loss = 0.0018222614889964461
Validation loss = 0.0016552126035094261
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008498914539813995
Validation loss = 0.002302014036104083
Validation loss = 0.0018767487490549684
Validation loss = 0.0017440557712689042
Validation loss = 0.001975445542484522
Validation loss = 0.0020091028418391943
Validation loss = 0.0016940488712862134
Validation loss = 0.002481727860867977
Validation loss = 0.002117593539878726
Validation loss = 0.0014118613908067346
Validation loss = 0.0020400742068886757
Validation loss = 0.0017633639508858323
Validation loss = 0.0016974869649857283
Validation loss = 0.0017059004167094827
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007625369820743799
Validation loss = 0.0020397461485117674
Validation loss = 0.0017458675429224968
Validation loss = 0.0014108464820310473
Validation loss = 0.0016022906638681889
Validation loss = 0.001999732805415988
Validation loss = 0.0024867437314242125
Validation loss = 0.002018954837694764
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011815889738500118
Validation loss = 0.002325568115338683
Validation loss = 0.001856266870163381
Validation loss = 0.001835196977481246
Validation loss = 0.0015682685188949108
Validation loss = 0.001353882602415979
Validation loss = 0.002330996096134186
Validation loss = 0.0017895312048494816
Validation loss = 0.002224355936050415
Validation loss = 0.0017313065472990274
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -121     |
| Iteration     | 22       |
| MaximumReturn | -85.4    |
| MinimumReturn | -146     |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0051887123845517635
Validation loss = 0.001376661821268499
Validation loss = 0.0016350382938981056
Validation loss = 0.001761272782459855
Validation loss = 0.0014815167523920536
Validation loss = 0.0017145376186817884
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005063713528215885
Validation loss = 0.0016320854192599654
Validation loss = 0.0015280371299013495
Validation loss = 0.001371812424622476
Validation loss = 0.0012056741397827864
Validation loss = 0.001222588587552309
Validation loss = 0.0011851914459839463
Validation loss = 0.0014056253712624311
Validation loss = 0.001520175370387733
Validation loss = 0.0022772173397243023
Validation loss = 0.0011042490368708968
Validation loss = 0.0018436393002048135
Validation loss = 0.001617090543732047
Validation loss = 0.0016048053512349725
Validation loss = 0.002196693094447255
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007217990700155497
Validation loss = 0.0018650584388524294
Validation loss = 0.0018899223068729043
Validation loss = 0.0016334401443600655
Validation loss = 0.0017869581934064627
Validation loss = 0.001454147044569254
Validation loss = 0.0015720613300800323
Validation loss = 0.0013507634866982698
Validation loss = 0.0016253705834969878
Validation loss = 0.0015756828943267465
Validation loss = 0.0022407181095331907
Validation loss = 0.0016613032203167677
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008123236708343029
Validation loss = 0.0013498903717845678
Validation loss = 0.0015919884899631143
Validation loss = 0.0014232138637453318
Validation loss = 0.0012408045586198568
Validation loss = 0.001282271696254611
Validation loss = 0.001389398006722331
Validation loss = 0.0015850609634071589
Validation loss = 0.0015346105210483074
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00517176603898406
Validation loss = 0.0015016022371128201
Validation loss = 0.001478100079111755
Validation loss = 0.0011889806482940912
Validation loss = 0.0012393672950565815
Validation loss = 0.0015608260873705149
Validation loss = 0.0023221743758767843
Validation loss = 0.0013502775691449642
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -91.2    |
| Iteration     | 23       |
| MaximumReturn | -19.3    |
| MinimumReturn | -132     |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003564542857930064
Validation loss = 0.0012718387879431248
Validation loss = 0.0011239759624004364
Validation loss = 0.0009368731407448649
Validation loss = 0.0010919613996520638
Validation loss = 0.001074955565854907
Validation loss = 0.0012252649758011103
Validation loss = 0.0013840652536600828
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00614919513463974
Validation loss = 0.002604347886517644
Validation loss = 0.0014892822364345193
Validation loss = 0.0027772309258580208
Validation loss = 0.001082289731130004
Validation loss = 0.0011365902610123158
Validation loss = 0.0009235738543793559
Validation loss = 0.0008601142908446491
Validation loss = 0.0010167781729251146
Validation loss = 0.0009460304863750935
Validation loss = 0.0012126702349632978
Validation loss = 0.0009710646118037403
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005750634241849184
Validation loss = 0.0014201754238456488
Validation loss = 0.001129097305238247
Validation loss = 0.001152914366684854
Validation loss = 0.001374133164063096
Validation loss = 0.0011618293356150389
Validation loss = 0.0011007236316800117
Validation loss = 0.001319437869824469
Validation loss = 0.0014217470306903124
Validation loss = 0.0012195430463179946
Validation loss = 0.0012239499483257532
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0053193168714642525
Validation loss = 0.001983936410397291
Validation loss = 0.0010786603670567274
Validation loss = 0.0010417753364890814
Validation loss = 0.001097359461709857
Validation loss = 0.0010394644923508167
Validation loss = 0.0009722068207338452
Validation loss = 0.0011819854844361544
Validation loss = 0.0013590902090072632
Validation loss = 0.0011883294209837914
Validation loss = 0.001077048247680068
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00473096314817667
Validation loss = 0.001439743908122182
Validation loss = 0.0013139174552634358
Validation loss = 0.0011390296276658773
Validation loss = 0.0011036640498787165
Validation loss = 0.0011696936562657356
Validation loss = 0.0011812198208644986
Validation loss = 0.0010069111594930291
Validation loss = 0.0017970014596357942
Validation loss = 0.0017276915023103356
Validation loss = 0.0012229109415784478
Validation loss = 0.0011992203071713448
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -33.7    |
| Iteration     | 24       |
| MaximumReturn | -0.0571  |
| MinimumReturn | -96.9    |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021012816578149796
Validation loss = 0.0011684708297252655
Validation loss = 0.0011197206331416965
Validation loss = 0.0009641370852477849
Validation loss = 0.0011093750363215804
Validation loss = 0.0014973499346524477
Validation loss = 0.001282518613152206
Validation loss = 0.0012213909067213535
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021165250800549984
Validation loss = 0.0011865736450999975
Validation loss = 0.0013553458265960217
Validation loss = 0.001421398133970797
Validation loss = 0.000974431459326297
Validation loss = 0.001233308226801455
Validation loss = 0.0011827072594314814
Validation loss = 0.001000579446554184
Validation loss = 0.0010192252229899168
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017118079122155905
Validation loss = 0.0010267769685015082
Validation loss = 0.0015648851403966546
Validation loss = 0.0010885541560128331
Validation loss = 0.001436432241462171
Validation loss = 0.0015221248613670468
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024422307033091784
Validation loss = 0.0009416417451575398
Validation loss = 0.0009338388917967677
Validation loss = 0.0010661029955372214
Validation loss = 0.0014150028582662344
Validation loss = 0.0009272455936297774
Validation loss = 0.001987564144656062
Validation loss = 0.00101569015532732
Validation loss = 0.0013104421086609364
Validation loss = 0.0013402024051174521
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0025314330123364925
Validation loss = 0.0012819287367165089
Validation loss = 0.0012438419507816434
Validation loss = 0.0009032724774442613
Validation loss = 0.000929000205360353
Validation loss = 0.0025389569345861673
Validation loss = 0.001068722689524293
Validation loss = 0.000904449843801558
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -101     |
| Iteration     | 25       |
| MaximumReturn | -0.184   |
| MinimumReturn | -124     |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003031011437997222
Validation loss = 0.0010764155304059386
Validation loss = 0.001368885044939816
Validation loss = 0.001059839385561645
Validation loss = 0.000880090519785881
Validation loss = 0.0008187104249373078
Validation loss = 0.0011616807896643877
Validation loss = 0.0009696062770672143
Validation loss = 0.0010668655158951879
Validation loss = 0.0013575325720012188
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0029192492365837097
Validation loss = 0.000990969012491405
Validation loss = 0.0009058641735464334
Validation loss = 0.0011911956826224923
Validation loss = 0.0008131044451147318
Validation loss = 0.0008178852731361985
Validation loss = 0.0008852960891090333
Validation loss = 0.0009567904635332525
Validation loss = 0.0013375653652474284
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027218551840633154
Validation loss = 0.0008829522994346917
Validation loss = 0.001035410794429481
Validation loss = 0.0011425712145864964
Validation loss = 0.0009398646070621908
Validation loss = 0.0009047357016243041
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0034927153028547764
Validation loss = 0.0010764988837763667
Validation loss = 0.0009770388714969158
Validation loss = 0.0009467462659813464
Validation loss = 0.0010816720314323902
Validation loss = 0.0010950173018500209
Validation loss = 0.0011312090791761875
Validation loss = 0.0009536354336887598
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0030162848997861147
Validation loss = 0.0011632818495854735
Validation loss = 0.0011124162701889873
Validation loss = 0.0008188259089365602
Validation loss = 0.0010875125881284475
Validation loss = 0.0013187818694859743
Validation loss = 0.0009508250514045358
Validation loss = 0.0010013150749728084
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.25    |
| Iteration     | 26       |
| MaximumReturn | -0.0647  |
| MinimumReturn | -15.5    |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001340732560493052
Validation loss = 0.000911674287635833
Validation loss = 0.0008949238690547645
Validation loss = 0.0013666065642610192
Validation loss = 0.0007620268152095377
Validation loss = 0.0009248297428712249
Validation loss = 0.0013418728485703468
Validation loss = 0.000879023689776659
Validation loss = 0.000798448862042278
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001858703326433897
Validation loss = 0.0010078991763293743
Validation loss = 0.0013717443216592073
Validation loss = 0.0008323731017298996
Validation loss = 0.0010378870647400618
Validation loss = 0.0009014805546030402
Validation loss = 0.0010268859332427382
Validation loss = 0.0014772767899557948
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009397259564138949
Validation loss = 0.0008785360842011869
Validation loss = 0.0010758868884295225
Validation loss = 0.001158956321887672
Validation loss = 0.001805307692848146
Validation loss = 0.0009073143592104316
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001107043237425387
Validation loss = 0.0013302485458552837
Validation loss = 0.001046565710566938
Validation loss = 0.0010836852015927434
Validation loss = 0.0010729623027145863
Validation loss = 0.0012884103925898671
Validation loss = 0.00093183753779158
Validation loss = 0.00098221015650779
Validation loss = 0.0013925086241215467
Validation loss = 0.0010560472728684545
Validation loss = 0.001014222390949726
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013457911554723978
Validation loss = 0.001076592248864472
Validation loss = 0.0012128319358453155
Validation loss = 0.0019940997008234262
Validation loss = 0.0016367660136893392
Validation loss = 0.001212126575410366
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.614   |
| Iteration     | 27       |
| MaximumReturn | -0.0826  |
| MinimumReturn | -8.96    |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012762158876284957
Validation loss = 0.001224448555149138
Validation loss = 0.001534326933324337
Validation loss = 0.0013262730790302157
Validation loss = 0.001018534880131483
Validation loss = 0.0009622186771593988
Validation loss = 0.0012024935567751527
Validation loss = 0.0010251774219796062
Validation loss = 0.0014708141097798944
Validation loss = 0.001124878297559917
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009404763695783913
Validation loss = 0.0011695068096742034
Validation loss = 0.0008623104658909142
Validation loss = 0.0010017360327765346
Validation loss = 0.0012155957520008087
Validation loss = 0.0008185320184566081
Validation loss = 0.0010763845639303327
Validation loss = 0.0009432088700123131
Validation loss = 0.0009787424933165312
Validation loss = 0.0012927624629810452
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009322890546172857
Validation loss = 0.0009226558613590896
Validation loss = 0.0009533860720694065
Validation loss = 0.0013061544159427285
Validation loss = 0.0009475749102421105
Validation loss = 0.0008284726063720882
Validation loss = 0.0009488228824920952
Validation loss = 0.0014513015048578382
Validation loss = 0.0010490384884178638
Validation loss = 0.0007462226785719395
Validation loss = 0.0009368888568133116
Validation loss = 0.0009676148183643818
Validation loss = 0.0008594202226959169
Validation loss = 0.001065378193743527
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008384468383155763
Validation loss = 0.001179422833956778
Validation loss = 0.0010576938511803746
Validation loss = 0.000949016772210598
Validation loss = 0.0008809374994598329
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010829345555976033
Validation loss = 0.0011075888760387897
Validation loss = 0.0008930267649702728
Validation loss = 0.0011028032749891281
Validation loss = 0.0012625987874343991
Validation loss = 0.0009325835853815079
Validation loss = 0.0012137583689764142
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00241 |
| Iteration     | 28       |
| MaximumReturn | -0.00197 |
| MinimumReturn | -0.00328 |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002127985469996929
Validation loss = 0.0008566302130930126
Validation loss = 0.0026325026992708445
Validation loss = 0.0009909626096487045
Validation loss = 0.0008398271165788174
Validation loss = 0.000874904973898083
Validation loss = 0.0009146036463789642
Validation loss = 0.0009576476295478642
Validation loss = 0.0008056143415160477
Validation loss = 0.001157010323368013
Validation loss = 0.0009664659737609327
Validation loss = 0.0013393651461228728
Validation loss = 0.0010581003734841943
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001121544511988759
Validation loss = 0.000858134008012712
Validation loss = 0.0010274398373439908
Validation loss = 0.0010006662923842669
Validation loss = 0.0010797632858157158
Validation loss = 0.0010333318496122956
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012626249808818102
Validation loss = 0.0010257703252136707
Validation loss = 0.0012209581909701228
Validation loss = 0.0013608630979433656
Validation loss = 0.0015483186580240726
Validation loss = 0.0013481222558766603
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000956143019720912
Validation loss = 0.0009625901002436876
Validation loss = 0.0010825837962329388
Validation loss = 0.0009588995599187911
Validation loss = 0.001781894126906991
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017536255763843656
Validation loss = 0.0007510767900384963
Validation loss = 0.000923909479752183
Validation loss = 0.0014824495883658528
Validation loss = 0.0016612533945590258
Validation loss = 0.0010721430880948901
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0576  |
| Iteration     | 29       |
| MaximumReturn | -0.0298  |
| MinimumReturn | -0.0932  |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009034872055053711
Validation loss = 0.0007889472180977464
Validation loss = 0.0008467801380902529
Validation loss = 0.0007921245414763689
Validation loss = 0.0013217026134952903
Validation loss = 0.0009078886359930038
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000941763399168849
Validation loss = 0.0012226070975884795
Validation loss = 0.0008722774218767881
Validation loss = 0.0009037964046001434
Validation loss = 0.001194191980175674
Validation loss = 0.001221438404172659
Validation loss = 0.0012898643035441637
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009379805414937437
Validation loss = 0.0012607771204784513
Validation loss = 0.0011152364313602448
Validation loss = 0.0009786770679056644
Validation loss = 0.001597624272108078
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009996319422498345
Validation loss = 0.0009549582609906793
Validation loss = 0.0008515745867043734
Validation loss = 0.0009662711527198553
Validation loss = 0.0009338606614619493
Validation loss = 0.0014279038878157735
Validation loss = 0.001145156566053629
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013796576531603932
Validation loss = 0.0010308483615517616
Validation loss = 0.0010859944159165025
Validation loss = 0.0009653338929638267
Validation loss = 0.000881964573636651
Validation loss = 0.00074586714617908
Validation loss = 0.0009662896627560258
Validation loss = 0.0015227175317704678
Validation loss = 0.0009170012199319899
Validation loss = 0.001771484618075192
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.769   |
| Iteration     | 30       |
| MaximumReturn | -0.0253  |
| MinimumReturn | -18.3    |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009868631605058908
Validation loss = 0.0010733348317444324
Validation loss = 0.0008817545021884143
Validation loss = 0.0008551766513846815
Validation loss = 0.0010265508899465203
Validation loss = 0.0014314616564661264
Validation loss = 0.0011624739272519946
Validation loss = 0.0008512872154824436
Validation loss = 0.0008674742421135306
Validation loss = 0.0007325342739932239
Validation loss = 0.0009535358985885978
Validation loss = 0.0009479985455982387
Validation loss = 0.0012047553900629282
Validation loss = 0.0010108430869877338
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000924841093365103
Validation loss = 0.0009205023525282741
Validation loss = 0.0007310793153010309
Validation loss = 0.0011123965959995985
Validation loss = 0.0011055157519876957
Validation loss = 0.000884706387296319
Validation loss = 0.0009409353951923549
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001116449129767716
Validation loss = 0.0010498937917873263
Validation loss = 0.0011437605135142803
Validation loss = 0.0010773858521133661
Validation loss = 0.0008754926966503263
Validation loss = 0.0007501837098971009
Validation loss = 0.0011336914030835032
Validation loss = 0.001041558338329196
Validation loss = 0.0008117006509564817
Validation loss = 0.0009084854973480105
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007748774369247258
Validation loss = 0.0012917809654027224
Validation loss = 0.0008935710648074746
Validation loss = 0.001421121647581458
Validation loss = 0.0008046595612540841
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012506464263424277
Validation loss = 0.0009091148967854679
Validation loss = 0.0016377379652112722
Validation loss = 0.0010193390771746635
Validation loss = 0.0012982003390789032
Validation loss = 0.0007745818002149463
Validation loss = 0.0008191863307729363
Validation loss = 0.0008026904542930424
Validation loss = 0.0009934405097737908
Validation loss = 0.0009438700508326292
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00184 |
| Iteration     | 31       |
| MaximumReturn | -0.00154 |
| MinimumReturn | -0.00223 |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001094560488127172
Validation loss = 0.0009785703150555491
Validation loss = 0.0011659144656732678
Validation loss = 0.0010820927564054728
Validation loss = 0.0009196560131385922
Validation loss = 0.0011483111884444952
Validation loss = 0.0010133037576451898
Validation loss = 0.0013554415199905634
Validation loss = 0.0007421188056468964
Validation loss = 0.0010155125055462122
Validation loss = 0.001208850764669478
Validation loss = 0.0008611399098299444
Validation loss = 0.0008798643248155713
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009987112134695053
Validation loss = 0.000719442090485245
Validation loss = 0.0008513506618328393
Validation loss = 0.0012063368922099471
Validation loss = 0.0010110135190188885
Validation loss = 0.0008472168119624257
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013430728577077389
Validation loss = 0.0014300746843218803
Validation loss = 0.0012955112615600228
Validation loss = 0.0012246804544702172
Validation loss = 0.0013162349350750446
Validation loss = 0.0012700333027169108
Validation loss = 0.0014155699172988534
Validation loss = 0.0010299922432750463
Validation loss = 0.0008971164934337139
Validation loss = 0.0007679414702579379
Validation loss = 0.0007202283595688641
Validation loss = 0.0010687607573345304
Validation loss = 0.0009170800331048667
Validation loss = 0.0009298233781009912
Validation loss = 0.0008305436349473894
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013290847418829799
Validation loss = 0.0009049889049492776
Validation loss = 0.0009004310704767704
Validation loss = 0.0014357910258695483
Validation loss = 0.0009568810346536338
Validation loss = 0.000918649835512042
Validation loss = 0.0010024631628766656
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010361270979046822
Validation loss = 0.001326798927038908
Validation loss = 0.0008257823646999896
Validation loss = 0.0009441505535505712
Validation loss = 0.001069568214006722
Validation loss = 0.0009227306582033634
Validation loss = 0.0014656655257567763
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.111   |
| Iteration     | 32       |
| MaximumReturn | -0.0636  |
| MinimumReturn | -0.172   |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001312777167186141
Validation loss = 0.0009601108613424003
Validation loss = 0.0008623817120678723
Validation loss = 0.0009041536832228303
Validation loss = 0.0008411174640059471
Validation loss = 0.0012877847766503692
Validation loss = 0.0010840198956429958
Validation loss = 0.0007597822113893926
Validation loss = 0.0008993289084173739
Validation loss = 0.0011694958666339517
Validation loss = 0.0006732814363203943
Validation loss = 0.0009795462246984243
Validation loss = 0.0008097091340459883
Validation loss = 0.0009458748390898108
Validation loss = 0.0010440018959343433
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000994154834188521
Validation loss = 0.0010114602046087384
Validation loss = 0.0008104024454951286
Validation loss = 0.001066711382009089
Validation loss = 0.0009449455537833273
Validation loss = 0.0007665148004889488
Validation loss = 0.000927596352994442
Validation loss = 0.0012048037024214864
Validation loss = 0.0007614208152517676
Validation loss = 0.0007789504015818238
Validation loss = 0.00115960871335119
Validation loss = 0.0008845930569805205
Validation loss = 0.0008928844472393394
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016163941472768784
Validation loss = 0.0009056926355697215
Validation loss = 0.0009420440183021128
Validation loss = 0.0008594092214480042
Validation loss = 0.0009533488773740828
Validation loss = 0.0010858176974579692
Validation loss = 0.0008039035019464791
Validation loss = 0.0007913427543826401
Validation loss = 0.0008190753287635744
Validation loss = 0.00148177077062428
Validation loss = 0.0008195913396775723
Validation loss = 0.0009568083914928138
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012863725423812866
Validation loss = 0.0012649750569835305
Validation loss = 0.0008918652310967445
Validation loss = 0.0008847398567013443
Validation loss = 0.0008283047936856747
Validation loss = 0.0007768290815874934
Validation loss = 0.0011855734046548605
Validation loss = 0.0009127567755058408
Validation loss = 0.0009024922619573772
Validation loss = 0.0009856960969045758
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012775175273418427
Validation loss = 0.0008406959823332727
Validation loss = 0.001213366398587823
Validation loss = 0.0008190185762941837
Validation loss = 0.0010211959015578032
Validation loss = 0.001694891252554953
Validation loss = 0.0006597194005735219
Validation loss = 0.001295588561333716
Validation loss = 0.0006623455556109548
Validation loss = 0.0016535100294277072
Validation loss = 0.0014201307203620672
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.105   |
| Iteration     | 33       |
| MaximumReturn | -0.0228  |
| MinimumReturn | -0.19    |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007927828119136393
Validation loss = 0.0011861501261591911
Validation loss = 0.0006412200746126473
Validation loss = 0.0011849146103486419
Validation loss = 0.0006733069312758744
Validation loss = 0.0008086783927865326
Validation loss = 0.0011048357700929046
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008420701487921178
Validation loss = 0.0007876117015257478
Validation loss = 0.0007389163947664201
Validation loss = 0.0008125003660097718
Validation loss = 0.0009045285987667739
Validation loss = 0.0008655748679302633
Validation loss = 0.0010572720784693956
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007737911655567586
Validation loss = 0.0007402261253446341
Validation loss = 0.0009501267923042178
Validation loss = 0.0008822799427434802
Validation loss = 0.0012617508182302117
Validation loss = 0.0008400630904361606
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007631970220245421
Validation loss = 0.0009240570943802595
Validation loss = 0.001410158583894372
Validation loss = 0.0007351288804784417
Validation loss = 0.0009887920459732413
Validation loss = 0.0013989333529025316
Validation loss = 0.0007114471518434584
Validation loss = 0.0010213055647909641
Validation loss = 0.0012009041383862495
Validation loss = 0.0014279840979725122
Validation loss = 0.0008919084211811423
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008133453666232526
Validation loss = 0.0007317952113226056
Validation loss = 0.0012752509210258722
Validation loss = 0.0009916892740875483
Validation loss = 0.0011152795050293207
Validation loss = 0.0009196293540298939
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.141   |
| Iteration     | 34       |
| MaximumReturn | -0.0564  |
| MinimumReturn | -0.279   |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007412715931423008
Validation loss = 0.0007069579442031682
Validation loss = 0.0010155094787478447
Validation loss = 0.0006673551397398114
Validation loss = 0.0011390909785404801
Validation loss = 0.0007719116401858628
Validation loss = 0.00109454698394984
Validation loss = 0.0007618073141202331
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010881844209507108
Validation loss = 0.0008385051041841507
Validation loss = 0.0007083506789058447
Validation loss = 0.001042878138832748
Validation loss = 0.0007600618992000818
Validation loss = 0.0007609102758578956
Validation loss = 0.0006590358098037541
Validation loss = 0.000942936516366899
Validation loss = 0.0012701391242444515
Validation loss = 0.0009707954013720155
Validation loss = 0.0007199005922302604
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009642009972594678
Validation loss = 0.0007559963269159198
Validation loss = 0.0014698297018185258
Validation loss = 0.0008679121965542436
Validation loss = 0.0007633219938725233
Validation loss = 0.0011924182763323188
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008827798301354051
Validation loss = 0.0009043018217198551
Validation loss = 0.0009914147667586803
Validation loss = 0.0009270819136872888
Validation loss = 0.001508944435045123
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008108742185868323
Validation loss = 0.000961106619797647
Validation loss = 0.0016046445816755295
Validation loss = 0.0007034286973066628
Validation loss = 0.0010930460412055254
Validation loss = 0.0008270328980870545
Validation loss = 0.0008636544225737453
Validation loss = 0.0013886996312066913
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0278  |
| Iteration     | 35       |
| MaximumReturn | -0.0152  |
| MinimumReturn | -0.0464  |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010533753084018826
Validation loss = 0.0008569679921492934
Validation loss = 0.00120536086615175
Validation loss = 0.0013268603943288326
Validation loss = 0.0008810032741166651
Validation loss = 0.0014002310344949365
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009513879776932299
Validation loss = 0.0009316452196799219
Validation loss = 0.001120110391639173
Validation loss = 0.0014130625640973449
Validation loss = 0.0007373027037829161
Validation loss = 0.001079852576367557
Validation loss = 0.000819504028186202
Validation loss = 0.0008329136180691421
Validation loss = 0.0007440446061082184
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008328022086061537
Validation loss = 0.0008896862855181098
Validation loss = 0.0011099108960479498
Validation loss = 0.0010713002411648631
Validation loss = 0.001592401647940278
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011608804343268275
Validation loss = 0.0009176832973025739
Validation loss = 0.0010069315321743488
Validation loss = 0.0010172327747568488
Validation loss = 0.0008430338930338621
Validation loss = 0.0009244133252650499
Validation loss = 0.0009833297226577997
Validation loss = 0.0010884571820497513
Validation loss = 0.0007727716001681983
Validation loss = 0.000851438962854445
Validation loss = 0.0015501051675528288
Validation loss = 0.0013965819962322712
Validation loss = 0.0007697733817622066
Validation loss = 0.0008539289119653404
Validation loss = 0.0011123783187940717
Validation loss = 0.0010327171767130494
Validation loss = 0.0012940753949806094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008874164195731282
Validation loss = 0.0012243534438312054
Validation loss = 0.0008280111942440271
Validation loss = 0.0012119035236537457
Validation loss = 0.001160870655439794
Validation loss = 0.0008756695897318423
Validation loss = 0.0007687248871661723
Validation loss = 0.001089319703169167
Validation loss = 0.0007454928709194064
Validation loss = 0.0009344382560811937
Validation loss = 0.0007192424964159727
Validation loss = 0.0007461647619493306
Validation loss = 0.0009374720393680036
Validation loss = 0.00087927927961573
Validation loss = 0.0010789403459057212
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0714  |
| Iteration     | 36       |
| MaximumReturn | -0.0344  |
| MinimumReturn | -0.123   |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009086454520002007
Validation loss = 0.0006708072614856064
Validation loss = 0.0007148811127990484
Validation loss = 0.0006900342996232212
Validation loss = 0.0019087372347712517
Validation loss = 0.0009221680229529738
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007021838100627065
Validation loss = 0.0007692728540860116
Validation loss = 0.001032169908285141
Validation loss = 0.0010823068441823125
Validation loss = 0.0009544323547743261
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007753911195322871
Validation loss = 0.0009476645500399172
Validation loss = 0.0009270751615986228
Validation loss = 0.0010081432992592454
Validation loss = 0.0012265550903975964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00078681850573048
Validation loss = 0.0010438469471409917
Validation loss = 0.0014203705359250307
Validation loss = 0.0009329680469818413
Validation loss = 0.0010394809069111943
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008586006588302553
Validation loss = 0.0009105518693104386
Validation loss = 0.0010614232160151005
Validation loss = 0.0014934897189959884
Validation loss = 0.0008718845201656222
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00846 |
| Iteration     | 37       |
| MaximumReturn | -0.00602 |
| MinimumReturn | -0.0119  |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006577565800398588
Validation loss = 0.0011853557080030441
Validation loss = 0.0006675634067505598
Validation loss = 0.0007744425674900413
Validation loss = 0.0009705315460450947
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007465684902854264
Validation loss = 0.0009753102203831077
Validation loss = 0.0006367826135829091
Validation loss = 0.0006498257862403989
Validation loss = 0.0007138225482776761
Validation loss = 0.0011194455437362194
Validation loss = 0.0007928828708827496
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007814700365997851
Validation loss = 0.0008185181068256497
Validation loss = 0.0008439992088824511
Validation loss = 0.0011079030809924006
Validation loss = 0.000838780659250915
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009761004475876689
Validation loss = 0.0011236491845920682
Validation loss = 0.000748427293729037
Validation loss = 0.0008889437303878367
Validation loss = 0.0009499923326075077
Validation loss = 0.0006466795457527041
Validation loss = 0.0012502785539254546
Validation loss = 0.00109345861710608
Validation loss = 0.0008180129807442427
Validation loss = 0.001040273578837514
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008092959178611636
Validation loss = 0.000984136015176773
Validation loss = 0.0005460429238155484
Validation loss = 0.0009194705635309219
Validation loss = 0.001013374188914895
Validation loss = 0.0009901025332510471
Validation loss = 0.0011831209994852543
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.103   |
| Iteration     | 38       |
| MaximumReturn | -0.00161 |
| MinimumReturn | -2.5     |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009492985554970801
Validation loss = 0.0013233163626864552
Validation loss = 0.0007380570750683546
Validation loss = 0.0009774623904377222
Validation loss = 0.0010201599216088653
Validation loss = 0.0011334401788190007
Validation loss = 0.0014447676949203014
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014404807006940246
Validation loss = 0.0008568304474465549
Validation loss = 0.0016346421325579286
Validation loss = 0.0011180529836565256
Validation loss = 0.0006464823964051902
Validation loss = 0.0008820946095511317
Validation loss = 0.0006513865082524717
Validation loss = 0.0008878911612555385
Validation loss = 0.0009648050181567669
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007428800454363227
Validation loss = 0.0012365165166556835
Validation loss = 0.0007879553595557809
Validation loss = 0.001002716482616961
Validation loss = 0.0008073167409747839
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009104626369662583
Validation loss = 0.0008781798533163965
Validation loss = 0.0010456934105604887
Validation loss = 0.0007646854501217604
Validation loss = 0.0006794020882807672
Validation loss = 0.0012754438212141395
Validation loss = 0.00082026282325387
Validation loss = 0.0011194577673450112
Validation loss = 0.000777742185164243
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009120141039602458
Validation loss = 0.0007062142249196768
Validation loss = 0.0007989816949702799
Validation loss = 0.0006295712664723396
Validation loss = 0.000792466220445931
Validation loss = 0.0013525413814932108
Validation loss = 0.0009147058008238673
Validation loss = 0.0007314492831937969
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.114   |
| Iteration     | 39       |
| MaximumReturn | -0.0381  |
| MinimumReturn | -0.593   |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011900273384526372
Validation loss = 0.0007165818242356181
Validation loss = 0.0008883445407263935
Validation loss = 0.0006114273564890027
Validation loss = 0.000871609547175467
Validation loss = 0.0005729648401029408
Validation loss = 0.0008122840081341565
Validation loss = 0.0006878876592963934
Validation loss = 0.0008719671750441194
Validation loss = 0.0006685006665065885
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008689056267030537
Validation loss = 0.0007338071591220796
Validation loss = 0.0009944471530616283
Validation loss = 0.0006202849326655269
Validation loss = 0.0007177949883043766
Validation loss = 0.0009554917342029512
Validation loss = 0.0005697383894585073
Validation loss = 0.0006034754333086312
Validation loss = 0.0008305629598908126
Validation loss = 0.0006873527308925986
Validation loss = 0.0009883851744234562
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014561298303306103
Validation loss = 0.0007201411062851548
Validation loss = 0.0011458052322268486
Validation loss = 0.0010632656048983335
Validation loss = 0.000748719903640449
Validation loss = 0.0009788821917027235
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008302651694975793
Validation loss = 0.0007934434106573462
Validation loss = 0.0007999678491614759
Validation loss = 0.001416853629052639
Validation loss = 0.0007849828689359128
Validation loss = 0.0009215017198584974
Validation loss = 0.0006179437623359263
Validation loss = 0.0012443369487300515
Validation loss = 0.0008095458033494651
Validation loss = 0.0007061272626742721
Validation loss = 0.0009135672589763999
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009519719751551747
Validation loss = 0.0006522500771097839
Validation loss = 0.0007682867581024766
Validation loss = 0.0007395907305181026
Validation loss = 0.0007394860149361193
Validation loss = 0.0011889779707416892
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.151   |
| Iteration     | 40       |
| MaximumReturn | -0.0266  |
| MinimumReturn | -1.01    |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000630799971986562
Validation loss = 0.0007456765160895884
Validation loss = 0.0006912631797604263
Validation loss = 0.0007461935165338218
Validation loss = 0.0005422772374004126
Validation loss = 0.0016506058163940907
Validation loss = 0.0006935993442311883
Validation loss = 0.0006624669767916203
Validation loss = 0.0006519180606119335
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009555634460411966
Validation loss = 0.0008440304663963616
Validation loss = 0.0005856758216395974
Validation loss = 0.0010172053007408977
Validation loss = 0.0008282295311801136
Validation loss = 0.0008038054220378399
Validation loss = 0.000765743141528219
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010883479844778776
Validation loss = 0.0006608080002479255
Validation loss = 0.0007676801760680974
Validation loss = 0.0010345722548663616
Validation loss = 0.000639131641946733
Validation loss = 0.0009639973286539316
Validation loss = 0.0006993333227001131
Validation loss = 0.0013784182956442237
Validation loss = 0.0008641229942440987
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010347134666517377
Validation loss = 0.0006933095282875001
Validation loss = 0.0009518745937384665
Validation loss = 0.0012128158705309033
Validation loss = 0.0007929674466140568
Validation loss = 0.0013718358241021633
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016753164818510413
Validation loss = 0.0013358042342588305
Validation loss = 0.0011023797560483217
Validation loss = 0.0007635137881152332
Validation loss = 0.0007701742579229176
Validation loss = 0.0010061160428449512
Validation loss = 0.0007586689898744226
Validation loss = 0.001087214332073927
Validation loss = 0.0009630928980186582
Validation loss = 0.000941903330385685
Validation loss = 0.0007495760801248252
Validation loss = 0.000932524271775037
Validation loss = 0.0005867385189048946
Validation loss = 0.0006097142468206584
Validation loss = 0.0008687920635566115
Validation loss = 0.00088232517009601
Validation loss = 0.0006310673779807985
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -14.3    |
| Iteration     | 41       |
| MaximumReturn | -0.0677  |
| MinimumReturn | -49.5    |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009052587556652725
Validation loss = 0.0007492606528103352
Validation loss = 0.0007452098652720451
Validation loss = 0.0010740334400907159
Validation loss = 0.0006620006170123816
Validation loss = 0.0010713705560192466
Validation loss = 0.0007744788308627903
Validation loss = 0.0008019900415092707
Validation loss = 0.0007599659729748964
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001036500558257103
Validation loss = 0.0010820090537890792
Validation loss = 0.0007095731562003493
Validation loss = 0.0008716071024537086
Validation loss = 0.001052333042025566
Validation loss = 0.0008385805413126945
Validation loss = 0.0009552697883918881
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008029086166061461
Validation loss = 0.0007867570966482162
Validation loss = 0.0011510258773341775
Validation loss = 0.0009698987123556435
Validation loss = 0.0008230884559452534
Validation loss = 0.0007233878714032471
Validation loss = 0.0011388750281184912
Validation loss = 0.0009575283038429916
Validation loss = 0.0010329545475542545
Validation loss = 0.0007485045352950692
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013660485856235027
Validation loss = 0.0010523597011342645
Validation loss = 0.0008742604986764491
Validation loss = 0.0006462873425334692
Validation loss = 0.0007318336283788085
Validation loss = 0.0007470264099538326
Validation loss = 0.0008203907054848969
Validation loss = 0.0009779657702893019
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007370274979621172
Validation loss = 0.0006746286526322365
Validation loss = 0.0006662814994342625
Validation loss = 0.0009005140163935721
Validation loss = 0.0007536691264249384
Validation loss = 0.0010624037822708488
Validation loss = 0.0008470648899674416
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.247   |
| Iteration     | 42       |
| MaximumReturn | -0.0674  |
| MinimumReturn | -3.76    |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008099540718831122
Validation loss = 0.0006414390518330038
Validation loss = 0.0006726877763867378
Validation loss = 0.0007776189595460892
Validation loss = 0.0007267837645485997
Validation loss = 0.0006402595899999142
Validation loss = 0.0007547856657765806
Validation loss = 0.0007444497314281762
Validation loss = 0.0014570022467523813
Validation loss = 0.001492625568062067
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010984427062794566
Validation loss = 0.0013247476890683174
Validation loss = 0.0011463393457233906
Validation loss = 0.0007135474588721991
Validation loss = 0.0014220626326277852
Validation loss = 0.0008141110884025693
Validation loss = 0.0005913726636208594
Validation loss = 0.0010745130712166429
Validation loss = 0.0007097000489011407
Validation loss = 0.0008083884022198617
Validation loss = 0.0014042234979569912
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009161685011349618
Validation loss = 0.0013750301441177726
Validation loss = 0.0010007829405367374
Validation loss = 0.0008828341960906982
Validation loss = 0.0007589720189571381
Validation loss = 0.001113008358515799
Validation loss = 0.0006131531554274261
Validation loss = 0.0007895884336903691
Validation loss = 0.0006965696811676025
Validation loss = 0.0007830409449525177
Validation loss = 0.0007336128037422895
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008567770128138363
Validation loss = 0.0013102266239002347
Validation loss = 0.0008384858956560493
Validation loss = 0.0008381240186281502
Validation loss = 0.0007269266061484814
Validation loss = 0.0006458583520725369
Validation loss = 0.0008275185246020555
Validation loss = 0.0009467295603826642
Validation loss = 0.0010612531332299113
Validation loss = 0.0008099095430225134
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008206392521969974
Validation loss = 0.0005866069113835692
Validation loss = 0.0009524073102511466
Validation loss = 0.0008428344735875726
Validation loss = 0.0006698123761452734
Validation loss = 0.0008704056963324547
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.7    |
| Iteration     | 43       |
| MaximumReturn | -0.256   |
| MinimumReturn | -49.9    |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000768739206250757
Validation loss = 0.000983507721684873
Validation loss = 0.0007986927521415055
Validation loss = 0.0004970621666871011
Validation loss = 0.0008541443385183811
Validation loss = 0.0007604087004438043
Validation loss = 0.0009039161377586424
Validation loss = 0.0008242340991273522
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009233344462700188
Validation loss = 0.0009377669193781912
Validation loss = 0.0007695275126025081
Validation loss = 0.0010611864272505045
Validation loss = 0.0006908618379384279
Validation loss = 0.0007429966353811324
Validation loss = 0.0008043309790082276
Validation loss = 0.0008626809576526284
Validation loss = 0.0006964270141907036
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014155593235045671
Validation loss = 0.0009409555932506919
Validation loss = 0.0005504174041561782
Validation loss = 0.0012571628903970122
Validation loss = 0.0008060052059590816
Validation loss = 0.0008779651252552867
Validation loss = 0.000830625300295651
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008598873973824084
Validation loss = 0.0006983066559769213
Validation loss = 0.0008712321287021041
Validation loss = 0.0013451307313516736
Validation loss = 0.0006365540320985019
Validation loss = 0.0008426398853771389
Validation loss = 0.0012684923131018877
Validation loss = 0.0006657660705968738
Validation loss = 0.0008087351452559233
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014828217681497335
Validation loss = 0.0007337820716202259
Validation loss = 0.0008245271164923906
Validation loss = 0.0006913601537235081
Validation loss = 0.0007654727669432759
Validation loss = 0.0006350322510115802
Validation loss = 0.0005433135665953159
Validation loss = 0.0007141149835661054
Validation loss = 0.0008404957479797304
Validation loss = 0.0006422558799386024
Validation loss = 0.0006544616771861911
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.022   |
| Iteration     | 44       |
| MaximumReturn | -0.0138  |
| MinimumReturn | -0.0328  |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008185902843251824
Validation loss = 0.0018144524656236172
Validation loss = 0.0007361301686614752
Validation loss = 0.000642340921331197
Validation loss = 0.0005676859291270375
Validation loss = 0.0006571918493136764
Validation loss = 0.0005737301544286311
Validation loss = 0.0007855634321458638
Validation loss = 0.0006970971589908004
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000712249951902777
Validation loss = 0.0007058430928736925
Validation loss = 0.0006920082960277796
Validation loss = 0.0008354648016393185
Validation loss = 0.0006967311492189765
Validation loss = 0.0011295957956463099
Validation loss = 0.0011470741592347622
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007868784014135599
Validation loss = 0.0008469182648696005
Validation loss = 0.0008608576026745141
Validation loss = 0.0008202825556509197
Validation loss = 0.001017842092551291
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008229802479036152
Validation loss = 0.0007455957820639014
Validation loss = 0.0007132138125598431
Validation loss = 0.0009625568054616451
Validation loss = 0.0008250575629062951
Validation loss = 0.0008425157866440713
Validation loss = 0.0008946346351876855
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007062575314193964
Validation loss = 0.0010660869302228093
Validation loss = 0.0011082880664616823
Validation loss = 0.0011938173556700349
Validation loss = 0.0006256878841668367
Validation loss = 0.0008423252729699016
Validation loss = 0.0007205579313449562
Validation loss = 0.0009356372174806893
Validation loss = 0.000725036661606282
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -15.5    |
| Iteration     | 45       |
| MaximumReturn | -0.104   |
| MinimumReturn | -51.4    |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017109500477090478
Validation loss = 0.0008048252784647048
Validation loss = 0.0007858331664465368
Validation loss = 0.0006068499642424285
Validation loss = 0.0006255757762119174
Validation loss = 0.0007429512334056199
Validation loss = 0.0007018482428975403
Validation loss = 0.0008337709004990757
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0028133252635598183
Validation loss = 0.0014401564840227365
Validation loss = 0.0010346753988415003
Validation loss = 0.0018821526318788528
Validation loss = 0.0008594098617322743
Validation loss = 0.0009129103855229914
Validation loss = 0.0006458015413954854
Validation loss = 0.0006741938414052129
Validation loss = 0.0006052232929505408
Validation loss = 0.0006353975040838122
Validation loss = 0.0005113957449793816
Validation loss = 0.0005617751157842577
Validation loss = 0.0006825799937359989
Validation loss = 0.0009067188366316259
Validation loss = 0.0006628162227571011
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009225027170032263
Validation loss = 0.0011632716050371528
Validation loss = 0.0005880825920030475
Validation loss = 0.0007066495600156486
Validation loss = 0.0010513549204915762
Validation loss = 0.0008552634390071034
Validation loss = 0.0007662479183636606
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007347998907789588
Validation loss = 0.0007320099393837154
Validation loss = 0.0012202535290271044
Validation loss = 0.0011495421640574932
Validation loss = 0.0010096508776769042
Validation loss = 0.0006969513488002121
Validation loss = 0.0007745579932816327
Validation loss = 0.0007483990048058331
Validation loss = 0.0006180257769301534
Validation loss = 0.000605077832005918
Validation loss = 0.0006596929742954671
Validation loss = 0.001204978791065514
Validation loss = 0.000646927859634161
Validation loss = 0.0011599495774134994
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0021605417132377625
Validation loss = 0.0006228729034774005
Validation loss = 0.0006874487153254449
Validation loss = 0.0006159886834211648
Validation loss = 0.0006401806022040546
Validation loss = 0.0006057628197595477
Validation loss = 0.0011047052685171366
Validation loss = 0.0005834056646563113
Validation loss = 0.0006449976935982704
Validation loss = 0.0011803912930190563
Validation loss = 0.0011740090558305383
Validation loss = 0.001206624903716147
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00305 |
| Iteration     | 46       |
| MaximumReturn | -0.0023  |
| MinimumReturn | -0.00523 |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006190005224198103
Validation loss = 0.0006912774406373501
Validation loss = 0.000732177053578198
Validation loss = 0.0011287146480754018
Validation loss = 0.00075292814290151
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007567103602923453
Validation loss = 0.0009499994339421391
Validation loss = 0.0006683334358967841
Validation loss = 0.0006180735654197633
Validation loss = 0.0008295036968775094
Validation loss = 0.00067806028528139
Validation loss = 0.0007185019785538316
Validation loss = 0.0007492763106711209
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011474541388452053
Validation loss = 0.0008660134044475853
Validation loss = 0.0006444812170229852
Validation loss = 0.0005898859235458076
Validation loss = 0.0007435622392222285
Validation loss = 0.0009251683950424194
Validation loss = 0.0008136307587847114
Validation loss = 0.0007834376883693039
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011044767452403903
Validation loss = 0.0008115613018162549
Validation loss = 0.0006402203580364585
Validation loss = 0.000826706294901669
Validation loss = 0.0006375617813318968
Validation loss = 0.0006148817483335733
Validation loss = 0.0009934331756085157
Validation loss = 0.0005581541336141527
Validation loss = 0.0006438996642827988
Validation loss = 0.0006013187812641263
Validation loss = 0.0007846395601518452
Validation loss = 0.0007248463807627559
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007264507585205138
Validation loss = 0.0007406724616885185
Validation loss = 0.0007778214057907462
Validation loss = 0.0005637207068502903
Validation loss = 0.0007054390152916312
Validation loss = 0.0005981185822747648
Validation loss = 0.0006696846103295684
Validation loss = 0.0007858606986701488
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00702 |
| Iteration     | 47       |
| MaximumReturn | -0.00414 |
| MinimumReturn | -0.0252  |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006571585545316339
Validation loss = 0.0006386028835549951
Validation loss = 0.0013114085886627436
Validation loss = 0.0005963135627098382
Validation loss = 0.000648611516226083
Validation loss = 0.0008622591267339885
Validation loss = 0.0010280375136062503
Validation loss = 0.0007511869189329445
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001212260453030467
Validation loss = 0.0007809175294823945
Validation loss = 0.0006110540125519037
Validation loss = 0.0006693367613479495
Validation loss = 0.0006036884151399136
Validation loss = 0.0008306173840537667
Validation loss = 0.0005806133849546313
Validation loss = 0.0007193412748165429
Validation loss = 0.00161807204131037
Validation loss = 0.0007157560903578997
Validation loss = 0.0008427466382272542
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008805440738797188
Validation loss = 0.001140624051913619
Validation loss = 0.0007050606654956937
Validation loss = 0.001395324943587184
Validation loss = 0.0009579980978742242
Validation loss = 0.0007120562950149179
Validation loss = 0.0008474398637190461
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010421746410429478
Validation loss = 0.0008024036651477218
Validation loss = 0.0005980280111543834
Validation loss = 0.000784075993578881
Validation loss = 0.0007429958786815405
Validation loss = 0.0007503905799239874
Validation loss = 0.000913587398827076
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000818865664768964
Validation loss = 0.000855826772749424
Validation loss = 0.0008293039863929152
Validation loss = 0.0006727335276082158
Validation loss = 0.0008004948613233864
Validation loss = 0.0007719233981333673
Validation loss = 0.0008302713977172971
Validation loss = 0.0006398041732609272
Validation loss = 0.0013383605983108282
Validation loss = 0.0007162984111346304
Validation loss = 0.0005436635692603886
Validation loss = 0.0005714439903385937
Validation loss = 0.0007532705785706639
Validation loss = 0.0011202640598639846
Validation loss = 0.0007403733325190842
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0594  |
| Iteration     | 48       |
| MaximumReturn | -0.0206  |
| MinimumReturn | -0.398   |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007829774403944612
Validation loss = 0.0006788547616451979
Validation loss = 0.0013642363483086228
Validation loss = 0.0007309296051971614
Validation loss = 0.0007270544301718473
Validation loss = 0.0008346355753019452
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007880724151618779
Validation loss = 0.0005846022395417094
Validation loss = 0.0005831036833114922
Validation loss = 0.0009848923655226827
Validation loss = 0.0005982308066450059
Validation loss = 0.0008466157596558332
Validation loss = 0.0007508496637456119
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006226421101018786
Validation loss = 0.0009842677973210812
Validation loss = 0.00139066600240767
Validation loss = 0.0011265588691458106
Validation loss = 0.0010148732690140605
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000808407086879015
Validation loss = 0.000592483498621732
Validation loss = 0.000790740828961134
Validation loss = 0.0007356498972512782
Validation loss = 0.0006106211221776903
Validation loss = 0.0012053728569298983
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009350969339720905
Validation loss = 0.0006716744974255562
Validation loss = 0.0014712009578943253
Validation loss = 0.0008237861911766231
Validation loss = 0.0008211323875002563
Validation loss = 0.0008947403402999043
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.35    |
| Iteration     | 49       |
| MaximumReturn | -0.00142 |
| MinimumReturn | -48.7    |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009865592001006007
Validation loss = 0.0007029005791991949
Validation loss = 0.0008320718770846725
Validation loss = 0.0009351492626592517
Validation loss = 0.0007832337869331241
Validation loss = 0.0010426108492538333
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007736936677247286
Validation loss = 0.0007552125607617199
Validation loss = 0.0008388756541535258
Validation loss = 0.00069003039970994
Validation loss = 0.0004965935368090868
Validation loss = 0.0011509967735037208
Validation loss = 0.0006030128570273519
Validation loss = 0.0008238757727667689
Validation loss = 0.0010907069081440568
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000864013156387955
Validation loss = 0.0013630082830786705
Validation loss = 0.0008275556610897183
Validation loss = 0.0008569603087380528
Validation loss = 0.0007402111659757793
Validation loss = 0.0008478768868371844
Validation loss = 0.0011347837280482054
Validation loss = 0.0009394417284056544
Validation loss = 0.000637222605291754
Validation loss = 0.0007167800213210285
Validation loss = 0.0014316285960376263
Validation loss = 0.0008587026386521757
Validation loss = 0.000929682981222868
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009032958769239485
Validation loss = 0.0005599217838607728
Validation loss = 0.0007517790072597563
Validation loss = 0.0006035671103745699
Validation loss = 0.0008313946891576052
Validation loss = 0.0012803834397345781
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006976092699915171
Validation loss = 0.0007993856561370194
Validation loss = 0.0005993419326841831
Validation loss = 0.0009420539136044681
Validation loss = 0.0006459017749875784
Validation loss = 0.000756582070607692
Validation loss = 0.0011022166581824422
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -12       |
| Iteration     | 50        |
| MaximumReturn | -0.000558 |
| MinimumReturn | -93.3     |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001197749050334096
Validation loss = 0.0008702168706804514
Validation loss = 0.0007804383640177548
Validation loss = 0.002010749187320471
Validation loss = 0.0006711572059430182
Validation loss = 0.0006256844499148428
Validation loss = 0.0009193390724249184
Validation loss = 0.0005892481422051787
Validation loss = 0.0008956798701547086
Validation loss = 0.0008549250196665525
Validation loss = 0.0005809076246805489
Validation loss = 0.0007802659529261291
Validation loss = 0.0008546074386686087
Validation loss = 0.0010393017437309027
Validation loss = 0.000801808200776577
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012775476789101958
Validation loss = 0.0008273586281575263
Validation loss = 0.0009648853447288275
Validation loss = 0.0006098829326219857
Validation loss = 0.0009056616690941155
Validation loss = 0.0008259248570539057
Validation loss = 0.0007256055250763893
Validation loss = 0.0008145630708895624
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010008601238951087
Validation loss = 0.0008347575203515589
Validation loss = 0.0006420912686735392
Validation loss = 0.0009380767005495727
Validation loss = 0.0007260380079969764
Validation loss = 0.0007817386649549007
Validation loss = 0.0006380995619110763
Validation loss = 0.0006907280185259879
Validation loss = 0.0008992108050733805
Validation loss = 0.0006344080320559442
Validation loss = 0.0007585150888189673
Validation loss = 0.0014202413149178028
Validation loss = 0.0015785322757437825
Validation loss = 0.00075057044159621
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021834606304764748
Validation loss = 0.0007024069200269878
Validation loss = 0.0007708346238359809
Validation loss = 0.0006878920248709619
Validation loss = 0.00072412722511217
Validation loss = 0.0006697789649479091
Validation loss = 0.0008520318660885096
Validation loss = 0.0007590385503135622
Validation loss = 0.0007122123497538269
Validation loss = 0.0007794148405082524
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009998866589739919
Validation loss = 0.0009279630612581968
Validation loss = 0.0006677661440335214
Validation loss = 0.0012130433460697532
Validation loss = 0.0007484837551601231
Validation loss = 0.0007510258001275361
Validation loss = 0.0006062684115022421
Validation loss = 0.0007267097244039178
Validation loss = 0.00091782514937222
Validation loss = 0.0009059202275238931
Validation loss = 0.0016610199818387628
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -141     |
| Iteration     | 51       |
| MaximumReturn | -43.2    |
| MinimumReturn | -164     |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001890529994852841
Validation loss = 0.0007090657018125057
Validation loss = 0.0006989235407672822
Validation loss = 0.000668278313241899
Validation loss = 0.0007386776269413531
Validation loss = 0.0011863522231578827
Validation loss = 0.0009408911573700607
Validation loss = 0.001358817215077579
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014505217550322413
Validation loss = 0.0008237511501647532
Validation loss = 0.0005921892006881535
Validation loss = 0.0006619069608859718
Validation loss = 0.0005324603407643735
Validation loss = 0.001499502337537706
Validation loss = 0.0013721744762733579
Validation loss = 0.0007919910713098943
Validation loss = 0.0006219646893441677
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019955309107899666
Validation loss = 0.0014832495944574475
Validation loss = 0.0009494785335846245
Validation loss = 0.0006063798791728914
Validation loss = 0.0007038416224531829
Validation loss = 0.0008748494437895715
Validation loss = 0.001763415290042758
Validation loss = 0.0014211788075044751
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015764731215313077
Validation loss = 0.0007291276124306023
Validation loss = 0.0007306415936909616
Validation loss = 0.0010633100755512714
Validation loss = 0.0006979058962315321
Validation loss = 0.0007686173194088042
Validation loss = 0.0007651459891349077
Validation loss = 0.0006076312274672091
Validation loss = 0.0006893316167406738
Validation loss = 0.0007810594979673624
Validation loss = 0.0006851804791949689
Validation loss = 0.0012659463100135326
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002471484011039138
Validation loss = 0.0008240320021286607
Validation loss = 0.0006990939727984369
Validation loss = 0.0006516145076602697
Validation loss = 0.0006755898357369006
Validation loss = 0.0007876309100538492
Validation loss = 0.0011332646245136857
Validation loss = 0.0009044077596627176
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.447   |
| Iteration     | 52       |
| MaximumReturn | -0.187   |
| MinimumReturn | -0.888   |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006769449100829661
Validation loss = 0.0005810495349578559
Validation loss = 0.0005874462076462805
Validation loss = 0.0008193973917514086
Validation loss = 0.0009822869906201959
Validation loss = 0.0008805225952528417
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006378592806868255
Validation loss = 0.0006202685181051493
Validation loss = 0.0010913694277405739
Validation loss = 0.0005442464607767761
Validation loss = 0.0006001023575663567
Validation loss = 0.0009278319193981588
Validation loss = 0.0007792214746586978
Validation loss = 0.000860647123772651
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000716830138117075
Validation loss = 0.0009795286459848285
Validation loss = 0.0008552339277230203
Validation loss = 0.0009022062295116484
Validation loss = 0.0008668912923894823
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000878754653967917
Validation loss = 0.000491988321300596
Validation loss = 0.0006242422969080508
Validation loss = 0.0011207758216187358
Validation loss = 0.0005511058843694627
Validation loss = 0.0008134299423545599
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007641798001714051
Validation loss = 0.0007282961742021143
Validation loss = 0.0007302007870748639
Validation loss = 0.0007655689842067659
Validation loss = 0.0007060950738377869
Validation loss = 0.0006055980338715017
Validation loss = 0.0017435887129977345
Validation loss = 0.0008464179118163884
Validation loss = 0.0007248322363011539
Validation loss = 0.0026291601825505495
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0522   |
| Iteration     | 53        |
| MaximumReturn | -0.000727 |
| MinimumReturn | -0.844    |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000883194210473448
Validation loss = 0.001698095235042274
Validation loss = 0.0007168129668571055
Validation loss = 0.0008634334662929177
Validation loss = 0.0007628191960975528
Validation loss = 0.0006811589119024575
Validation loss = 0.0008780536008998752
Validation loss = 0.0007184047717601061
Validation loss = 0.0005901665426790714
Validation loss = 0.000760394730605185
Validation loss = 0.0010456574382260442
Validation loss = 0.0007195398211479187
Validation loss = 0.0007590331952087581
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006037350976839662
Validation loss = 0.0006455212715081871
Validation loss = 0.0007204050780273974
Validation loss = 0.000933481496758759
Validation loss = 0.0007760641747154295
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007128301658667624
Validation loss = 0.0011690928367897868
Validation loss = 0.0018232079455628991
Validation loss = 0.0008699417230673134
Validation loss = 0.0010985875269398093
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005850000306963921
Validation loss = 0.000872987206093967
Validation loss = 0.0007357156719081104
Validation loss = 0.0014428563881665468
Validation loss = 0.0007956039626151323
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009979793103411794
Validation loss = 0.0006321749533526599
Validation loss = 0.0010803809855133295
Validation loss = 0.0009595573646947742
Validation loss = 0.0006101048784330487
Validation loss = 0.0006413940573111176
Validation loss = 0.0021893505472689867
Validation loss = 0.0014417034108191729
Validation loss = 0.0007395134889520705
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.125   |
| Iteration     | 54       |
| MaximumReturn | -0.00142 |
| MinimumReturn | -0.305   |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005932748317718506
Validation loss = 0.0006984927458688617
Validation loss = 0.0009765669237822294
Validation loss = 0.0006909521762281656
Validation loss = 0.0006035210099071264
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008526653400622308
Validation loss = 0.0007863491191528738
Validation loss = 0.0008250162354670465
Validation loss = 0.0008277159067802131
Validation loss = 0.0012214124435558915
Validation loss = 0.0013552267337217927
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006156930467113853
Validation loss = 0.0007857769960537553
Validation loss = 0.0007621606928296387
Validation loss = 0.0007861651829443872
Validation loss = 0.0007299395510926843
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007128011202439666
Validation loss = 0.0008354677120223641
Validation loss = 0.002053529256954789
Validation loss = 0.0010074249003082514
Validation loss = 0.0009578619501553476
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010068435221910477
Validation loss = 0.0006377363461069763
Validation loss = 0.001064652344211936
Validation loss = 0.0009687813580967486
Validation loss = 0.0005885843420401216
Validation loss = 0.0005405161646194756
Validation loss = 0.0006733609479852021
Validation loss = 0.0007205542060546577
Validation loss = 0.0006933665135875344
Validation loss = 0.0005574831739068031
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00165  |
| Iteration     | 55        |
| MaximumReturn | -0.000646 |
| MinimumReturn | -0.022    |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018378719687461853
Validation loss = 0.0005804221145808697
Validation loss = 0.0008742682985030115
Validation loss = 0.0009330769535154104
Validation loss = 0.00109486177098006
Validation loss = 0.0007514798198826611
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000751060142647475
Validation loss = 0.0007986358250491321
Validation loss = 0.0008627169881947339
Validation loss = 0.0006238698842935264
Validation loss = 0.0011999772395938635
Validation loss = 0.000782070797868073
Validation loss = 0.0022046086378395557
Validation loss = 0.0006006365292705595
Validation loss = 0.0009427324985153973
Validation loss = 0.0005959315458312631
Validation loss = 0.0006179416668601334
Validation loss = 0.0009581902413628995
Validation loss = 0.0006970899412408471
Validation loss = 0.0006414350937120616
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007851719856262207
Validation loss = 0.0007373776752501726
Validation loss = 0.0011077079689130187
Validation loss = 0.0009525766363367438
Validation loss = 0.000821799214463681
Validation loss = 0.0006620237836614251
Validation loss = 0.0007832364062778652
Validation loss = 0.0009895670227706432
Validation loss = 0.001032516942359507
Validation loss = 0.0007161310059018433
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010883522918447852
Validation loss = 0.0007732089143246412
Validation loss = 0.0005630705854855478
Validation loss = 0.0008555824169889092
Validation loss = 0.0007178594823926687
Validation loss = 0.0006145566585473716
Validation loss = 0.0006878937711007893
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006194004672579467
Validation loss = 0.0005389744765125215
Validation loss = 0.0006387622561305761
Validation loss = 0.0009364603902213275
Validation loss = 0.0008021810790523887
Validation loss = 0.0009286640561185777
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -82.9    |
| Iteration     | 56       |
| MaximumReturn | -8.36    |
| MinimumReturn | -149     |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009679128415882587
Validation loss = 0.0008114486117847264
Validation loss = 0.0011194635881111026
Validation loss = 0.0007499288767576218
Validation loss = 0.001494850032031536
Validation loss = 0.0007634884095750749
Validation loss = 0.0007394065614789724
Validation loss = 0.001586711616255343
Validation loss = 0.000904330110643059
Validation loss = 0.0009414288215339184
Validation loss = 0.0007050437852740288
Validation loss = 0.0012923384783789515
Validation loss = 0.0008624306065030396
Validation loss = 0.0007052312721498311
Validation loss = 0.0005549909546971321
Validation loss = 0.0009505290072411299
Validation loss = 0.0007883547223173082
Validation loss = 0.0011419272050261497
Validation loss = 0.0007911813445389271
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011459802044555545
Validation loss = 0.0010614510392770171
Validation loss = 0.0006435798131860793
Validation loss = 0.0008569771307520568
Validation loss = 0.0006977532175369561
Validation loss = 0.0028535842429846525
Validation loss = 0.0008613725658506155
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012197195319458842
Validation loss = 0.0006994986906647682
Validation loss = 0.0008992189541459084
Validation loss = 0.001165939960628748
Validation loss = 0.0008834567852318287
Validation loss = 0.0008904954884201288
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0030661162454634905
Validation loss = 0.0008746499079279602
Validation loss = 0.0012180997291579843
Validation loss = 0.0009425362222827971
Validation loss = 0.0010456694290041924
Validation loss = 0.001159228733740747
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013778498396277428
Validation loss = 0.0006727621075697243
Validation loss = 0.0008464099373668432
Validation loss = 0.0008794143795967102
Validation loss = 0.0005825720145367086
Validation loss = 0.0008906928705982864
Validation loss = 0.0006039197323843837
Validation loss = 0.0008149752975441515
Validation loss = 0.000790501304436475
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -78.8    |
| Iteration     | 57       |
| MaximumReturn | -0.229   |
| MinimumReturn | -147     |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007613464840687811
Validation loss = 0.0008474019705317914
Validation loss = 0.0010152037721127272
Validation loss = 0.0008659514714963734
Validation loss = 0.0006797905080020428
Validation loss = 0.0005676326691173017
Validation loss = 0.000708091480191797
Validation loss = 0.001165158348158002
Validation loss = 0.001312247826717794
Validation loss = 0.0010516828624531627
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0015271237352862954
Validation loss = 0.0008323268848471344
Validation loss = 0.0007214444922283292
Validation loss = 0.0020554664079099894
Validation loss = 0.0029705974739044905
Validation loss = 0.0008651938987895846
Validation loss = 0.0012395005906000733
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0014057870721444488
Validation loss = 0.0014965798472985625
Validation loss = 0.0012708191061392426
Validation loss = 0.0007706083706580102
Validation loss = 0.0017831807490438223
Validation loss = 0.0008031023317016661
Validation loss = 0.0007002406055107713
Validation loss = 0.0012360484106466174
Validation loss = 0.0010540202492848039
Validation loss = 0.0006982948980294168
Validation loss = 0.0011905708815902472
Validation loss = 0.0012096120044589043
Validation loss = 0.0010980605147778988
Validation loss = 0.0011796008329838514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009528365917503834
Validation loss = 0.0009219559724442661
Validation loss = 0.001133724465034902
Validation loss = 0.0018028683261945844
Validation loss = 0.0010617998195812106
Validation loss = 0.0006884470349177718
Validation loss = 0.0008357705664820969
Validation loss = 0.000997543684206903
Validation loss = 0.005238773766905069
Validation loss = 0.0006138230673968792
Validation loss = 0.000943790131714195
Validation loss = 0.000598315556999296
Validation loss = 0.0007794139091856778
Validation loss = 0.0015353040071204305
Validation loss = 0.0009268929716199636
Validation loss = 0.0007705423631705344
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013496028259396553
Validation loss = 0.0009714158368296921
Validation loss = 0.0007552255992777646
Validation loss = 0.0007378091686405241
Validation loss = 0.0009999785106629133
Validation loss = 0.0008129675989039242
Validation loss = 0.0008621516171842813
Validation loss = 0.0007885427912697196
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0907  |
| Iteration     | 58       |
| MaximumReturn | -0.0565  |
| MinimumReturn | -0.155   |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010189475724473596
Validation loss = 0.0007574482006020844
Validation loss = 0.0007352626998908818
Validation loss = 0.0009549004025757313
Validation loss = 0.0008353564189746976
Validation loss = 0.0007294485694728792
Validation loss = 0.0006987492670305073
Validation loss = 0.0006376741803251207
Validation loss = 0.0007321052253246307
Validation loss = 0.0008867305587045848
Validation loss = 0.0007153890328481793
Validation loss = 0.0006757415249012411
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009749079472385347
Validation loss = 0.0006497250287793577
Validation loss = 0.0007706392789259553
Validation loss = 0.0007445232477039099
Validation loss = 0.0015860033454373479
Validation loss = 0.0009096942958422005
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008403906831517816
Validation loss = 0.0008432168979197741
Validation loss = 0.0010561529779806733
Validation loss = 0.0007756833801977336
Validation loss = 0.0012479317374527454
Validation loss = 0.00107003899756819
Validation loss = 0.0008921005646698177
Validation loss = 0.0006116996519267559
Validation loss = 0.0007680556736886501
Validation loss = 0.002107031410560012
Validation loss = 0.0007392471306957304
Validation loss = 0.0009643391240388155
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009270646260119975
Validation loss = 0.0006931089446879923
Validation loss = 0.0006594149162992835
Validation loss = 0.0006341902771964669
Validation loss = 0.0007096850895322859
Validation loss = 0.0012219073250889778
Validation loss = 0.000678489450365305
Validation loss = 0.0007378132431767881
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008216456044465303
Validation loss = 0.0013345034094527364
Validation loss = 0.0008536757668480277
Validation loss = 0.0006891730590723455
Validation loss = 0.000914035364985466
Validation loss = 0.0007048987317830324
Validation loss = 0.0006762259290553629
Validation loss = 0.0010319995926693082
Validation loss = 0.0008225297788158059
Validation loss = 0.0007860086625441909
Validation loss = 0.0006689471774734557
Validation loss = 0.0006612654542550445
Validation loss = 0.0010878263274207711
Validation loss = 0.0007311259978450835
Validation loss = 0.0007493228185921907
Validation loss = 0.0007964082178659737
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.146   |
| Iteration     | 59       |
| MaximumReturn | -0.0811  |
| MinimumReturn | -0.194   |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006039974978193641
Validation loss = 0.0009838339174166322
Validation loss = 0.0009255826589651406
Validation loss = 0.0008460648241452873
Validation loss = 0.0009695128537714481
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010162955150008202
Validation loss = 0.0008442843682132661
Validation loss = 0.0007639087270945311
Validation loss = 0.0007166240247897804
Validation loss = 0.0008380290819332004
Validation loss = 0.0006803385331295431
Validation loss = 0.0009511638199910522
Validation loss = 0.0006320567335933447
Validation loss = 0.0008068340248428285
Validation loss = 0.0011137222172692418
Validation loss = 0.0007704652962274849
Validation loss = 0.0009003356099128723
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008677227888256311
Validation loss = 0.0008054656791500747
Validation loss = 0.0008260761387646198
Validation loss = 0.0007967607816681266
Validation loss = 0.0007448645192198455
Validation loss = 0.000884878623764962
Validation loss = 0.0007690784987062216
Validation loss = 0.000758492446038872
Validation loss = 0.0007615996291860938
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006549678510054946
Validation loss = 0.000837027037050575
Validation loss = 0.0012771800393238664
Validation loss = 0.0008269297541119158
Validation loss = 0.0007086646510288119
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005843592807650566
Validation loss = 0.0008789311395958066
Validation loss = 0.0008891531615518034
Validation loss = 0.000592998054344207
Validation loss = 0.0008875085623003542
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0551  |
| Iteration     | 60       |
| MaximumReturn | -0.00162 |
| MinimumReturn | -0.0788  |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000864518980961293
Validation loss = 0.0005695934523828328
Validation loss = 0.0005941367708146572
Validation loss = 0.003329681698232889
Validation loss = 0.0008437979267910123
Validation loss = 0.0018531716195866466
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006913565448485315
Validation loss = 0.002485515782609582
Validation loss = 0.0006577635649591684
Validation loss = 0.0006769192987121642
Validation loss = 0.0011255491990596056
Validation loss = 0.0005970196216367185
Validation loss = 0.0008716423180885613
Validation loss = 0.0014276755973696709
Validation loss = 0.0006004238966852427
Validation loss = 0.00100750382989645
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007486979011446238
Validation loss = 0.0009071110398508608
Validation loss = 0.0009069085353985429
Validation loss = 0.0009418404661118984
Validation loss = 0.0009610566194169223
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009694828186184168
Validation loss = 0.0006336537189781666
Validation loss = 0.0007745434995740652
Validation loss = 0.0009389499318785965
Validation loss = 0.0007609821623191237
Validation loss = 0.0010959436185657978
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005368465790525079
Validation loss = 0.0006986125954426825
Validation loss = 0.0006740836543031037
Validation loss = 0.0007190723554231226
Validation loss = 0.0005816868506371975
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.4    |
| Iteration     | 61       |
| MaximumReturn | -0.984   |
| MinimumReturn | -46      |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012699993094429374
Validation loss = 0.0008320516790263355
Validation loss = 0.0006405273452401161
Validation loss = 0.0008739673648960888
Validation loss = 0.0006010679062455893
Validation loss = 0.0016596970381215215
Validation loss = 0.0009081639582291245
Validation loss = 0.0015344534767791629
Validation loss = 0.0006996209267526865
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000745684839785099
Validation loss = 0.0006990254623815417
Validation loss = 0.000631186063401401
Validation loss = 0.0006585913361050189
Validation loss = 0.001077380613423884
Validation loss = 0.0008894071797840297
Validation loss = 0.000735752226319164
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007128960569389164
Validation loss = 0.0008526224410161376
Validation loss = 0.0006795012741349638
Validation loss = 0.0009307321161031723
Validation loss = 0.0008537943358533084
Validation loss = 0.0006253408500924706
Validation loss = 0.000689375854562968
Validation loss = 0.0007144128903746605
Validation loss = 0.0005722885834984481
Validation loss = 0.0009149598190560937
Validation loss = 0.0008020090172067285
Validation loss = 0.0009388505714014173
Validation loss = 0.0006875033723190427
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007190245669335127
Validation loss = 0.0010649185860529542
Validation loss = 0.0009401909192092717
Validation loss = 0.0005457134102471173
Validation loss = 0.000971933186519891
Validation loss = 0.0008383002132177353
Validation loss = 0.000740421237424016
Validation loss = 0.0006262533715926111
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006129768444225192
Validation loss = 0.0006780271069146693
Validation loss = 0.0007539452635683119
Validation loss = 0.0014562017749994993
Validation loss = 0.0007650869665667415
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -87.2    |
| Iteration     | 62       |
| MaximumReturn | -63      |
| MinimumReturn | -110     |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015719409566372633
Validation loss = 0.0007628172752447426
Validation loss = 0.0007083189557306468
Validation loss = 0.0006364913424476981
Validation loss = 0.0008507196907885373
Validation loss = 0.000723072502296418
Validation loss = 0.000994182424619794
Validation loss = 0.0016488536493852735
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007465272792614996
Validation loss = 0.0006089237285777926
Validation loss = 0.0007692515500821173
Validation loss = 0.0007954547181725502
Validation loss = 0.0008307097014039755
Validation loss = 0.0007467109826393425
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018969415687024593
Validation loss = 0.0005834096227772534
Validation loss = 0.0007244537118822336
Validation loss = 0.0013202656991779804
Validation loss = 0.0007495771278627217
Validation loss = 0.001089770463295281
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008513688226230443
Validation loss = 0.0011684122728183866
Validation loss = 0.0008737918687984347
Validation loss = 0.0012551186373457313
Validation loss = 0.0008528917678631842
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011832664022222161
Validation loss = 0.00098025007173419
Validation loss = 0.0008901174878701568
Validation loss = 0.0011454650666564703
Validation loss = 0.0009529677336104214
Validation loss = 0.0007747975178062916
Validation loss = 0.000681176024954766
Validation loss = 0.0006107448716647923
Validation loss = 0.0005807491252198815
Validation loss = 0.0010293259983882308
Validation loss = 0.0008004094706848264
Validation loss = 0.0009252646705135703
Validation loss = 0.0009345434955321252
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -136     |
| Iteration     | 63       |
| MaximumReturn | -97.3    |
| MinimumReturn | -156     |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011251744581386447
Validation loss = 0.0007963408133946359
Validation loss = 0.001443625777028501
Validation loss = 0.0006930890376679599
Validation loss = 0.0007752720848657191
Validation loss = 0.0014488468877971172
Validation loss = 0.0007743331952951849
Validation loss = 0.0007177169900387526
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006860207067802548
Validation loss = 0.0008602951420471072
Validation loss = 0.0008731081616133451
Validation loss = 0.0006796527886763215
Validation loss = 0.0008287876262329519
Validation loss = 0.0009742742986418307
Validation loss = 0.0007793811382725835
Validation loss = 0.0010124362306669354
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013498356565833092
Validation loss = 0.0014304772485047579
Validation loss = 0.0006983097991906106
Validation loss = 0.0006340396939776838
Validation loss = 0.0006026661721989512
Validation loss = 0.0012467422056943178
Validation loss = 0.00107173144351691
Validation loss = 0.0008095592493191361
Validation loss = 0.0013476228341460228
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001268421416170895
Validation loss = 0.0010539465583860874
Validation loss = 0.001109106233343482
Validation loss = 0.0009275238844566047
Validation loss = 0.0014814116293564439
Validation loss = 0.0008376256446354091
Validation loss = 0.0009851313661783934
Validation loss = 0.0014185101026669145
Validation loss = 0.0014063281705603004
Validation loss = 0.0008262473857030272
Validation loss = 0.0008340515778400004
Validation loss = 0.000969288928899914
Validation loss = 0.0008297100430354476
Validation loss = 0.0012596414890140295
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000846800219733268
Validation loss = 0.0006944301421754062
Validation loss = 0.0007118888315744698
Validation loss = 0.0008459570817649364
Validation loss = 0.0008407901623286307
Validation loss = 0.0007438805769197643
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -143     |
| Iteration     | 64       |
| MaximumReturn | -121     |
| MinimumReturn | -155     |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008740948396734893
Validation loss = 0.0006809424958191812
Validation loss = 0.000681089295540005
Validation loss = 0.001632894272916019
Validation loss = 0.000983661855570972
Validation loss = 0.000783371040597558
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007660824339836836
Validation loss = 0.0008572936058044434
Validation loss = 0.0008091317722573876
Validation loss = 0.0013933199224993587
Validation loss = 0.0013056941097602248
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010239932453259826
Validation loss = 0.0007554814801551402
Validation loss = 0.0008153650560416281
Validation loss = 0.0007725526229478419
Validation loss = 0.0012319280067458749
Validation loss = 0.001011884887702763
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009801066480576992
Validation loss = 0.001208477420732379
Validation loss = 0.000685952021740377
Validation loss = 0.0015442796284332871
Validation loss = 0.0008242530166171491
Validation loss = 0.0009351802873425186
Validation loss = 0.0006896728882566094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0014647949719801545
Validation loss = 0.0008263564086519182
Validation loss = 0.0007239438127726316
Validation loss = 0.0006136356387287378
Validation loss = 0.0007305611507035792
Validation loss = 0.0008486956940032542
Validation loss = 0.0007212901255115867
Validation loss = 0.0007971766171976924
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -107     |
| Iteration     | 65       |
| MaximumReturn | -0.00325 |
| MinimumReturn | -168     |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007540534716099501
Validation loss = 0.0009058067807927728
Validation loss = 0.0006507664802484214
Validation loss = 0.0006667408160865307
Validation loss = 0.0009973690612241626
Validation loss = 0.0005903812707401812
Validation loss = 0.0007417701999656856
Validation loss = 0.0007155330386012793
Validation loss = 0.0007177342777140439
Validation loss = 0.0008494113571941853
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007457311730831861
Validation loss = 0.0006920677842572331
Validation loss = 0.0009512407123111188
Validation loss = 0.0005832322058267891
Validation loss = 0.0006729760207235813
Validation loss = 0.0006339451065286994
Validation loss = 0.0008986468310467899
Validation loss = 0.0009503713808953762
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011721791233867407
Validation loss = 0.0007099338108673692
Validation loss = 0.0007904660888016224
Validation loss = 0.0006940502789802849
Validation loss = 0.0008820928051136434
Validation loss = 0.0006214565364643931
Validation loss = 0.0008074752404354513
Validation loss = 0.0006620383937843144
Validation loss = 0.0008431394235230982
Validation loss = 0.0006564254290424287
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011142018483951688
Validation loss = 0.0008921255939640105
Validation loss = 0.0008885872084647417
Validation loss = 0.0006952222902327776
Validation loss = 0.0014236083952710032
Validation loss = 0.000698629068210721
Validation loss = 0.0012568573001772165
Validation loss = 0.0006699705845676363
Validation loss = 0.001030250801704824
Validation loss = 0.0007446440286003053
Validation loss = 0.0016164622502401471
Validation loss = 0.0013308634515851736
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007308041676878929
Validation loss = 0.0006913773831911385
Validation loss = 0.0006770896725356579
Validation loss = 0.0006610326236113906
Validation loss = 0.000735802692361176
Validation loss = 0.000780454371124506
Validation loss = 0.0005430005257949233
Validation loss = 0.0007098531350493431
Validation loss = 0.0007104158285073936
Validation loss = 0.0007129065925255418
Validation loss = 0.0011062801349908113
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36      |
| Iteration     | 66       |
| MaximumReturn | -0.00125 |
| MinimumReturn | -115     |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010736207477748394
Validation loss = 0.0011242578038945794
Validation loss = 0.0011285775108262897
Validation loss = 0.001144355395808816
Validation loss = 0.0009739657980389893
Validation loss = 0.0005348992417566478
Validation loss = 0.0008576074615120888
Validation loss = 0.0006268630968406796
Validation loss = 0.0005541182472370565
Validation loss = 0.001138554303906858
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008603958995081484
Validation loss = 0.0014618138084188104
Validation loss = 0.0017136686947196722
Validation loss = 0.001700335880741477
Validation loss = 0.0006422219448722899
Validation loss = 0.000841956993099302
Validation loss = 0.0006474711117334664
Validation loss = 0.0006462729652412236
Validation loss = 0.0007955000037327409
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0018859549891203642
Validation loss = 0.0008791782311163843
Validation loss = 0.0009101768955588341
Validation loss = 0.0012356770457699895
Validation loss = 0.0007034465088509023
Validation loss = 0.0008032452897168696
Validation loss = 0.000684268306940794
Validation loss = 0.0007638817769475281
Validation loss = 0.001574841095134616
Validation loss = 0.0019027764210477471
Validation loss = 0.0012364656431600451
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008065220317803323
Validation loss = 0.0006935003329999745
Validation loss = 0.0006815511151216924
Validation loss = 0.0007047556573525071
Validation loss = 0.0012736684875562787
Validation loss = 0.0014467643341049552
Validation loss = 0.001537712407298386
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009707295685075223
Validation loss = 0.0006926976493559778
Validation loss = 0.0018152387347072363
Validation loss = 0.0008046436123549938
Validation loss = 0.0007615644717589021
Validation loss = 0.0016733929514884949
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -109     |
| Iteration     | 67       |
| MaximumReturn | -0.00197 |
| MinimumReturn | -164     |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001105842529796064
Validation loss = 0.0007492650765925646
Validation loss = 0.0008086012094281614
Validation loss = 0.0007656295201741159
Validation loss = 0.0017245826311409473
Validation loss = 0.0009754141792654991
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008947711321525276
Validation loss = 0.0009601693018339574
Validation loss = 0.0008197352872230113
Validation loss = 0.0009746484574861825
Validation loss = 0.0009615530725568533
Validation loss = 0.0010927208932116628
Validation loss = 0.0008000251254998147
Validation loss = 0.0006481828168034554
Validation loss = 0.000984237645752728
Validation loss = 0.0011018251534551382
Validation loss = 0.0009880249854177237
Validation loss = 0.0009043000754900277
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009052542154677212
Validation loss = 0.0006309648742899299
Validation loss = 0.0006433760863728821
Validation loss = 0.002168545499444008
Validation loss = 0.0007050949498079717
Validation loss = 0.001377372071146965
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012336776126176119
Validation loss = 0.000763551564887166
Validation loss = 0.001061591086909175
Validation loss = 0.000873679353389889
Validation loss = 0.0007231800118461251
Validation loss = 0.0006544191855937243
Validation loss = 0.0006920465966686606
Validation loss = 0.003034569788724184
Validation loss = 0.0009885122999548912
Validation loss = 0.0007555988267995417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007051091524772346
Validation loss = 0.0008168517961166799
Validation loss = 0.0010521026561036706
Validation loss = 0.0009481977322138846
Validation loss = 0.000724334386177361
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -42      |
| Iteration     | 68       |
| MaximumReturn | -0.0632  |
| MinimumReturn | -84.3    |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008578184060752392
Validation loss = 0.0007282251608557999
Validation loss = 0.000832940626423806
Validation loss = 0.0007166898576542735
Validation loss = 0.0013726480538025498
Validation loss = 0.0007207032758742571
Validation loss = 0.0010985326953232288
Validation loss = 0.0010500274365767837
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007125890115275979
Validation loss = 0.0012302695540711284
Validation loss = 0.0009368009632453322
Validation loss = 0.000695147376973182
Validation loss = 0.0010312964441254735
Validation loss = 0.0007747227209620178
Validation loss = 0.0008390072034671903
Validation loss = 0.0011911377077922225
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007119589135982096
Validation loss = 0.0007557460339739919
Validation loss = 0.000698321033269167
Validation loss = 0.000685321050696075
Validation loss = 0.0007551572052761912
Validation loss = 0.0015361123951151967
Validation loss = 0.0023292226251214743
Validation loss = 0.0013710686471313238
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007068448467180133
Validation loss = 0.0011602165177464485
Validation loss = 0.0015258761122822762
Validation loss = 0.0007777866558171809
Validation loss = 0.0023940622340887785
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010611077304929495
Validation loss = 0.0009339938405901194
Validation loss = 0.001252495450899005
Validation loss = 0.0007167779840528965
Validation loss = 0.000713301938958466
Validation loss = 0.00108501932118088
Validation loss = 0.0007291255169548094
Validation loss = 0.000856830389238894
Validation loss = 0.0009347088634967804
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -129     |
| Iteration     | 69       |
| MaximumReturn | -48.8    |
| MinimumReturn | -147     |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008414582116529346
Validation loss = 0.0007056438480503857
Validation loss = 0.0015904514584690332
Validation loss = 0.0007154662744142115
Validation loss = 0.0007420170586556196
Validation loss = 0.0011549401096999645
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008234927081502974
Validation loss = 0.0006557521410286427
Validation loss = 0.0006206120597198606
Validation loss = 0.0012273992178961635
Validation loss = 0.0007047218969091773
Validation loss = 0.0008248446392826736
Validation loss = 0.0007033412693999708
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007553668692708015
Validation loss = 0.0007031204295344651
Validation loss = 0.0008522559655830264
Validation loss = 0.0007123217219486833
Validation loss = 0.0014473049668595195
Validation loss = 0.0006094382260926068
Validation loss = 0.000759065558668226
Validation loss = 0.0006229156861081719
Validation loss = 0.0008433428592979908
Validation loss = 0.0007763226167298853
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000779150752350688
Validation loss = 0.000668044260237366
Validation loss = 0.0006661263760179281
Validation loss = 0.0007799132727086544
Validation loss = 0.000760041584726423
Validation loss = 0.0010555832413956523
Validation loss = 0.001244372222572565
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00091210613027215
Validation loss = 0.0014624494360759854
Validation loss = 0.001252023852430284
Validation loss = 0.0007990638841874897
Validation loss = 0.0007110289297997952
Validation loss = 0.0011231156531721354
Validation loss = 0.0009159296751022339
Validation loss = 0.0008487175800837576
Validation loss = 0.0007717472617514431
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -105     |
| Iteration     | 70       |
| MaximumReturn | -46.1    |
| MinimumReturn | -135     |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007624738500453532
Validation loss = 0.0012386497110128403
Validation loss = 0.0006784393917769194
Validation loss = 0.0006596288294531405
Validation loss = 0.0007060820935294032
Validation loss = 0.0013715628301724792
Validation loss = 0.0008564548334106803
Validation loss = 0.0007286051986739039
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008199786534532905
Validation loss = 0.0009246013360098004
Validation loss = 0.0007468730909749866
Validation loss = 0.0007371602696366608
Validation loss = 0.0009214953170157969
Validation loss = 0.0007297745323739946
Validation loss = 0.0006273671751841903
Validation loss = 0.0007393565610982478
Validation loss = 0.0007794637349434197
Validation loss = 0.000841984641738236
Validation loss = 0.0008720209589228034
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006810591439716518
Validation loss = 0.0014490059111267328
Validation loss = 0.001418690662831068
Validation loss = 0.000869765121024102
Validation loss = 0.000777391018345952
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006432546651922166
Validation loss = 0.0007427611271850765
Validation loss = 0.000672117923386395
Validation loss = 0.0010075875325128436
Validation loss = 0.0007555647171102464
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008043568232096732
Validation loss = 0.0008999991696327925
Validation loss = 0.0008718137978576124
Validation loss = 0.00080101378262043
Validation loss = 0.0006999138277024031
Validation loss = 0.0007431851699948311
Validation loss = 0.0010396727593615651
Validation loss = 0.0006132060661911964
Validation loss = 0.0007908459519967437
Validation loss = 0.0010101853404194117
Validation loss = 0.0008793329470790923
Validation loss = 0.0008499748655594885
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -80.5    |
| Iteration     | 71       |
| MaximumReturn | -53.9    |
| MinimumReturn | -109     |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008716435986571014
Validation loss = 0.0008483479032292962
Validation loss = 0.0007911519496701658
Validation loss = 0.0006290245801210403
Validation loss = 0.0005485606961883605
Validation loss = 0.0009312340407632291
Validation loss = 0.0009611174464225769
Validation loss = 0.000885192712303251
Validation loss = 0.0005910592153668404
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000770610524341464
Validation loss = 0.0006453635287471116
Validation loss = 0.0009159225737676024
Validation loss = 0.0011203543981537223
Validation loss = 0.0012604297371581197
Validation loss = 0.0006524960044771433
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013985787518322468
Validation loss = 0.0010816572466865182
Validation loss = 0.0008745241211727262
Validation loss = 0.0009427460609003901
Validation loss = 0.0008859848603606224
Validation loss = 0.0007013179711066186
Validation loss = 0.0006278250366449356
Validation loss = 0.0008047216688282788
Validation loss = 0.0006125152576714754
Validation loss = 0.0006100588361732662
Validation loss = 0.0013644496211782098
Validation loss = 0.0005432450561784208
Validation loss = 0.0008228107471950352
Validation loss = 0.0006657300982624292
Validation loss = 0.0006705372943542898
Validation loss = 0.0006352970376610756
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009302997495979071
Validation loss = 0.0007037547766231
Validation loss = 0.000760272319894284
Validation loss = 0.0010146467247977853
Validation loss = 0.0008197601418942213
Validation loss = 0.0007316073752008379
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009561850456520915
Validation loss = 0.0006456797127611935
Validation loss = 0.000706707825884223
Validation loss = 0.0006676257471553981
Validation loss = 0.0006066128262318671
Validation loss = 0.0006885697366669774
Validation loss = 0.0006167112151160836
Validation loss = 0.0006527739460580051
Validation loss = 0.0006524586933664978
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.861   |
| Iteration     | 72       |
| MaximumReturn | -0.0564  |
| MinimumReturn | -15.6    |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0011956937378272414
Validation loss = 0.0008241730974987149
Validation loss = 0.001104456139728427
Validation loss = 0.0006872527301311493
Validation loss = 0.000588703085668385
Validation loss = 0.0008957665995694697
Validation loss = 0.0007853864808566868
Validation loss = 0.0006548514356836677
Validation loss = 0.0006039395811967552
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007911997381597757
Validation loss = 0.0005609330837614834
Validation loss = 0.0006688549183309078
Validation loss = 0.0007337655988521874
Validation loss = 0.0013658065581694245
Validation loss = 0.0007143233087845147
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005939127877354622
Validation loss = 0.0013674066867679358
Validation loss = 0.0007213149801827967
Validation loss = 0.0009431915241293609
Validation loss = 0.0007329253130592406
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0012516997521743178
Validation loss = 0.0022352247033268213
Validation loss = 0.0009233785676769912
Validation loss = 0.0005843021790497005
Validation loss = 0.0008289077668450773
Validation loss = 0.0007226099260151386
Validation loss = 0.0006270048324950039
Validation loss = 0.0009687711135484278
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001058404566720128
Validation loss = 0.0009294890332967043
Validation loss = 0.000547328032553196
Validation loss = 0.001166943577118218
Validation loss = 0.001479542814195156
Validation loss = 0.0005838985089212656
Validation loss = 0.001046213787049055
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.254   |
| Iteration     | 73       |
| MaximumReturn | -0.024   |
| MinimumReturn | -3.71    |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006851620855741203
Validation loss = 0.0007773114484734833
Validation loss = 0.0007817099103704095
Validation loss = 0.0006951135001145303
Validation loss = 0.0005784868262708187
Validation loss = 0.000669723202008754
Validation loss = 0.0006673269672319293
Validation loss = 0.0006086339708417654
Validation loss = 0.0008022062829695642
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007512819720432162
Validation loss = 0.0007475604652427137
Validation loss = 0.0009992168052121997
Validation loss = 0.000879152212291956
Validation loss = 0.0009504769695922732
Validation loss = 0.0008484256686642766
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009122167248278856
Validation loss = 0.0009621775825507939
Validation loss = 0.0009604556835256517
Validation loss = 0.0008348762639798224
Validation loss = 0.0009855541866272688
Validation loss = 0.0009274104959331453
Validation loss = 0.0008556030225008726
Validation loss = 0.0007937244372442365
Validation loss = 0.0012593800202012062
Validation loss = 0.0009237173362635076
Validation loss = 0.0006642153020948172
Validation loss = 0.0008108841720968485
Validation loss = 0.0006540442118421197
Validation loss = 0.0007136277272365987
Validation loss = 0.0013742882292717695
Validation loss = 0.0005927346646785736
Validation loss = 0.0011113276705145836
Validation loss = 0.0006787314778193831
Validation loss = 0.0009323499398306012
Validation loss = 0.0007582030957564712
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000683463120367378
Validation loss = 0.0009425704483874142
Validation loss = 0.0013103297678753734
Validation loss = 0.000684733793605119
Validation loss = 0.000992214074358344
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010565208503976464
Validation loss = 0.0007175600621849298
Validation loss = 0.0006078730220906436
Validation loss = 0.0007482083747163415
Validation loss = 0.0008207514765672386
Validation loss = 0.0008457068470306695
Validation loss = 0.0006830315105617046
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.1    |
| Iteration     | 74       |
| MaximumReturn | -0.069   |
| MinimumReturn | -66.3    |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007155889179557562
Validation loss = 0.0006941467290744185
Validation loss = 0.001039362046867609
Validation loss = 0.0009726182324811816
Validation loss = 0.0005867793806828558
Validation loss = 0.000967453932389617
Validation loss = 0.0010679515544325113
Validation loss = 0.001683597220107913
Validation loss = 0.0006976491422392428
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00069902726681903
Validation loss = 0.0005781680229119956
Validation loss = 0.0007596844807267189
Validation loss = 0.0008518081740476191
Validation loss = 0.0009285644628107548
Validation loss = 0.0007462396752089262
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011232370743528008
Validation loss = 0.0006654399912804365
Validation loss = 0.0009245578548870981
Validation loss = 0.0008494866779074073
Validation loss = 0.0026977083180099726
Validation loss = 0.0007322428282350302
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000635370088275522
Validation loss = 0.0005695217987522483
Validation loss = 0.0012502454919740558
Validation loss = 0.0006208480917848647
Validation loss = 0.0008513922221027315
Validation loss = 0.0009167241514660418
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005881081451661885
Validation loss = 0.0008938860264606774
Validation loss = 0.0005515945376828313
Validation loss = 0.0005785905523225665
Validation loss = 0.0005401481175795197
Validation loss = 0.0009661103831604123
Validation loss = 0.0007005490479059517
Validation loss = 0.0008003881084732711
Validation loss = 0.0008758127223700285
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.2    |
| Iteration     | 75       |
| MaximumReturn | -0.114   |
| MinimumReturn | -95.7    |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006223068339750171
Validation loss = 0.000652892398647964
Validation loss = 0.001123399706557393
Validation loss = 0.0008293410646729171
Validation loss = 0.0005478023085743189
Validation loss = 0.0007069791317917407
Validation loss = 0.0010342213790863752
Validation loss = 0.0008946553571149707
Validation loss = 0.0018494355026632547
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006779964314773679
Validation loss = 0.0005679657333530486
Validation loss = 0.0006213829619809985
Validation loss = 0.0007322157616727054
Validation loss = 0.0005341696669347584
Validation loss = 0.000719595467671752
Validation loss = 0.0008472780464217067
Validation loss = 0.0006155207520350814
Validation loss = 0.000670312496367842
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009059205185621977
Validation loss = 0.0009543283376842737
Validation loss = 0.0006049978546798229
Validation loss = 0.0007379843736998737
Validation loss = 0.0008108436595648527
Validation loss = 0.0009301918908022344
Validation loss = 0.0008367380360141397
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006564290961250663
Validation loss = 0.0007427837699651718
Validation loss = 0.000682282552588731
Validation loss = 0.0007154936902225018
Validation loss = 0.0021910821087658405
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010150033049285412
Validation loss = 0.0005534805823117495
Validation loss = 0.0005906620062887669
Validation loss = 0.0006823459407314658
Validation loss = 0.0007519840728491545
Validation loss = 0.0011991242645308375
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000879 |
| Iteration     | 76        |
| MaximumReturn | -0.000686 |
| MinimumReturn | -0.00105  |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008006255375221372
Validation loss = 0.0005592938978224993
Validation loss = 0.0006220710929483175
Validation loss = 0.0008736472227610648
Validation loss = 0.0006250374135561287
Validation loss = 0.0005799551145173609
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000930863490793854
Validation loss = 0.0006240562070161104
Validation loss = 0.000743420678190887
Validation loss = 0.0006679169018752873
Validation loss = 0.0007593018817715347
Validation loss = 0.0011534371878951788
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006565050571225584
Validation loss = 0.0008184260223060846
Validation loss = 0.0009059354197233915
Validation loss = 0.0008611827506683767
Validation loss = 0.0011541057610884309
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006147464737296104
Validation loss = 0.0006665171240456402
Validation loss = 0.0006947601796127856
Validation loss = 0.0005767061375081539
Validation loss = 0.0006736380746588111
Validation loss = 0.0006903492612764239
Validation loss = 0.0009518739534541965
Validation loss = 0.0007682667346671224
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006169442203827202
Validation loss = 0.0007989454315975308
Validation loss = 0.0006500010495074093
Validation loss = 0.0013340698787942529
Validation loss = 0.0008217067806981504
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.64    |
| Iteration     | 77       |
| MaximumReturn | -0.0187  |
| MinimumReturn | -28.9    |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009058553841896355
Validation loss = 0.0013105794787406921
Validation loss = 0.0011330308625474572
Validation loss = 0.0008387617417611182
Validation loss = 0.0012721794191747904
Validation loss = 0.0006424179300665855
Validation loss = 0.0010578273795545101
Validation loss = 0.0009194401209242642
Validation loss = 0.0008421726524829865
Validation loss = 0.0006741977413184941
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007560211233794689
Validation loss = 0.0006098879384808242
Validation loss = 0.0006884845206514001
Validation loss = 0.0006935379351489246
Validation loss = 0.0007272974471561611
Validation loss = 0.0006450066575780511
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007002484635449946
Validation loss = 0.0006815616507083178
Validation loss = 0.0007872076821513474
Validation loss = 0.0008113333024084568
Validation loss = 0.0006499739247374237
Validation loss = 0.0010464233346283436
Validation loss = 0.0007051451248116791
Validation loss = 0.0006230987492017448
Validation loss = 0.0010048785479739308
Validation loss = 0.0008424386032857001
Validation loss = 0.0007605936261825264
Validation loss = 0.0007572527974843979
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000732526124920696
Validation loss = 0.0005721546476706862
Validation loss = 0.0006743951817043126
Validation loss = 0.0009334587957710028
Validation loss = 0.0007440768531523645
Validation loss = 0.0008403891115449369
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000605918001383543
Validation loss = 0.0008156760013662279
Validation loss = 0.0006764213903807104
Validation loss = 0.000945374951697886
Validation loss = 0.000994431902654469
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00746  |
| Iteration     | 78        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.0863   |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001111743738874793
Validation loss = 0.0011713471030816436
Validation loss = 0.000645106250885874
Validation loss = 0.0006767127779312432
Validation loss = 0.0013536561746150255
Validation loss = 0.0006787095335312188
Validation loss = 0.0010486490791663527
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006552735576406121
Validation loss = 0.0006429995992220938
Validation loss = 0.0007433636346831918
Validation loss = 0.0011728164972737432
Validation loss = 0.00099663354922086
Validation loss = 0.00078774947905913
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005802937084808946
Validation loss = 0.0007691332721151412
Validation loss = 0.0009514284320175648
Validation loss = 0.0007182667031884193
Validation loss = 0.001256689429283142
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005777101032435894
Validation loss = 0.0006272313185036182
Validation loss = 0.000799449800979346
Validation loss = 0.0006876683328300714
Validation loss = 0.0007476673927158117
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007726606563664973
Validation loss = 0.0006324461428448558
Validation loss = 0.0006158366450108588
Validation loss = 0.0008174240938387811
Validation loss = 0.0014238087460398674
Validation loss = 0.000992290792055428
Validation loss = 0.0010582804679870605
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.32    |
| Iteration     | 79       |
| MaximumReturn | -0.0929  |
| MinimumReturn | -27.9    |
| TotalSamples  | 134946   |
----------------------------
