Logging to experiments/invertedPendulum/invertedPendulum/Mon-21-Nov-2022-03-21-48-PM-CST_invertedPendulum_trpo_iteration_20_seed2231
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7206536531448364
Validation loss = 0.3699313700199127
Validation loss = 0.3387269675731659
Validation loss = 0.29983311891555786
Validation loss = 0.2920011579990387
Validation loss = 0.2552431523799896
Validation loss = 0.24690480530261993
Validation loss = 0.23639677464962006
Validation loss = 0.22461336851119995
Validation loss = 0.20198792219161987
Validation loss = 0.19853916764259338
Validation loss = 0.19844640791416168
Validation loss = 0.21106183528900146
Validation loss = 0.217722088098526
Validation loss = 0.21362048387527466
Validation loss = 0.18529172241687775
Validation loss = 0.17374227941036224
Validation loss = 0.20473945140838623
Validation loss = 0.16270527243614197
Validation loss = 0.16634052991867065
Validation loss = 0.14973194897174835
Validation loss = 0.14720533788204193
Validation loss = 0.13575489819049835
Validation loss = 0.14118586480617523
Validation loss = 0.1189701035618782
Validation loss = 0.12452171742916107
Validation loss = 0.13307225704193115
Validation loss = 0.1336747109889984
Validation loss = 0.1291288137435913
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7124489545822144
Validation loss = 0.3654261529445648
Validation loss = 0.3742239475250244
Validation loss = 0.3149290084838867
Validation loss = 0.28759607672691345
Validation loss = 0.2797492742538452
Validation loss = 0.26011770963668823
Validation loss = 0.23485800623893738
Validation loss = 0.22093465924263
Validation loss = 0.21187947690486908
Validation loss = 0.1972196102142334
Validation loss = 0.18761441111564636
Validation loss = 0.1892855018377304
Validation loss = 0.1816251277923584
Validation loss = 0.18619440495967865
Validation loss = 0.182241290807724
Validation loss = 0.18301141262054443
Validation loss = 0.18065522611141205
Validation loss = 0.16331952810287476
Validation loss = 0.15802404284477234
Validation loss = 0.15899328887462616
Validation loss = 0.14081819355487823
Validation loss = 0.1361285150051117
Validation loss = 0.15131106972694397
Validation loss = 0.13412803411483765
Validation loss = 0.13357460498809814
Validation loss = 0.12948209047317505
Validation loss = 0.11363648623228073
Validation loss = 0.11279928684234619
Validation loss = 0.11846314370632172
Validation loss = 0.10602456331253052
Validation loss = 0.10632652789354324
Validation loss = 0.1040579080581665
Validation loss = 0.10839089751243591
Validation loss = 0.09716042131185532
Validation loss = 0.09449534863233566
Validation loss = 0.10625108331441879
Validation loss = 0.0973198413848877
Validation loss = 0.08693145215511322
Validation loss = 0.09440207481384277
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7100929021835327
Validation loss = 0.36353757977485657
Validation loss = 0.3643026351928711
Validation loss = 0.30966076254844666
Validation loss = 0.30629509687423706
Validation loss = 0.26898184418678284
Validation loss = 0.2641972601413727
Validation loss = 0.25170475244522095
Validation loss = 0.2187281847000122
Validation loss = 0.20530344545841217
Validation loss = 0.2331729531288147
Validation loss = 0.1949002593755722
Validation loss = 0.186078742146492
Validation loss = 0.18170569837093353
Validation loss = 0.16762034595012665
Validation loss = 0.17136019468307495
Validation loss = 0.1575908362865448
Validation loss = 0.15038098394870758
Validation loss = 0.1572783887386322
Validation loss = 0.15956172347068787
Validation loss = 0.14813093841075897
Validation loss = 0.14979931712150574
Validation loss = 0.13595208525657654
Validation loss = 0.12950529158115387
Validation loss = 0.11467260122299194
Validation loss = 0.11132397502660751
Validation loss = 0.10910575091838837
Validation loss = 0.11306650191545486
Validation loss = 0.09812893718481064
Validation loss = 0.09663884341716766
Validation loss = 0.1031365618109703
Validation loss = 0.10185673832893372
Validation loss = 0.10619626194238663
Validation loss = 0.11445052176713943
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7413776516914368
Validation loss = 0.3678656220436096
Validation loss = 0.36503735184669495
Validation loss = 0.30602869391441345
Validation loss = 0.3173811733722687
Validation loss = 0.2739686667919159
Validation loss = 0.23924607038497925
Validation loss = 0.23943093419075012
Validation loss = 0.21380670368671417
Validation loss = 0.20209987461566925
Validation loss = 0.19524718821048737
Validation loss = 0.18535685539245605
Validation loss = 0.18268917500972748
Validation loss = 0.17312397062778473
Validation loss = 0.17946559190750122
Validation loss = 0.16158190369606018
Validation loss = 0.1681850254535675
Validation loss = 0.145646870136261
Validation loss = 0.14702877402305603
Validation loss = 0.139130100607872
Validation loss = 0.14030680060386658
Validation loss = 0.14662738144397736
Validation loss = 0.1297108381986618
Validation loss = 0.1289207637310028
Validation loss = 0.13494567573070526
Validation loss = 0.13551223278045654
Validation loss = 0.11716403812170029
Validation loss = 0.11805148422718048
Validation loss = 0.10484568029642105
Validation loss = 0.09670653939247131
Validation loss = 0.10135051608085632
Validation loss = 0.11791160702705383
Validation loss = 0.0997437983751297
Validation loss = 0.0917668491601944
Validation loss = 0.086939238011837
Validation loss = 0.10029324144124985
Validation loss = 0.08412129431962967
Validation loss = 0.10150495171546936
Validation loss = 0.0807410255074501
Validation loss = 0.0749177560210228
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7131519913673401
Validation loss = 0.36128634214401245
Validation loss = 0.3611930310726166
Validation loss = 0.3136892318725586
Validation loss = 0.28803932666778564
Validation loss = 0.2601947784423828
Validation loss = 0.25544631481170654
Validation loss = 0.24104075133800507
Validation loss = 0.22215069830417633
Validation loss = 0.21122956275939941
Validation loss = 0.1949179321527481
Validation loss = 0.2102685123682022
Validation loss = 0.1935388743877411
Validation loss = 0.17755718529224396
Validation loss = 0.1798589825630188
Validation loss = 0.16594348847866058
Validation loss = 0.15904945135116577
Validation loss = 0.15504950284957886
Validation loss = 0.1536632627248764
Validation loss = 0.14260977506637573
Validation loss = 0.15575489401817322
Validation loss = 0.14735551178455353
Validation loss = 0.15442721545696259
Validation loss = 0.1407679319381714
Validation loss = 0.13147006928920746
Validation loss = 0.11799347400665283
Validation loss = 0.11682542413473129
Validation loss = 0.10626647621393204
Validation loss = 0.11055679619312286
Validation loss = 0.1062178835272789
Validation loss = 0.10862483829259872
Validation loss = 0.10220582783222198
Validation loss = 0.10228563845157623
Validation loss = 0.10005325078964233
Validation loss = 0.1017831489443779
Validation loss = 0.10563762485980988
Validation loss = 0.08671538531780243
Validation loss = 0.09774427860975266
Validation loss = 0.08520274609327316
Validation loss = 0.08536593616008759
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.255   |
| Iteration     | 0        |
| MaximumReturn | -0.0667  |
| MinimumReturn | -2.2     |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2831571102142334
Validation loss = 0.18117544054985046
Validation loss = 0.15179696679115295
Validation loss = 0.130479633808136
Validation loss = 0.12789544463157654
Validation loss = 0.1064334586262703
Validation loss = 0.09230508655309677
Validation loss = 0.08531900495290756
Validation loss = 0.08078442513942719
Validation loss = 0.08729853481054306
Validation loss = 0.1034703329205513
Validation loss = 0.0946715921163559
Validation loss = 0.07487431913614273
Validation loss = 0.07952462136745453
Validation loss = 0.07964793592691422
Validation loss = 0.07053219527006149
Validation loss = 0.0690184012055397
Validation loss = 0.06793316453695297
Validation loss = 0.06820791959762573
Validation loss = 0.06983964890241623
Validation loss = 0.07569378614425659
Validation loss = 0.0773896872997284
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.36331015825271606
Validation loss = 0.20961517095565796
Validation loss = 0.19050218164920807
Validation loss = 0.1768282800912857
Validation loss = 0.15185104310512543
Validation loss = 0.14086446166038513
Validation loss = 0.1305464208126068
Validation loss = 0.12966665625572205
Validation loss = 0.11118350923061371
Validation loss = 0.10271424800157547
Validation loss = 0.09845394641160965
Validation loss = 0.08937536925077438
Validation loss = 0.09391440451145172
Validation loss = 0.08239983767271042
Validation loss = 0.07653482258319855
Validation loss = 0.06852477788925171
Validation loss = 0.07697096467018127
Validation loss = 0.07212355732917786
Validation loss = 0.07390450686216354
Validation loss = 0.0657762959599495
Validation loss = 0.06875357776880264
Validation loss = 0.07366250455379486
Validation loss = 0.05933820456266403
Validation loss = 0.05373084917664528
Validation loss = 0.05559348315000534
Validation loss = 0.06064311042428017
Validation loss = 0.05692194402217865
Validation loss = 0.07108523696660995
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3460374176502228
Validation loss = 0.21452884376049042
Validation loss = 0.17740309238433838
Validation loss = 0.16943475604057312
Validation loss = 0.1436595618724823
Validation loss = 0.14156201481819153
Validation loss = 0.12437754124403
Validation loss = 0.12136463075876236
Validation loss = 0.1058347299695015
Validation loss = 0.09800316393375397
Validation loss = 0.0904586911201477
Validation loss = 0.09226056188344955
Validation loss = 0.07772485911846161
Validation loss = 0.07940680533647537
Validation loss = 0.07275819033384323
Validation loss = 0.0805269107222557
Validation loss = 0.07512705028057098
Validation loss = 0.08029329776763916
Validation loss = 0.06137707456946373
Validation loss = 0.059860244393348694
Validation loss = 0.06015331298112869
Validation loss = 0.06629346311092377
Validation loss = 0.05822823569178581
Validation loss = 0.06808625906705856
Validation loss = 0.06862641870975494
Validation loss = 0.07187353074550629
Validation loss = 0.057275231927633286
Validation loss = 0.0607047900557518
Validation loss = 0.058264072984457016
Validation loss = 0.05484743416309357
Validation loss = 0.06059432402253151
Validation loss = 0.061983224004507065
Validation loss = 0.06008266657590866
Validation loss = 0.054605886340141296
Validation loss = 0.07412225753068924
Validation loss = 0.05775788053870201
Validation loss = 0.04986531287431717
Validation loss = 0.06370072066783905
Validation loss = 0.04853738471865654
Validation loss = 0.05221635475754738
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.36679232120513916
Validation loss = 0.21091535687446594
Validation loss = 0.18594777584075928
Validation loss = 0.16752482950687408
Validation loss = 0.14917078614234924
Validation loss = 0.1311090737581253
Validation loss = 0.1206647977232933
Validation loss = 0.11983877420425415
Validation loss = 0.1032889112830162
Validation loss = 0.10201122611761093
Validation loss = 0.09996030479669571
Validation loss = 0.08596125990152359
Validation loss = 0.08348208665847778
Validation loss = 0.08342914283275604
Validation loss = 0.07734078168869019
Validation loss = 0.08264411240816116
Validation loss = 0.07089723646640778
Validation loss = 0.06492093205451965
Validation loss = 0.07399239391088486
Validation loss = 0.061949413269758224
Validation loss = 0.0686788409948349
Validation loss = 0.060050804167985916
Validation loss = 0.062491558492183685
Validation loss = 0.05891165882349014
Validation loss = 0.058319658041000366
Validation loss = 0.06277080625295639
Validation loss = 0.05938103795051575
Validation loss = 0.05539088323712349
Validation loss = 0.05391480028629303
Validation loss = 0.05505238473415375
Validation loss = 0.0629686787724495
Validation loss = 0.05879772827029228
Validation loss = 0.05724770948290825
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3723165988922119
Validation loss = 0.22890335321426392
Validation loss = 0.17453844845294952
Validation loss = 0.1522580236196518
Validation loss = 0.14395593106746674
Validation loss = 0.12523585557937622
Validation loss = 0.1238866001367569
Validation loss = 0.1145303025841713
Validation loss = 0.0992807149887085
Validation loss = 0.09355573356151581
Validation loss = 0.08753552287817001
Validation loss = 0.09061194956302643
Validation loss = 0.07965191453695297
Validation loss = 0.0798492506146431
Validation loss = 0.07587920874357224
Validation loss = 0.06753866374492645
Validation loss = 0.07966005802154541
Validation loss = 0.0669119656085968
Validation loss = 0.06709564477205276
Validation loss = 0.06431952863931656
Validation loss = 0.07811836898326874
Validation loss = 0.06307163089513779
Validation loss = 0.06569991260766983
Validation loss = 0.06318817287683487
Validation loss = 0.06114468723535538
Validation loss = 0.06614604592323303
Validation loss = 0.06289124488830566
Validation loss = 0.06135779619216919
Validation loss = 0.09767228364944458
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -42.5    |
| Iteration     | 1        |
| MaximumReturn | -3.32    |
| MinimumReturn | -64      |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2130117118358612
Validation loss = 0.0909358412027359
Validation loss = 0.05914848670363426
Validation loss = 0.053079456090927124
Validation loss = 0.04978160560131073
Validation loss = 0.04951851814985275
Validation loss = 0.052138250321149826
Validation loss = 0.03804522380232811
Validation loss = 0.0414273738861084
Validation loss = 0.032210852950811386
Validation loss = 0.04326122999191284
Validation loss = 0.04153458774089813
Validation loss = 0.0391836017370224
Validation loss = 0.03332356736063957
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.19191478192806244
Validation loss = 0.10551135241985321
Validation loss = 0.07100923359394073
Validation loss = 0.06584455817937851
Validation loss = 0.055137988179922104
Validation loss = 0.04867864027619362
Validation loss = 0.043600618839263916
Validation loss = 0.055392831563949585
Validation loss = 0.05838721618056297
Validation loss = 0.04326990246772766
Validation loss = 0.05412305146455765
Validation loss = 0.05455522984266281
Validation loss = 0.04056912660598755
Validation loss = 0.041110068559646606
Validation loss = 0.03684500232338905
Validation loss = 0.03611907362937927
Validation loss = 0.03658457100391388
Validation loss = 0.03715398907661438
Validation loss = 0.03323903679847717
Validation loss = 0.02905120700597763
Validation loss = 0.04668287932872772
Validation loss = 0.03568773716688156
Validation loss = 0.030362308025360107
Validation loss = 0.029021840542554855
Validation loss = 0.02467162348330021
Validation loss = 0.024937156587839127
Validation loss = 0.03340611979365349
Validation loss = 0.0369572788476944
Validation loss = 0.020257480442523956
Validation loss = 0.026317745447158813
Validation loss = 0.024545013904571533
Validation loss = 0.04355812445282936
Validation loss = 0.02726840227842331
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2565518915653229
Validation loss = 0.09773752093315125
Validation loss = 0.0745735764503479
Validation loss = 0.07147900760173798
Validation loss = 0.06160688400268555
Validation loss = 0.05772729963064194
Validation loss = 0.04706773906946182
Validation loss = 0.04848739504814148
Validation loss = 0.043512195348739624
Validation loss = 0.036259640008211136
Validation loss = 0.047218047082424164
Validation loss = 0.0436546616256237
Validation loss = 0.032909128814935684
Validation loss = 0.029650215059518814
Validation loss = 0.026468299329280853
Validation loss = 0.0315791592001915
Validation loss = 0.0430256724357605
Validation loss = 0.03580086678266525
Validation loss = 0.03462323918938637
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2175271064043045
Validation loss = 0.084185391664505
Validation loss = 0.05270591378211975
Validation loss = 0.04625684395432472
Validation loss = 0.0398092195391655
Validation loss = 0.05538200959563255
Validation loss = 0.04315530136227608
Validation loss = 0.04271747171878815
Validation loss = 0.0390838161110878
Validation loss = 0.04768826439976692
Validation loss = 0.04446576163172722
Validation loss = 0.03776440769433975
Validation loss = 0.029646746814250946
Validation loss = 0.0398530550301075
Validation loss = 0.03591626510024071
Validation loss = 0.026379186660051346
Validation loss = 0.030186273157596588
Validation loss = 0.02949259988963604
Validation loss = 0.03892464563250542
Validation loss = 0.032353371381759644
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.21267379820346832
Validation loss = 0.09660577774047852
Validation loss = 0.07322480529546738
Validation loss = 0.05980242043733597
Validation loss = 0.062393028289079666
Validation loss = 0.053252529352903366
Validation loss = 0.0509849414229393
Validation loss = 0.049916934221982956
Validation loss = 0.0408865287899971
Validation loss = 0.0606846883893013
Validation loss = 0.03792426362633705
Validation loss = 0.043398551642894745
Validation loss = 0.03606228530406952
Validation loss = 0.045052893459796906
Validation loss = 0.04456336051225662
Validation loss = 0.030834853649139404
Validation loss = 0.034503404051065445
Validation loss = 0.02870343253016472
Validation loss = 0.031414102762937546
Validation loss = 0.026745649054646492
Validation loss = 0.026390478014945984
Validation loss = 0.027895569801330566
Validation loss = 0.026783861219882965
Validation loss = 0.026203613728284836
Validation loss = 0.028141723945736885
Validation loss = 0.032915204763412476
Validation loss = 0.02230016700923443
Validation loss = 0.025930050760507584
Validation loss = 0.022259265184402466
Validation loss = 0.026939453557133675
Validation loss = 0.020171204581856728
Validation loss = 0.04396433383226395
Validation loss = 0.029092520475387573
Validation loss = 0.024602457880973816
Validation loss = 0.03158324956893921
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.9    |
| Iteration     | 2        |
| MaximumReturn | -0.0901  |
| MinimumReturn | -78.9    |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12125923484563828
Validation loss = 0.06317944824695587
Validation loss = 0.06057356670498848
Validation loss = 0.04681358113884926
Validation loss = 0.04163109138607979
Validation loss = 0.04800531640648842
Validation loss = 0.035454340279102325
Validation loss = 0.037086810916662216
Validation loss = 0.028475822880864143
Validation loss = 0.025937682017683983
Validation loss = 0.03136211261153221
Validation loss = 0.029510468244552612
Validation loss = 0.032323047518730164
Validation loss = 0.027099216356873512
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15641207993030548
Validation loss = 0.0684312954545021
Validation loss = 0.049243565648794174
Validation loss = 0.039314012974500656
Validation loss = 0.032213013619184494
Validation loss = 0.03147686645388603
Validation loss = 0.02619616687297821
Validation loss = 0.021880069747567177
Validation loss = 0.0239727646112442
Validation loss = 0.02027512900531292
Validation loss = 0.01831393502652645
Validation loss = 0.0161428265273571
Validation loss = 0.017273742705583572
Validation loss = 0.037206728011369705
Validation loss = 0.02636490762233734
Validation loss = 0.021388821303844452
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1320401281118393
Validation loss = 0.06624387204647064
Validation loss = 0.051724206656217575
Validation loss = 0.03405680134892464
Validation loss = 0.02449905313551426
Validation loss = 0.02390686236321926
Validation loss = 0.027018403634428978
Validation loss = 0.018641455098986626
Validation loss = 0.01755574159324169
Validation loss = 0.016901515424251556
Validation loss = 0.019863978028297424
Validation loss = 0.03657817095518112
Validation loss = 0.017181197181344032
Validation loss = 0.019441647455096245
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12529027462005615
Validation loss = 0.0559409074485302
Validation loss = 0.041333477944135666
Validation loss = 0.03207534924149513
Validation loss = 0.027945267036557198
Validation loss = 0.022626223042607307
Validation loss = 0.02799852192401886
Validation loss = 0.03036336600780487
Validation loss = 0.024523690342903137
Validation loss = 0.018744759261608124
Validation loss = 0.020783334970474243
Validation loss = 0.022239940240979195
Validation loss = 0.018862886354327202
Validation loss = 0.019752243533730507
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11020470410585403
Validation loss = 0.043344009667634964
Validation loss = 0.03337666764855385
Validation loss = 0.02758004702627659
Validation loss = 0.0224342942237854
Validation loss = 0.02978731505572796
Validation loss = 0.021212657913565636
Validation loss = 0.01711750589311123
Validation loss = 0.021770039573311806
Validation loss = 0.022413192316889763
Validation loss = 0.01862308196723461
Validation loss = 0.020901896059513092
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.097   |
| Iteration     | 3        |
| MaximumReturn | -0.0579  |
| MinimumReturn | -0.13    |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10000883787870407
Validation loss = 0.047318097203969955
Validation loss = 0.03702379763126373
Validation loss = 0.02860683761537075
Validation loss = 0.023598771542310715
Validation loss = 0.020830493420362473
Validation loss = 0.02049526572227478
Validation loss = 0.034949928522109985
Validation loss = 0.025689734145998955
Validation loss = 0.0200085137039423
Validation loss = 0.019323648884892464
Validation loss = 0.019234556704759598
Validation loss = 0.024130092933773994
Validation loss = 0.01774473488330841
Validation loss = 0.015659460797905922
Validation loss = 0.01662810891866684
Validation loss = 0.019147660583257675
Validation loss = 0.020125478506088257
Validation loss = 0.016060156747698784
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10644631087779999
Validation loss = 0.044164761900901794
Validation loss = 0.02447446808218956
Validation loss = 0.029445435851812363
Validation loss = 0.019587775692343712
Validation loss = 0.023695586249232292
Validation loss = 0.02359383925795555
Validation loss = 0.013787181116640568
Validation loss = 0.02178969234228134
Validation loss = 0.021866170689463615
Validation loss = 0.013459439389407635
Validation loss = 0.014216546900570393
Validation loss = 0.016542065888643265
Validation loss = 0.012361551634967327
Validation loss = 0.01632307842373848
Validation loss = 0.011380985379219055
Validation loss = 0.01367124542593956
Validation loss = 0.013055818155407906
Validation loss = 0.024700937792658806
Validation loss = 0.011797944083809853
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07700072228908539
Validation loss = 0.027441944926977158
Validation loss = 0.0207308828830719
Validation loss = 0.019240686669945717
Validation loss = 0.023070989176630974
Validation loss = 0.017220260575413704
Validation loss = 0.016889503225684166
Validation loss = 0.015904057770967484
Validation loss = 0.014435443095862865
Validation loss = 0.015447994694113731
Validation loss = 0.012912511825561523
Validation loss = 0.017863504588603973
Validation loss = 0.011541040614247322
Validation loss = 0.01609799824655056
Validation loss = 0.01593935303390026
Validation loss = 0.014154396951198578
Validation loss = 0.01135784201323986
Validation loss = 0.012346557341516018
Validation loss = 0.01324523612856865
Validation loss = 0.010175987146794796
Validation loss = 0.0131462924182415
Validation loss = 0.012552522122859955
Validation loss = 0.014812195673584938
Validation loss = 0.0093916617333889
Validation loss = 0.01077163964509964
Validation loss = 0.013885103166103363
Validation loss = 0.017789555713534355
Validation loss = 0.015440914779901505
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08078757673501968
Validation loss = 0.05346962437033653
Validation loss = 0.0388907827436924
Validation loss = 0.03170187026262283
Validation loss = 0.022782105952501297
Validation loss = 0.022708695381879807
Validation loss = 0.019068649038672447
Validation loss = 0.018445128574967384
Validation loss = 0.01589573547244072
Validation loss = 0.016566094011068344
Validation loss = 0.012755346484482288
Validation loss = 0.01325424388051033
Validation loss = 0.01632416620850563
Validation loss = 0.019298013299703598
Validation loss = 0.012247905135154724
Validation loss = 0.011486856266856194
Validation loss = 0.011432986706495285
Validation loss = 0.013142560608685017
Validation loss = 0.013589907437562943
Validation loss = 0.012807427905499935
Validation loss = 0.02202795073390007
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08919671177864075
Validation loss = 0.02902592532336712
Validation loss = 0.020591270178556442
Validation loss = 0.018523992970585823
Validation loss = 0.02010677196085453
Validation loss = 0.020191391929984093
Validation loss = 0.01689961552619934
Validation loss = 0.017115864902734756
Validation loss = 0.014218596741557121
Validation loss = 0.011938277631998062
Validation loss = 0.014410342089831829
Validation loss = 0.012051725760102272
Validation loss = 0.01832881197333336
Validation loss = 0.011159432120621204
Validation loss = 0.018198881298303604
Validation loss = 0.011370542459189892
Validation loss = 0.011831009760499
Validation loss = 0.01621280610561371
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00343 |
| Iteration     | 4        |
| MaximumReturn | -0.00239 |
| MinimumReturn | -0.00425 |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05558478832244873
Validation loss = 0.02069375291466713
Validation loss = 0.014596750028431416
Validation loss = 0.01843791827559471
Validation loss = 0.010881711728870869
Validation loss = 0.010528948158025742
Validation loss = 0.021523205563426018
Validation loss = 0.017625991255044937
Validation loss = 0.012202407233417034
Validation loss = 0.009145676158368587
Validation loss = 0.011285470798611641
Validation loss = 0.010607277043163776
Validation loss = 0.010199901647865772
Validation loss = 0.0116288335993886
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04559556022286415
Validation loss = 0.017467346042394638
Validation loss = 0.01208923663944006
Validation loss = 0.009634016081690788
Validation loss = 0.011192405596375465
Validation loss = 0.01030375063419342
Validation loss = 0.009565271437168121
Validation loss = 0.008673980832099915
Validation loss = 0.012662893161177635
Validation loss = 0.010294894687831402
Validation loss = 0.009659012779593468
Validation loss = 0.008693201467394829
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04430318623781204
Validation loss = 0.014218384400010109
Validation loss = 0.011873516254127026
Validation loss = 0.00951515231281519
Validation loss = 0.011516481637954712
Validation loss = 0.010415682569146156
Validation loss = 0.01228289119899273
Validation loss = 0.012489212676882744
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026541123166680336
Validation loss = 0.016088096424937248
Validation loss = 0.010409414768218994
Validation loss = 0.009215179830789566
Validation loss = 0.01679619774222374
Validation loss = 0.01303709577769041
Validation loss = 0.009375355206429958
Validation loss = 0.017283381894230843
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05475032329559326
Validation loss = 0.014986691996455193
Validation loss = 0.01177933532744646
Validation loss = 0.012886377051472664
Validation loss = 0.011021926067769527
Validation loss = 0.008503248915076256
Validation loss = 0.0074835666455328465
Validation loss = 0.008108213543891907
Validation loss = 0.015282340347766876
Validation loss = 0.012556147761642933
Validation loss = 0.012023484334349632
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00509 |
| Iteration     | 5        |
| MaximumReturn | -0.00385 |
| MinimumReturn | -0.00681 |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03858950361609459
Validation loss = 0.01371146272867918
Validation loss = 0.010863328352570534
Validation loss = 0.009620541706681252
Validation loss = 0.014480682089924812
Validation loss = 0.008453491143882275
Validation loss = 0.017745526507496834
Validation loss = 0.011411826126277447
Validation loss = 0.009828169830143452
Validation loss = 0.013526195660233498
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.029623273760080338
Validation loss = 0.00991708505898714
Validation loss = 0.010671484284102917
Validation loss = 0.008197121322154999
Validation loss = 0.008961985819041729
Validation loss = 0.03842109441757202
Validation loss = 0.01305589359253645
Validation loss = 0.012436270713806152
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.026240337640047073
Validation loss = 0.010380753315985203
Validation loss = 0.010381577536463737
Validation loss = 0.00694784801453352
Validation loss = 0.008692843839526176
Validation loss = 0.0074765742756426334
Validation loss = 0.008287697099149227
Validation loss = 0.011962583288550377
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.031410034745931625
Validation loss = 0.014787939377129078
Validation loss = 0.010768086649477482
Validation loss = 0.016494298353791237
Validation loss = 0.009978989139199257
Validation loss = 0.007849862799048424
Validation loss = 0.011629745364189148
Validation loss = 0.019679633900523186
Validation loss = 0.009258502162992954
Validation loss = 0.035070810467004776
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024994712322950363
Validation loss = 0.01103167049586773
Validation loss = 0.012890378944575787
Validation loss = 0.011019738391041756
Validation loss = 0.00975867360830307
Validation loss = 0.009529820643365383
Validation loss = 0.008722487837076187
Validation loss = 0.009019123390316963
Validation loss = 0.011125457473099232
Validation loss = 0.00933412741869688
Validation loss = 0.014478079974651337
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.061   |
| Iteration     | 6        |
| MaximumReturn | -0.0409  |
| MinimumReturn | -0.0813  |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024560457095503807
Validation loss = 0.007358314003795385
Validation loss = 0.005004063248634338
Validation loss = 0.006348444614559412
Validation loss = 0.005412600468844175
Validation loss = 0.01613566093146801
Validation loss = 0.006476439069956541
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.021022342145442963
Validation loss = 0.005829887464642525
Validation loss = 0.005970314610749483
Validation loss = 0.005312381312251091
Validation loss = 0.005211812909692526
Validation loss = 0.006063658278435469
Validation loss = 0.004616865422576666
Validation loss = 0.004847271833568811
Validation loss = 0.006699832156300545
Validation loss = 0.005071487743407488
Validation loss = 0.005676353350281715
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027530213817954063
Validation loss = 0.006449110805988312
Validation loss = 0.0047280932776629925
Validation loss = 0.005939783062785864
Validation loss = 0.005114816594868898
Validation loss = 0.0058411951176822186
Validation loss = 0.004262885544449091
Validation loss = 0.005776403471827507
Validation loss = 0.005338653456419706
Validation loss = 0.004630249459296465
Validation loss = 0.00535653717815876
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023252204060554504
Validation loss = 0.006545966025441885
Validation loss = 0.005413141567260027
Validation loss = 0.006494852248579264
Validation loss = 0.00560607248917222
Validation loss = 0.00842292420566082
Validation loss = 0.0045686946250498295
Validation loss = 0.006054828409105539
Validation loss = 0.004692017566412687
Validation loss = 0.008274395950138569
Validation loss = 0.00679771276190877
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01567782275378704
Validation loss = 0.0052147782407701015
Validation loss = 0.005446671973913908
Validation loss = 0.005005231127142906
Validation loss = 0.005662400741130114
Validation loss = 0.004306241404265165
Validation loss = 0.009688391350209713
Validation loss = 0.008716474287211895
Validation loss = 0.004787352867424488
Validation loss = 0.005565965548157692
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00451 |
| Iteration     | 7        |
| MaximumReturn | -0.0032  |
| MinimumReturn | -0.00635 |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010041582398116589
Validation loss = 0.00682796398177743
Validation loss = 0.006697372067719698
Validation loss = 0.007311479654163122
Validation loss = 0.006738135125488043
Validation loss = 0.009822539053857327
Validation loss = 0.005965446587651968
Validation loss = 0.00720061082392931
Validation loss = 0.005286409053951502
Validation loss = 0.006689481437206268
Validation loss = 0.008482214994728565
Validation loss = 0.006422518286854029
Validation loss = 0.005326726008206606
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016215695068240166
Validation loss = 0.006259693298488855
Validation loss = 0.007830913178622723
Validation loss = 0.005961164832115173
Validation loss = 0.008038676343858242
Validation loss = 0.00464232312515378
Validation loss = 0.0043815611861646175
Validation loss = 0.006577500607818365
Validation loss = 0.004861983936280012
Validation loss = 0.006468675564974546
Validation loss = 0.005228482652455568
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009707331657409668
Validation loss = 0.005365069955587387
Validation loss = 0.006291263736784458
Validation loss = 0.005700030829757452
Validation loss = 0.008784306235611439
Validation loss = 0.006290403660386801
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009749369695782661
Validation loss = 0.007508744485676289
Validation loss = 0.008840051479637623
Validation loss = 0.004810654558241367
Validation loss = 0.005169613752514124
Validation loss = 0.005625335965305567
Validation loss = 0.0038458507042378187
Validation loss = 0.00833401270210743
Validation loss = 0.004758641589432955
Validation loss = 0.005323335062712431
Validation loss = 0.005534724798053503
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011364550329744816
Validation loss = 0.006223364733159542
Validation loss = 0.007194265257567167
Validation loss = 0.005486589856445789
Validation loss = 0.005390754900872707
Validation loss = 0.00877691712230444
Validation loss = 0.0073328325524926186
Validation loss = 0.005725377704948187
Validation loss = 0.005187018308788538
Validation loss = 0.004189590457826853
Validation loss = 0.005335152614861727
Validation loss = 0.007083037402480841
Validation loss = 0.005812595132738352
Validation loss = 0.005637084133923054
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000966 |
| Iteration     | 8         |
| MaximumReturn | -0.000623 |
| MinimumReturn | -0.00176  |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0218163151293993
Validation loss = 0.006972555536776781
Validation loss = 0.00588202616199851
Validation loss = 0.005308541003614664
Validation loss = 0.006133529357612133
Validation loss = 0.00565750990062952
Validation loss = 0.005959866102784872
Validation loss = 0.005353654734790325
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0055992379784584045
Validation loss = 0.0065345363691449165
Validation loss = 0.009210610762238503
Validation loss = 0.007285428233444691
Validation loss = 0.005151813384145498
Validation loss = 0.003905266523361206
Validation loss = 0.0058343540877103806
Validation loss = 0.004890007432550192
Validation loss = 0.006262578070163727
Validation loss = 0.004172049928456545
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00950460322201252
Validation loss = 0.005409420933574438
Validation loss = 0.007061639800667763
Validation loss = 0.0058328378945589066
Validation loss = 0.00939980335533619
Validation loss = 0.00787863414734602
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024172481149435043
Validation loss = 0.0069558219984173775
Validation loss = 0.005847676191478968
Validation loss = 0.004757859744131565
Validation loss = 0.005181960761547089
Validation loss = 0.0044840131886303425
Validation loss = 0.006681620609015226
Validation loss = 0.007060571573674679
Validation loss = 0.006923516746610403
Validation loss = 0.004760075360536575
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009400542825460434
Validation loss = 0.0047319079749286175
Validation loss = 0.006293994840234518
Validation loss = 0.004983570426702499
Validation loss = 0.008175786584615707
Validation loss = 0.006544396281242371
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0256  |
| Iteration     | 9        |
| MaximumReturn | -0.0224  |
| MinimumReturn | -0.0316  |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006700714118778706
Validation loss = 0.004729567561298609
Validation loss = 0.005156556144356728
Validation loss = 0.005031304899603128
Validation loss = 0.003953746519982815
Validation loss = 0.0033472259528934956
Validation loss = 0.005254402756690979
Validation loss = 0.006819750182330608
Validation loss = 0.008200372569262981
Validation loss = 0.005147238727658987
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006089964881539345
Validation loss = 0.007835354655981064
Validation loss = 0.007654642686247826
Validation loss = 0.005599976982921362
Validation loss = 0.006606279872357845
Validation loss = 0.004296607803553343
Validation loss = 0.021409772336483
Validation loss = 0.01043995562940836
Validation loss = 0.006418345961719751
Validation loss = 0.0037923704367130995
Validation loss = 0.003755733836442232
Validation loss = 0.004786074161529541
Validation loss = 0.0038154684007167816
Validation loss = 0.004378750454634428
Validation loss = 0.008876778185367584
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011838112026453018
Validation loss = 0.0039031642954796553
Validation loss = 0.004129262175410986
Validation loss = 0.003494509030133486
Validation loss = 0.002974829636514187
Validation loss = 0.0037384931929409504
Validation loss = 0.0052278004586696625
Validation loss = 0.010407960042357445
Validation loss = 0.004223216790705919
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007185656111687422
Validation loss = 0.006007374729961157
Validation loss = 0.004024033434689045
Validation loss = 0.0045378077775239944
Validation loss = 0.0058083669282495975
Validation loss = 0.004521601367741823
Validation loss = 0.0070779952220618725
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007839906960725784
Validation loss = 0.004567621275782585
Validation loss = 0.005309008061885834
Validation loss = 0.005895477719604969
Validation loss = 0.0035230997018516064
Validation loss = 0.0033293133601546288
Validation loss = 0.004971405491232872
Validation loss = 0.005946945399045944
Validation loss = 0.003036661073565483
Validation loss = 0.003533085575327277
Validation loss = 0.0070460885763168335
Validation loss = 0.006031556986272335
Validation loss = 0.0048023611307144165
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000981 |
| Iteration     | 10        |
| MaximumReturn | -0.000705 |
| MinimumReturn | -0.00153  |
| TotalSamples  | 19992     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00584984477609396
Validation loss = 0.0048731667920947075
Validation loss = 0.004554004408419132
Validation loss = 0.007770502008497715
Validation loss = 0.00940353237092495
Validation loss = 0.004388193599879742
Validation loss = 0.0036934777162969112
Validation loss = 0.0039046364836394787
Validation loss = 0.006100417114794254
Validation loss = 0.004106239881366491
Validation loss = 0.0036893312353640795
Validation loss = 0.004780422430485487
Validation loss = 0.0038467056583613157
Validation loss = 0.005806351080536842
Validation loss = 0.007403402589261532
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009440255351364613
Validation loss = 0.012295525521039963
Validation loss = 0.005384599324315786
Validation loss = 0.005117453169077635
Validation loss = 0.004062469583004713
Validation loss = 0.0039207362569868565
Validation loss = 0.006701062433421612
Validation loss = 0.004579820670187473
Validation loss = 0.0031745347660034895
Validation loss = 0.0040001948364079
Validation loss = 0.003715296508744359
Validation loss = 0.004508234094828367
Validation loss = 0.011251403018832207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0075529711320996284
Validation loss = 0.005155845545232296
Validation loss = 0.005274423398077488
Validation loss = 0.004277578089386225
Validation loss = 0.006674672476947308
Validation loss = 0.00819208100438118
Validation loss = 0.004714908543974161
Validation loss = 0.010035349056124687
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006491514854133129
Validation loss = 0.004017268307507038
Validation loss = 0.006533486302942038
Validation loss = 0.003657221095636487
Validation loss = 0.007517081685364246
Validation loss = 0.008634185418486595
Validation loss = 0.006885707378387451
Validation loss = 0.005445756018161774
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0072799078188836575
Validation loss = 0.003361657727509737
Validation loss = 0.003737273160368204
Validation loss = 0.006743824575096369
Validation loss = 0.006880223751068115
Validation loss = 0.011929379776120186
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00199  |
| Iteration     | 11        |
| MaximumReturn | -0.000567 |
| MinimumReturn | -0.00603  |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0069396221078932285
Validation loss = 0.007025784347206354
Validation loss = 0.005138128064572811
Validation loss = 0.0033958242274820805
Validation loss = 0.0038893234450370073
Validation loss = 0.002891332609578967
Validation loss = 0.007415821310132742
Validation loss = 0.005782651714980602
Validation loss = 0.006096892524510622
Validation loss = 0.0067597487941384315
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00730142742395401
Validation loss = 0.004922471474856138
Validation loss = 0.0034527857787907124
Validation loss = 0.004080305807292461
Validation loss = 0.0030095104593783617
Validation loss = 0.004569351673126221
Validation loss = 0.002850201679393649
Validation loss = 0.0097477613016963
Validation loss = 0.003716814797371626
Validation loss = 0.006284299306571484
Validation loss = 0.003880384610965848
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010727571323513985
Validation loss = 0.004837158601731062
Validation loss = 0.0037747782189399004
Validation loss = 0.005029965657740831
Validation loss = 0.0035256645642220974
Validation loss = 0.004790394101291895
Validation loss = 0.0059392075054347515
Validation loss = 0.00410449830815196
Validation loss = 0.005689666606485844
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01034852396696806
Validation loss = 0.007386809680610895
Validation loss = 0.004695393145084381
Validation loss = 0.005194948986172676
Validation loss = 0.006292541511356831
Validation loss = 0.0030785747803747654
Validation loss = 0.0057088653557002544
Validation loss = 0.005867459811270237
Validation loss = 0.0038024112582206726
Validation loss = 0.0051217228174209595
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00685280654579401
Validation loss = 0.004765866789966822
Validation loss = 0.005571125540882349
Validation loss = 0.0043951114639639854
Validation loss = 0.0032402232754975557
Validation loss = 0.0038406469393521547
Validation loss = 0.003899420378729701
Validation loss = 0.004895280115306377
Validation loss = 0.004510968923568726
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00208  |
| Iteration     | 12        |
| MaximumReturn | -0.000534 |
| MinimumReturn | -0.00677  |
| TotalSamples  | 23324     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008022810332477093
Validation loss = 0.004613569006323814
Validation loss = 0.0024706870317459106
Validation loss = 0.004654184449464083
Validation loss = 0.0026941061951220036
Validation loss = 0.004213111475110054
Validation loss = 0.0036530971992760897
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005546277854591608
Validation loss = 0.003610609332099557
Validation loss = 0.0029223114252090454
Validation loss = 0.005500452127307653
Validation loss = 0.00309020490385592
Validation loss = 0.003175585065037012
Validation loss = 0.0024064891040325165
Validation loss = 0.006663916632533073
Validation loss = 0.008238700218498707
Validation loss = 0.003279700642451644
Validation loss = 0.005268936511129141
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006969114765524864
Validation loss = 0.007578599266707897
Validation loss = 0.005424004513770342
Validation loss = 0.0032903493847697973
Validation loss = 0.0034311339259147644
Validation loss = 0.002952305134385824
Validation loss = 0.0046278792433440685
Validation loss = 0.003202244406566024
Validation loss = 0.007239685859531164
Validation loss = 0.003956282511353493
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00565669359639287
Validation loss = 0.0030072450172156096
Validation loss = 0.0036322406958788633
Validation loss = 0.002812270075082779
Validation loss = 0.008469323627650738
Validation loss = 0.0032159907277673483
Validation loss = 0.005505500826984644
Validation loss = 0.004128383472561836
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006277340464293957
Validation loss = 0.004400676116347313
Validation loss = 0.005154608283191919
Validation loss = 0.0034296915400773287
Validation loss = 0.004394278395920992
Validation loss = 0.004300631117075682
Validation loss = 0.006322460249066353
Validation loss = 0.004665515851229429
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00228  |
| Iteration     | 13        |
| MaximumReturn | -0.000585 |
| MinimumReturn | -0.011    |
| TotalSamples  | 24990     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006897889077663422
Validation loss = 0.003742025000974536
Validation loss = 0.0035533884074538946
Validation loss = 0.0027114932890981436
Validation loss = 0.004594664555042982
Validation loss = 0.006040196865797043
Validation loss = 0.003787735477089882
Validation loss = 0.0029976724181324244
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006150553468614817
Validation loss = 0.00472549581900239
Validation loss = 0.005584361497312784
Validation loss = 0.004039924591779709
Validation loss = 0.003739479696378112
Validation loss = 0.006593930069357157
Validation loss = 0.002822840353474021
Validation loss = 0.0024879572447389364
Validation loss = 0.0030538467690348625
Validation loss = 0.0027752697933465242
Validation loss = 0.0031781860161572695
Validation loss = 0.00285149528644979
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009167298674583435
Validation loss = 0.0045798830687999725
Validation loss = 0.005148135591298342
Validation loss = 0.004787779413163662
Validation loss = 0.0028134817257523537
Validation loss = 0.0029358789324760437
Validation loss = 0.005377767141908407
Validation loss = 0.0028026553336530924
Validation loss = 0.00574655132368207
Validation loss = 0.002287532202899456
Validation loss = 0.002095694188028574
Validation loss = 0.004315480124205351
Validation loss = 0.0038504370022565126
Validation loss = 0.003629397600889206
Validation loss = 0.004543304909020662
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0035845416132360697
Validation loss = 0.002260396257042885
Validation loss = 0.0026297832373529673
Validation loss = 0.004191340412944555
Validation loss = 0.0028423855546861887
Validation loss = 0.003904328914359212
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004659784026443958
Validation loss = 0.003932760562747717
Validation loss = 0.004521465860307217
Validation loss = 0.002654280746355653
Validation loss = 0.0057527762837708
Validation loss = 0.0025201085954904556
Validation loss = 0.0031194661278277636
Validation loss = 0.0034263331908732653
Validation loss = 0.0032347978558391333
Validation loss = 0.003650018246844411
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00159  |
| Iteration     | 14        |
| MaximumReturn | -0.000561 |
| MinimumReturn | -0.00881  |
| TotalSamples  | 26656     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0025397739373147488
Validation loss = 0.005046066828072071
Validation loss = 0.0026528583839535713
Validation loss = 0.0033588442020118237
Validation loss = 0.0034894635900855064
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004082116764038801
Validation loss = 0.005507135763764381
Validation loss = 0.003983557689934969
Validation loss = 0.004864143207669258
Validation loss = 0.0031087209936231375
Validation loss = 0.0020377261098474264
Validation loss = 0.002519170055165887
Validation loss = 0.002752572763711214
Validation loss = 0.0036521670408546925
Validation loss = 0.004130632150918245
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027364580892026424
Validation loss = 0.003622104413807392
Validation loss = 0.004364231135696173
Validation loss = 0.007223647553473711
Validation loss = 0.006705933250486851
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004274152684956789
Validation loss = 0.0025460137985646725
Validation loss = 0.004401891026645899
Validation loss = 0.00289356242865324
Validation loss = 0.005726787727326155
Validation loss = 0.006617755629122257
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003917807713150978
Validation loss = 0.00647767586633563
Validation loss = 0.003239575307816267
Validation loss = 0.0036102628801018
Validation loss = 0.003452596254646778
Validation loss = 0.004692817106842995
Validation loss = 0.007296931464225054
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00153  |
| Iteration     | 15        |
| MaximumReturn | -0.000621 |
| MinimumReturn | -0.00421  |
| TotalSamples  | 28322     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006919334176927805
Validation loss = 0.0031383868772536516
Validation loss = 0.0035554899368435144
Validation loss = 0.0039006019942462444
Validation loss = 0.005507905501872301
Validation loss = 0.002641808008775115
Validation loss = 0.0036342984531074762
Validation loss = 0.0015084471087902784
Validation loss = 0.002695828676223755
Validation loss = 0.004844033624976873
Validation loss = 0.0031521895434707403
Validation loss = 0.00229722005315125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00362517312169075
Validation loss = 0.002454305300489068
Validation loss = 0.002151854569092393
Validation loss = 0.004006353206932545
Validation loss = 0.0018956292187795043
Validation loss = 0.004113210830837488
Validation loss = 0.0023261099122464657
Validation loss = 0.002729683881625533
Validation loss = 0.002277723280712962
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003724265145137906
Validation loss = 0.005477368365973234
Validation loss = 0.0031763932202011347
Validation loss = 0.00248473952524364
Validation loss = 0.005243800114840269
Validation loss = 0.0027632967103272676
Validation loss = 0.0026859675999730825
Validation loss = 0.0039987461641430855
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006408035289496183
Validation loss = 0.0029316511936485767
Validation loss = 0.00218881294131279
Validation loss = 0.0036502096336334944
Validation loss = 0.005111819598823786
Validation loss = 0.002030148170888424
Validation loss = 0.0104740085080266
Validation loss = 0.0067184376530349255
Validation loss = 0.0031319523695856333
Validation loss = 0.003154669189825654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005414518062025309
Validation loss = 0.0027768134605139494
Validation loss = 0.0028708577156066895
Validation loss = 0.002167150843888521
Validation loss = 0.0041852714493870735
Validation loss = 0.002485720906406641
Validation loss = 0.0027934741228818893
Validation loss = 0.002686734078451991
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00127  |
| Iteration     | 16        |
| MaximumReturn | -0.000599 |
| MinimumReturn | -0.00501  |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0027382555417716503
Validation loss = 0.003151568816974759
Validation loss = 0.006954072043299675
Validation loss = 0.0022796730045229197
Validation loss = 0.002909561386331916
Validation loss = 0.002409149194136262
Validation loss = 0.0018477905541658401
Validation loss = 0.003581010503694415
Validation loss = 0.0024230526760220528
Validation loss = 0.0026148229371756315
Validation loss = 0.0033550800289958715
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0032522953115403652
Validation loss = 0.006353552453219891
Validation loss = 0.002749736187979579
Validation loss = 0.0037024137564003468
Validation loss = 0.003057086141780019
Validation loss = 0.0024878631811589003
Validation loss = 0.0023105433210730553
Validation loss = 0.0027491110377013683
Validation loss = 0.008997318334877491
Validation loss = 0.00182731740642339
Validation loss = 0.001997079001739621
Validation loss = 0.0025647433940321207
Validation loss = 0.0033463214058429003
Validation loss = 0.002031176583841443
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0030320442747324705
Validation loss = 0.002861535642296076
Validation loss = 0.003911888226866722
Validation loss = 0.001901298644952476
Validation loss = 0.002489273203536868
Validation loss = 0.005257827695459127
Validation loss = 0.002621168503537774
Validation loss = 0.0022124287206679583
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005845611449331045
Validation loss = 0.0029804043006151915
Validation loss = 0.0035931894090026617
Validation loss = 0.0028649461455643177
Validation loss = 0.00566641241312027
Validation loss = 0.003773117670789361
Validation loss = 0.0028740151319652796
Validation loss = 0.003439395222812891
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003977906424552202
Validation loss = 0.0021706768311560154
Validation loss = 0.0029769763350486755
Validation loss = 0.0035877369809895754
Validation loss = 0.0028259018436074257
Validation loss = 0.0017020292580127716
Validation loss = 0.0038251604419201612
Validation loss = 0.0029438661877065897
Validation loss = 0.003437585197389126
Validation loss = 0.004048404283821583
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00157  |
| Iteration     | 17        |
| MaximumReturn | -0.000566 |
| MinimumReturn | -0.00891  |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004350351635366678
Validation loss = 0.002683149417862296
Validation loss = 0.002737601986154914
Validation loss = 0.0023891800083220005
Validation loss = 0.0061678895726799965
Validation loss = 0.00284401117824018
Validation loss = 0.0025766445323824883
Validation loss = 0.0026793202850967646
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004207724239677191
Validation loss = 0.002146165119484067
Validation loss = 0.0027259422931820154
Validation loss = 0.0030688466504216194
Validation loss = 0.002805590396746993
Validation loss = 0.003122567432001233
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00524572329595685
Validation loss = 0.003389697754755616
Validation loss = 0.0045951660722494125
Validation loss = 0.004279610700905323
Validation loss = 0.0019194410415366292
Validation loss = 0.0020733007695525885
Validation loss = 0.0026741239707916975
Validation loss = 0.004829766228795052
Validation loss = 0.0035771660041064024
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0032632399816066027
Validation loss = 0.002297283848747611
Validation loss = 0.004939505830407143
Validation loss = 0.001959008863195777
Validation loss = 0.0025721073616296053
Validation loss = 0.0029786282684653997
Validation loss = 0.0019472414860501885
Validation loss = 0.0026588751934468746
Validation loss = 0.0029710703529417515
Validation loss = 0.0022907713428139687
Validation loss = 0.004715936258435249
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005324590485543013
Validation loss = 0.0019377541029825807
Validation loss = 0.004777758382260799
Validation loss = 0.003151596523821354
Validation loss = 0.006684594321995974
Validation loss = 0.0026665751356631517
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00127  |
| Iteration     | 18        |
| MaximumReturn | -0.000508 |
| MinimumReturn | -0.00857  |
| TotalSamples  | 33320     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003201342187821865
Validation loss = 0.003593633882701397
Validation loss = 0.0021356616634875536
Validation loss = 0.002175599569454789
Validation loss = 0.0030395968351513147
Validation loss = 0.002586874645203352
Validation loss = 0.003581203520298004
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0020830808207392693
Validation loss = 0.005289056338369846
Validation loss = 0.004296476021409035
Validation loss = 0.0025774280074983835
Validation loss = 0.001718826242722571
Validation loss = 0.0023473408073186874
Validation loss = 0.0041723730973899364
Validation loss = 0.0025310050696134567
Validation loss = 0.0021685687825083733
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002308622235432267
Validation loss = 0.0021081154700368643
Validation loss = 0.006347945425659418
Validation loss = 0.0064464351162314415
Validation loss = 0.003092100378125906
Validation loss = 0.004269186872988939
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004315150901675224
Validation loss = 0.0024827371817082167
Validation loss = 0.002955676754936576
Validation loss = 0.00401731114834547
Validation loss = 0.004211715422570705
Validation loss = 0.002556034130975604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00401001051068306
Validation loss = 0.0033074095845222473
Validation loss = 0.0038885269314050674
Validation loss = 0.0029544939752668142
Validation loss = 0.004497057292610407
Validation loss = 0.0021336835343390703
Validation loss = 0.003013337729498744
Validation loss = 0.004510124213993549
Validation loss = 0.0023062718100845814
Validation loss = 0.0034476076252758503
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00253  |
| Iteration     | 19        |
| MaximumReturn | -0.000603 |
| MinimumReturn | -0.00978  |
| TotalSamples  | 34986     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002954915864393115
Validation loss = 0.0032286259811371565
Validation loss = 0.0025485556107014418
Validation loss = 0.001630433020181954
Validation loss = 0.002583240158855915
Validation loss = 0.0022754331585019827
Validation loss = 0.003501910250633955
Validation loss = 0.003970633260905743
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0038730392698198557
Validation loss = 0.004319954197853804
Validation loss = 0.002139398595318198
Validation loss = 0.0018815569346770644
Validation loss = 0.0023674224503338337
Validation loss = 0.001867364626377821
Validation loss = 0.0019448198145255446
Validation loss = 0.002096627140417695
Validation loss = 0.0028256478253751993
Validation loss = 0.0027433508075773716
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027333521284163
Validation loss = 0.002719087526202202
Validation loss = 0.00224323314614594
Validation loss = 0.0017667385982349515
Validation loss = 0.003166738199070096
Validation loss = 0.005445057991892099
Validation loss = 0.006093370262533426
Validation loss = 0.0030181959737092257
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0039079259149730206
Validation loss = 0.002501383889466524
Validation loss = 0.003121798625215888
Validation loss = 0.0035407855175435543
Validation loss = 0.0021262639202177525
Validation loss = 0.0025278539396822453
Validation loss = 0.0030884272418916225
Validation loss = 0.0036897514946758747
Validation loss = 0.0015529367374256253
Validation loss = 0.0021670295391231775
Validation loss = 0.004908476024866104
Validation loss = 0.002854695776477456
Validation loss = 0.0020173543598502874
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002816425636410713
Validation loss = 0.003072545863687992
Validation loss = 0.0026706475764513016
Validation loss = 0.005356492009013891
Validation loss = 0.004792168270796537
Validation loss = 0.004035299178212881
Validation loss = 0.0017638793215155602
Validation loss = 0.002915096702054143
Validation loss = 0.002311167074367404
Validation loss = 0.002656947821378708
Validation loss = 0.001984023954719305
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.015    |
| Iteration     | 20        |
| MaximumReturn | -0.000633 |
| MinimumReturn | -0.169    |
| TotalSamples  | 36652     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00473432894796133
Validation loss = 0.006663050502538681
Validation loss = 0.0028514456935226917
Validation loss = 0.0025180396623909473
Validation loss = 0.0031794554088264704
Validation loss = 0.002496207831427455
Validation loss = 0.0036406596191227436
Validation loss = 0.002630220027640462
Validation loss = 0.002696567913517356
Validation loss = 0.0020749676041305065
Validation loss = 0.0033021073322743177
Validation loss = 0.0019282698631286621
Validation loss = 0.0020278801675885916
Validation loss = 0.002256267936900258
Validation loss = 0.0035498167853802443
Validation loss = 0.001883347169496119
Validation loss = 0.005078193731606007
Validation loss = 0.0031359316781163216
Validation loss = 0.0026559268590062857
Validation loss = 0.005276883952319622
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006420546676963568
Validation loss = 0.0030391959007829428
Validation loss = 0.0033831228502094746
Validation loss = 0.0021383012644946575
Validation loss = 0.001759626786224544
Validation loss = 0.0017916338983923197
Validation loss = 0.006888026371598244
Validation loss = 0.002449450781568885
Validation loss = 0.002872935263440013
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004148326814174652
Validation loss = 0.0027138388250023127
Validation loss = 0.005924519617110491
Validation loss = 0.003955107647925615
Validation loss = 0.004874723497778177
Validation loss = 0.0029102470725774765
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0035063144750893116
Validation loss = 0.0033403460402041674
Validation loss = 0.004901635926216841
Validation loss = 0.0021155267022550106
Validation loss = 0.005089516285806894
Validation loss = 0.004475772846490145
Validation loss = 0.001768867950886488
Validation loss = 0.0026147617027163506
Validation loss = 0.003532656002789736
Validation loss = 0.0023148651234805584
Validation loss = 0.004648338072001934
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009568914771080017
Validation loss = 0.003115556202828884
Validation loss = 0.004408399574458599
Validation loss = 0.004245935007929802
Validation loss = 0.003514268435537815
Validation loss = 0.0035195339005440474
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0112   |
| Iteration     | 21        |
| MaximumReturn | -0.000602 |
| MinimumReturn | -0.152    |
| TotalSamples  | 38318     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0034742497373372316
Validation loss = 0.0028485688380897045
Validation loss = 0.002317252568900585
Validation loss = 0.003640545066446066
Validation loss = 0.0030823517590761185
Validation loss = 0.002741368720307946
Validation loss = 0.0030090652871876955
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021341866813600063
Validation loss = 0.003652636893093586
Validation loss = 0.0030059257987886667
Validation loss = 0.0027462593279778957
Validation loss = 0.004013067111372948
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0035273139365017414
Validation loss = 0.0027376615907996893
Validation loss = 0.0022302563302218914
Validation loss = 0.0033190238755196333
Validation loss = 0.0033726762048900127
Validation loss = 0.002189540071412921
Validation loss = 0.004335509613156319
Validation loss = 0.003127265954390168
Validation loss = 0.005181469023227692
Validation loss = 0.004359263926744461
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00406282301992178
Validation loss = 0.003370570484548807
Validation loss = 0.0042081051506102085
Validation loss = 0.0028840270824730396
Validation loss = 0.001991647994145751
Validation loss = 0.004539370536804199
Validation loss = 0.0022527475375682116
Validation loss = 0.003547990694642067
Validation loss = 0.002972470596432686
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010166469030082226
Validation loss = 0.0036303792148828506
Validation loss = 0.004714916925877333
Validation loss = 0.002966521307826042
Validation loss = 0.0038171897176653147
Validation loss = 0.003047429956495762
Validation loss = 0.002477615373209119
Validation loss = 0.0029751164838671684
Validation loss = 0.004332771059125662
Validation loss = 0.0048322225920856
Validation loss = 0.002843471011146903
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.881    |
| Iteration     | 22        |
| MaximumReturn | -0.000615 |
| MinimumReturn | -21.8     |
| TotalSamples  | 39984     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005301954224705696
Validation loss = 0.004233368672430515
Validation loss = 0.0033371183089911938
Validation loss = 0.0028468084055930376
Validation loss = 0.0033619378227740526
Validation loss = 0.0032720081508159637
Validation loss = 0.003042774274945259
Validation loss = 0.003119037486612797
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003335457295179367
Validation loss = 0.002596566453576088
Validation loss = 0.004584623966366053
Validation loss = 0.0029707045760005713
Validation loss = 0.0038723803590983152
Validation loss = 0.0026629457715898752
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003067450597882271
Validation loss = 0.006494166795164347
Validation loss = 0.005373022519052029
Validation loss = 0.002359075006097555
Validation loss = 0.0028353955131024122
Validation loss = 0.0031034103594720364
Validation loss = 0.0032837730832397938
Validation loss = 0.003719731466844678
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003969122655689716
Validation loss = 0.0029101893305778503
Validation loss = 0.004358397331088781
Validation loss = 0.0034476821310818195
Validation loss = 0.0042066024616360664
Validation loss = 0.002452037762850523
Validation loss = 0.0026081998366862535
Validation loss = 0.004371126182377338
Validation loss = 0.0023701516911387444
Validation loss = 0.0027152099646627903
Validation loss = 0.005317972507327795
Validation loss = 0.003160560503602028
Validation loss = 0.002249622019007802
Validation loss = 0.002750684507191181
Validation loss = 0.003152407705783844
Validation loss = 0.0032041589729487896
Validation loss = 0.002848394215106964
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004923683125525713
Validation loss = 0.0038947928696870804
Validation loss = 0.0038326620124280453
Validation loss = 0.002999094780534506
Validation loss = 0.003518996061757207
Validation loss = 0.004170550964772701
Validation loss = 0.00456647202372551
Validation loss = 0.0035649421624839306
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00512  |
| Iteration     | 23        |
| MaximumReturn | -0.000581 |
| MinimumReturn | -0.048    |
| TotalSamples  | 41650     |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0035865511745214462
Validation loss = 0.002969962079077959
Validation loss = 0.002763645723462105
Validation loss = 0.0038106925785541534
Validation loss = 0.003100347938016057
Validation loss = 0.0035413838922977448
Validation loss = 0.006055336445569992
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0034711596090346575
Validation loss = 0.003765116911381483
Validation loss = 0.00408176751807332
Validation loss = 0.002933699171990156
Validation loss = 0.0030684873927384615
Validation loss = 0.002302503678947687
Validation loss = 0.0023588058538734913
Validation loss = 0.004561011679470539
Validation loss = 0.005679079331457615
Validation loss = 0.006074868608266115
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002559254178777337
Validation loss = 0.003185670357197523
Validation loss = 0.0027306799311190844
Validation loss = 0.006113898940384388
Validation loss = 0.003145796712487936
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004067054484039545
Validation loss = 0.002899372251704335
Validation loss = 0.003866644576191902
Validation loss = 0.0019142214441671968
Validation loss = 0.0021505942568182945
Validation loss = 0.004888584371656179
Validation loss = 0.004950062371790409
Validation loss = 0.0019392305985093117
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003999194595962763
Validation loss = 0.004961763974279165
Validation loss = 0.00500840786844492
Validation loss = 0.0038745435886085033
Validation loss = 0.004334109835326672
Validation loss = 0.0045572081580758095
Validation loss = 0.004115602467209101
Validation loss = 0.003692945931106806
Validation loss = 0.007162672933191061
Validation loss = 0.003363021183758974
Validation loss = 0.004304955713450909
Validation loss = 0.0044314004480838776
Validation loss = 0.003115274477750063
Validation loss = 0.0025799821596592665
Validation loss = 0.003311749082058668
Validation loss = 0.005506254732608795
Validation loss = 0.003392765996977687
Validation loss = 0.0030103863682597876
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0584   |
| Iteration     | 24        |
| MaximumReturn | -0.000527 |
| MinimumReturn | -1.39     |
| TotalSamples  | 43316     |
-----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.004148636478930712
Validation loss = 0.002761967247352004
Validation loss = 0.005182477645576
Validation loss = 0.003782833693549037
Validation loss = 0.0031743666622787714
Validation loss = 0.0017585956957191229
Validation loss = 0.0030879017431288958
Validation loss = 0.001638913294300437
Validation loss = 0.002356288256123662
Validation loss = 0.002999701304361224
Validation loss = 0.003619682276621461
Validation loss = 0.0019384340848773718
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0038092632312327623
Validation loss = 0.0017047615256160498
Validation loss = 0.0021980602759867907
Validation loss = 0.0031005877535790205
Validation loss = 0.0030899906996637583
Validation loss = 0.0030603569466620684
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0029128941241651773
Validation loss = 0.004473743494600058
Validation loss = 0.0027066555339843035
Validation loss = 0.004463912919163704
Validation loss = 0.0026577424723654985
Validation loss = 0.00260271318256855
Validation loss = 0.0031645791605114937
Validation loss = 0.0031386627815663815
Validation loss = 0.002108405577018857
Validation loss = 0.005152917932718992
Validation loss = 0.0017910627648234367
Validation loss = 0.002056421944871545
Validation loss = 0.0025775583926588297
Validation loss = 0.003201801562681794
Validation loss = 0.0018773908959701657
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00553967384621501
Validation loss = 0.0025675452779978514
Validation loss = 0.00235780724324286
Validation loss = 0.0036488783080130816
Validation loss = 0.005376394838094711
Validation loss = 0.0023199545685201883
Validation loss = 0.001926514320075512
Validation loss = 0.0033743896055966616
Validation loss = 0.003359111025929451
Validation loss = 0.002125110477209091
Validation loss = 0.0018071902450174093
Validation loss = 0.0021351745817810297
Validation loss = 0.001987274968996644
Validation loss = 0.003519295249134302
Validation loss = 0.0027531213127076626
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004291006829589605
Validation loss = 0.007706323638558388
Validation loss = 0.0032592560164630413
Validation loss = 0.0020166803151369095
Validation loss = 0.0040878066793084145
Validation loss = 0.0019377438584342599
Validation loss = 0.0035630709026008844
Validation loss = 0.0037359807174652815
Validation loss = 0.004435793962329626
Validation loss = 0.003956922329962254
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0345   |
| Iteration     | 25        |
| MaximumReturn | -0.000607 |
| MinimumReturn | -0.727    |
| TotalSamples  | 44982     |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0018403582507744431
Validation loss = 0.0019418713636696339
Validation loss = 0.0034847387578338385
Validation loss = 0.003272728528827429
Validation loss = 0.0023328440729528666
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002216555178165436
Validation loss = 0.0030471195932477713
Validation loss = 0.0026619264390319586
Validation loss = 0.0025065848603844643
Validation loss = 0.0021993888076394796
Validation loss = 0.0028310196939855814
Validation loss = 0.0026080382522195578
Validation loss = 0.0021029342897236347
Validation loss = 0.0018966459902003407
Validation loss = 0.0021109136287122965
Validation loss = 0.0032402798533439636
Validation loss = 0.0035354711581021547
Validation loss = 0.002528337761759758
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015513725811615586
Validation loss = 0.0028321142308413982
Validation loss = 0.003327237907797098
Validation loss = 0.002923868363723159
Validation loss = 0.0025694456417113543
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0023163543082773685
Validation loss = 0.0033199975732713938
Validation loss = 0.002949113491922617
Validation loss = 0.0031857341527938843
Validation loss = 0.006371733732521534
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005133819300681353
Validation loss = 0.0018586487276479602
Validation loss = 0.0025784445460885763
Validation loss = 0.0020746728405356407
Validation loss = 0.006519499234855175
Validation loss = 0.0024000718258321285
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.18    |
| Iteration     | 26       |
| MaximumReturn | -0.0149  |
| MinimumReturn | -3.9     |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002603607950732112
Validation loss = 0.001487715053372085
Validation loss = 0.004161152523010969
Validation loss = 0.0015265524853020906
Validation loss = 0.0022648165468126535
Validation loss = 0.00210340553894639
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00571747450158
Validation loss = 0.0022501491475850344
Validation loss = 0.0020771778654307127
Validation loss = 0.004001532215625048
Validation loss = 0.0012415284290909767
Validation loss = 0.001967861782759428
Validation loss = 0.0019124106038361788
Validation loss = 0.00257113971747458
Validation loss = 0.001756374491378665
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004129800945520401
Validation loss = 0.0020115585066378117
Validation loss = 0.0021504194010049105
Validation loss = 0.002940092934295535
Validation loss = 0.0024876014795154333
Validation loss = 0.0025005151983350515
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006894403137266636
Validation loss = 0.0028798014391213655
Validation loss = 0.001764676533639431
Validation loss = 0.0033114298712462187
Validation loss = 0.001163469278253615
Validation loss = 0.0023751978296786547
Validation loss = 0.0015054261311888695
Validation loss = 0.0017264096532016993
Validation loss = 0.0022467058151960373
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0032785385847091675
Validation loss = 0.001830613473430276
Validation loss = 0.0018083826871588826
Validation loss = 0.004935235250741243
Validation loss = 0.0030045458115637302
Validation loss = 0.003413064870983362
Validation loss = 0.0020397102925926447
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0403  |
| Iteration     | 27       |
| MaximumReturn | -0.0283  |
| MinimumReturn | -0.06    |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003407929092645645
Validation loss = 0.0017560607520863414
Validation loss = 0.0017137200338765979
Validation loss = 0.0015558473533019423
Validation loss = 0.0030090820509940386
Validation loss = 0.0017712531844154
Validation loss = 0.0012960591120645404
Validation loss = 0.0012843996519222856
Validation loss = 0.002284645102918148
Validation loss = 0.0013751749647781253
Validation loss = 0.0017900391248986125
Validation loss = 0.002485154429450631
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003469096263870597
Validation loss = 0.001818881486542523
Validation loss = 0.001930708414874971
Validation loss = 0.0021216205786913633
Validation loss = 0.0020805110689252615
Validation loss = 0.0014355684397742152
Validation loss = 0.002193618332967162
Validation loss = 0.0016899737529456615
Validation loss = 0.0015938287833705544
Validation loss = 0.0014082459965720773
Validation loss = 0.0019424945348873734
Validation loss = 0.002052366966381669
Validation loss = 0.0015043435851112008
Validation loss = 0.0015160212060436606
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001644756062887609
Validation loss = 0.00210314872674644
Validation loss = 0.0025477707386016846
Validation loss = 0.0016598099609836936
Validation loss = 0.0020201110746711493
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0044234017841517925
Validation loss = 0.002029185649007559
Validation loss = 0.001370303682051599
Validation loss = 0.0013414683053269982
Validation loss = 0.0029344975482672453
Validation loss = 0.0017518805107101798
Validation loss = 0.0016655161743983626
Validation loss = 0.002415743190795183
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002787773497402668
Validation loss = 0.0017171865329146385
Validation loss = 0.0015943668549880385
Validation loss = 0.004834017250686884
Validation loss = 0.001426591887138784
Validation loss = 0.0015111143002286553
Validation loss = 0.002075426047667861
Validation loss = 0.0036812175530940294
Validation loss = 0.004000975750386715
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00631 |
| Iteration     | 28       |
| MaximumReturn | -0.0011  |
| MinimumReturn | -0.0135  |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002148131839931011
Validation loss = 0.0020444609690457582
Validation loss = 0.002150402870029211
Validation loss = 0.0019068085821345448
Validation loss = 0.002312510274350643
Validation loss = 0.0014707631198689342
Validation loss = 0.0010369009105488658
Validation loss = 0.0010865710210055113
Validation loss = 0.002541838912293315
Validation loss = 0.0025296800304204226
Validation loss = 0.0015765652060508728
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0023599816486239433
Validation loss = 0.002439617644995451
Validation loss = 0.001567751169204712
Validation loss = 0.002065539127215743
Validation loss = 0.0022164862602949142
Validation loss = 0.0017500411486253142
Validation loss = 0.0016209293389692903
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009901653975248337
Validation loss = 0.0029621741268783808
Validation loss = 0.0015760355163365602
Validation loss = 0.0020925509743392467
Validation loss = 0.0019804404582828283
Validation loss = 0.0017788044642657042
Validation loss = 0.0021681771613657475
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017674012342467904
Validation loss = 0.0011136459652334452
Validation loss = 0.0018457105616107583
Validation loss = 0.0017468204023316503
Validation loss = 0.002074422314763069
Validation loss = 0.0014415173791348934
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0038371586706489325
Validation loss = 0.0018157295417040586
Validation loss = 0.0014325600350275636
Validation loss = 0.001872028224170208
Validation loss = 0.0014037169748917222
Validation loss = 0.0012369207106530666
Validation loss = 0.002378603210672736
Validation loss = 0.002437158487737179
Validation loss = 0.0021242748480290174
Validation loss = 0.0024416290689259768
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00834 |
| Iteration     | 29       |
| MaximumReturn | -0.00115 |
| MinimumReturn | -0.0182  |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00254173600114882
Validation loss = 0.0021609379909932613
Validation loss = 0.002399626187980175
Validation loss = 0.001988933188840747
Validation loss = 0.001688938937149942
Validation loss = 0.0020605241879820824
Validation loss = 0.0019183724652975798
Validation loss = 0.0023132902570068836
Validation loss = 0.00338199594989419
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0020350145641714334
Validation loss = 0.002812259132042527
Validation loss = 0.0013353147078305483
Validation loss = 0.004757516551762819
Validation loss = 0.0013461790513247252
Validation loss = 0.0015741869574412704
Validation loss = 0.0015757596120238304
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015199336921796203
Validation loss = 0.0025304448790848255
Validation loss = 0.002539366716518998
Validation loss = 0.0014799630735069513
Validation loss = 0.0024903363082557917
Validation loss = 0.0021448044572025537
Validation loss = 0.0018285062396898866
Validation loss = 0.002096151700243354
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0021156722214072943
Validation loss = 0.0026695523411035538
Validation loss = 0.0015278049977496266
Validation loss = 0.0011782754445448518
Validation loss = 0.0016510669374838471
Validation loss = 0.0019167385762557387
Validation loss = 0.0015478080604225397
Validation loss = 0.001728750765323639
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0017137238755822182
Validation loss = 0.001925781718455255
Validation loss = 0.001619833754375577
Validation loss = 0.0013065189123153687
Validation loss = 0.0020566415041685104
Validation loss = 0.0021338423248380423
Validation loss = 0.0017005440313369036
Validation loss = 0.0028205099515616894
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00443  |
| Iteration     | 30        |
| MaximumReturn | -0.000498 |
| MinimumReturn | -0.0163   |
| TotalSamples  | 53312     |
-----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017343081999570131
Validation loss = 0.002187467413023114
Validation loss = 0.0018170730909332633
Validation loss = 0.0011593378148972988
Validation loss = 0.001409712596796453
Validation loss = 0.0013740836875513196
Validation loss = 0.0018217004835605621
Validation loss = 0.0011356219183653593
Validation loss = 0.0013346741907298565
Validation loss = 0.002687196945771575
Validation loss = 0.0014206526102498174
Validation loss = 0.0024513909593224525
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001498183235526085
Validation loss = 0.0014523942954838276
Validation loss = 0.0017790497513487935
Validation loss = 0.00261225295253098
Validation loss = 0.0024212670978158712
Validation loss = 0.003164998721331358
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0019113264279440045
Validation loss = 0.002014602068811655
Validation loss = 0.0018606684170663357
Validation loss = 0.0021881754510104656
Validation loss = 0.0015069269575178623
Validation loss = 0.0037040577735751867
Validation loss = 0.0012525334022939205
Validation loss = 0.002421781187877059
Validation loss = 0.004845875781029463
Validation loss = 0.002929302165284753
Validation loss = 0.0015047893393784761
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002049150178208947
Validation loss = 0.0029674682300537825
Validation loss = 0.003034199122339487
Validation loss = 0.0020465527195483446
Validation loss = 0.0011566574685275555
Validation loss = 0.0022539165802299976
Validation loss = 0.001664419542066753
Validation loss = 0.00238580210134387
Validation loss = 0.0014549738261848688
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0020343337673693895
Validation loss = 0.0011985251912847161
Validation loss = 0.0038940259255468845
Validation loss = 0.0020582175347954035
Validation loss = 0.0016829754458740354
Validation loss = 0.0019193317275494337
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.298    |
| Iteration     | 31        |
| MaximumReturn | -0.000562 |
| MinimumReturn | -7.37     |
| TotalSamples  | 54978     |
-----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0034592109732329845
Validation loss = 0.006766156293451786
Validation loss = 0.0026619797572493553
Validation loss = 0.0035308615770190954
Validation loss = 0.0016443110071122646
Validation loss = 0.0016583924880251288
Validation loss = 0.0028503239154815674
Validation loss = 0.002982153557240963
Validation loss = 0.0020633558742702007
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002767493948340416
Validation loss = 0.0012266716221347451
Validation loss = 0.0014347878750413656
Validation loss = 0.0019303860608488321
Validation loss = 0.0018446528119966388
Validation loss = 0.001934380386956036
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0024593258276581764
Validation loss = 0.0017677940195426345
Validation loss = 0.00532416021451354
Validation loss = 0.0025761183351278305
Validation loss = 0.0018334344495087862
Validation loss = 0.0015849252231419086
Validation loss = 0.0014773294096812606
Validation loss = 0.0015381495468318462
Validation loss = 0.003611691063269973
Validation loss = 0.0023233918473124504
Validation loss = 0.0016458488535135984
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003993954509496689
Validation loss = 0.0013371972599998116
Validation loss = 0.0012798459501937032
Validation loss = 0.0025372058153152466
Validation loss = 0.004098847974091768
Validation loss = 0.0017246095230802894
Validation loss = 0.0015663921367377043
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0019074151059612632
Validation loss = 0.008076793514192104
Validation loss = 0.00474587781354785
Validation loss = 0.0010271015344187617
Validation loss = 0.002426466904580593
Validation loss = 0.00289918202906847
Validation loss = 0.002454611239954829
Validation loss = 0.001230729161761701
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -41.8    |
| Iteration     | 32       |
| MaximumReturn | -0.0567  |
| MinimumReturn | -95.8    |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005679732654243708
Validation loss = 0.0013279563281685114
Validation loss = 0.002368714427575469
Validation loss = 0.0008791241561993957
Validation loss = 0.0011718783061951399
Validation loss = 0.0010059269843623042
Validation loss = 0.0010097863851115108
Validation loss = 0.0008571395883336663
Validation loss = 0.0011492620687931776
Validation loss = 0.0008378566126339138
Validation loss = 0.001617807662114501
Validation loss = 0.003901116084307432
Validation loss = 0.002241671085357666
Validation loss = 0.0013828428927809
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004635025281459093
Validation loss = 0.0011790570570155978
Validation loss = 0.0011086465092375875
Validation loss = 0.001397545449435711
Validation loss = 0.001886614365503192
Validation loss = 0.0007679719128645957
Validation loss = 0.0012518055737018585
Validation loss = 0.0008307307725772262
Validation loss = 0.0017239172011613846
Validation loss = 0.0012337490916252136
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.005724222399294376
Validation loss = 0.0022816143464297056
Validation loss = 0.00121127103921026
Validation loss = 0.0019609411247074604
Validation loss = 0.0017986595630645752
Validation loss = 0.000983949052169919
Validation loss = 0.0014611847000196576
Validation loss = 0.0008684118511155248
Validation loss = 0.0009617306641303003
Validation loss = 0.0013242235872894526
Validation loss = 0.0012483560713008046
Validation loss = 0.0014533280627802014
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004799754358828068
Validation loss = 0.0019067779649049044
Validation loss = 0.004409187939018011
Validation loss = 0.0010992755414918065
Validation loss = 0.001981213456019759
Validation loss = 0.001876467140391469
Validation loss = 0.0011897620279341936
Validation loss = 0.0015963336918503046
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005825801286846399
Validation loss = 0.0028296939563006163
Validation loss = 0.0021314178593456745
Validation loss = 0.003923335578292608
Validation loss = 0.0009272248717024922
Validation loss = 0.0012969777453690767
Validation loss = 0.0011784693924710155
Validation loss = 0.0010339174186810851
Validation loss = 0.0012025550240650773
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -4.7      |
| Iteration     | 33        |
| MaximumReturn | -0.000566 |
| MinimumReturn | -60.4     |
| TotalSamples  | 58310     |
-----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006159095792099833
Validation loss = 0.0012348382733762264
Validation loss = 0.001933582010678947
Validation loss = 0.000927161832805723
Validation loss = 0.0008845179108902812
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017138823168352246
Validation loss = 0.00264929817058146
Validation loss = 0.0012284321710467339
Validation loss = 0.000655663083307445
Validation loss = 0.0008536194218322635
Validation loss = 0.0011532221687957644
Validation loss = 0.0012559029273688793
Validation loss = 0.0009506272035650909
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0027946578338742256
Validation loss = 0.001413620077073574
Validation loss = 0.002785904100164771
Validation loss = 0.0012793490896001458
Validation loss = 0.0012682934757322073
Validation loss = 0.0016324975294992328
Validation loss = 0.0007329675718210638
Validation loss = 0.0021857465617358685
Validation loss = 0.0034803152084350586
Validation loss = 0.0011346233077347279
Validation loss = 0.0014360847417265177
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008681381586939096
Validation loss = 0.0010684150038287044
Validation loss = 0.0007500625215470791
Validation loss = 0.002684394596144557
Validation loss = 0.0013208737364038825
Validation loss = 0.0009451063233427703
Validation loss = 0.000977716059423983
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003038224996998906
Validation loss = 0.0013348330976441503
Validation loss = 0.001147621194832027
Validation loss = 0.0011810066644102335
Validation loss = 0.0009508039802312851
Validation loss = 0.000612332543823868
Validation loss = 0.004296730738133192
Validation loss = 0.0007925571990199387
Validation loss = 0.001117914798669517
Validation loss = 0.0011675239074975252
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2        |
| Iteration     | 34        |
| MaximumReturn | -0.000527 |
| MinimumReturn | -44.8     |
| TotalSamples  | 59976     |
-----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009678811184130609
Validation loss = 0.0014705214416608214
Validation loss = 0.0008311050478368998
Validation loss = 0.0023805995006114244
Validation loss = 0.0011969145853072405
Validation loss = 0.009103771299123764
Validation loss = 0.0006856151740066707
Validation loss = 0.000696260598488152
Validation loss = 0.0007936841575428843
Validation loss = 0.0008218537550419569
Validation loss = 0.0006236741901375353
Validation loss = 0.001312417909502983
Validation loss = 0.0019564975518733263
Validation loss = 0.0008870800957083702
Validation loss = 0.0009710422018542886
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000798561901319772
Validation loss = 0.0009146520751528442
Validation loss = 0.0008356961770914495
Validation loss = 0.0010577215580269694
Validation loss = 0.0010177692165598273
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022714626975357533
Validation loss = 0.0015136782312765718
Validation loss = 0.0008162107551470399
Validation loss = 0.0010044616647064686
Validation loss = 0.0007468935218639672
Validation loss = 0.004160949029028416
Validation loss = 0.0030492055229842663
Validation loss = 0.0011616147821769118
Validation loss = 0.0006160799530334771
Validation loss = 0.0022524273954331875
Validation loss = 0.0006667889538221061
Validation loss = 0.000851432210765779
Validation loss = 0.0015070454683154821
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010382951004430652
Validation loss = 0.0012610794510692358
Validation loss = 0.0021909710485488176
Validation loss = 0.002251733560115099
Validation loss = 0.001347303157672286
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009182296344079077
Validation loss = 0.0007088413694873452
Validation loss = 0.002544824266806245
Validation loss = 0.0008594085811637342
Validation loss = 0.0013711868086829782
Validation loss = 0.0007529578288085759
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0147   |
| Iteration     | 35        |
| MaximumReturn | -0.000578 |
| MinimumReturn | -0.0805   |
| TotalSamples  | 61642     |
-----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012353359488770366
Validation loss = 0.001415490871295333
Validation loss = 0.0013675411464646459
Validation loss = 0.0010989762376993895
Validation loss = 0.0007434524013660848
Validation loss = 0.0008756773313507438
Validation loss = 0.0006736134528182447
Validation loss = 0.0017061561811715364
Validation loss = 0.0033573757391422987
Validation loss = 0.0006012932281009853
Validation loss = 0.0011179239954799414
Validation loss = 0.0011651871027424932
Validation loss = 0.0020988413598388433
Validation loss = 0.0009418701520189643
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006869642995297909
Validation loss = 0.0009710804442875087
Validation loss = 0.0020990828052163124
Validation loss = 0.0023019558284431696
Validation loss = 0.0014770623529329896
Validation loss = 0.0008325903327204287
Validation loss = 0.0014208171050995588
Validation loss = 0.0008331651333719492
Validation loss = 0.0013317229459062219
Validation loss = 0.0009489238145761192
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013926369138062
Validation loss = 0.001397213782183826
Validation loss = 0.0012233451707288623
Validation loss = 0.0008886244613677263
Validation loss = 0.0009928317740559578
Validation loss = 0.0009755626088008285
Validation loss = 0.001346117933280766
Validation loss = 0.001558778341859579
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015175041044130921
Validation loss = 0.001014759298413992
Validation loss = 0.0011612247908487916
Validation loss = 0.0023575113154947758
Validation loss = 0.0015083210309967399
Validation loss = 0.0010103737004101276
Validation loss = 0.0026273815892636776
Validation loss = 0.0012069393415004015
Validation loss = 0.0016344026662409306
Validation loss = 0.0010903815273195505
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007992785540409386
Validation loss = 0.0015566961374133825
Validation loss = 0.0009347734740003943
Validation loss = 0.0010417564772069454
Validation loss = 0.00231449818238616
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.17    |
| Iteration     | 36       |
| MaximumReturn | -0.00064 |
| MinimumReturn | -29      |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013345569605007768
Validation loss = 0.0010364832123741508
Validation loss = 0.0007730310317128897
Validation loss = 0.0012328957673162222
Validation loss = 0.0025882727932184935
Validation loss = 0.001003122073598206
Validation loss = 0.0007441503694280982
Validation loss = 0.0019076865864917636
Validation loss = 0.0005702756461687386
Validation loss = 0.0007108385325409472
Validation loss = 0.0016560240183025599
Validation loss = 0.0027622997295111418
Validation loss = 0.0006701778620481491
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013193351915106177
Validation loss = 0.0011279347818344831
Validation loss = 0.0006615797174163163
Validation loss = 0.0014818438794463873
Validation loss = 0.0010009523248299956
Validation loss = 0.001646604621782899
Validation loss = 0.0013170952443033457
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0015967339277267456
Validation loss = 0.001567624625749886
Validation loss = 0.0008982448489405215
Validation loss = 0.0014306659577414393
Validation loss = 0.0011394540779292583
Validation loss = 0.0008291451958939433
Validation loss = 0.0015504969051107764
Validation loss = 0.0032099320087581873
Validation loss = 0.0012105479836463928
Validation loss = 0.0013752705417573452
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011614670511335135
Validation loss = 0.0012096688151359558
Validation loss = 0.0011733764549717307
Validation loss = 0.0006384729058481753
Validation loss = 0.0007313358364626765
Validation loss = 0.0012749886373057961
Validation loss = 0.0008478931267745793
Validation loss = 0.0023366385139524937
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0023095884826034307
Validation loss = 0.001728762872517109
Validation loss = 0.0008421126985922456
Validation loss = 0.0020144041627645493
Validation loss = 0.0009411690989509225
Validation loss = 0.0007802979089319706
Validation loss = 0.0021567088551819324
Validation loss = 0.0005883204867132008
Validation loss = 0.0011850327719002962
Validation loss = 0.0009552460396662354
Validation loss = 0.001054386142641306
Validation loss = 0.0008679605671204627
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -4.79     |
| Iteration     | 37        |
| MaximumReturn | -0.000803 |
| MinimumReturn | -42.1     |
| TotalSamples  | 64974     |
-----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006710551679134369
Validation loss = 0.0006947735673747957
Validation loss = 0.0007870300323702395
Validation loss = 0.0005333240842446685
Validation loss = 0.0022801540326327085
Validation loss = 0.0008146569016389549
Validation loss = 0.0007781858439557254
Validation loss = 0.004185243509709835
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014716049190610647
Validation loss = 0.002465930301696062
Validation loss = 0.0011651857057586312
Validation loss = 0.0023531694896519184
Validation loss = 0.001084274728782475
Validation loss = 0.0007437318563461304
Validation loss = 0.0052144150249660015
Validation loss = 0.000746528385207057
Validation loss = 0.0005096596432849765
Validation loss = 0.0006041821325197816
Validation loss = 0.0008139077690429986
Validation loss = 0.003877810901030898
Validation loss = 0.0012651362922042608
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003034633584320545
Validation loss = 0.0006046044290997088
Validation loss = 0.0009729726007208228
Validation loss = 0.0015923508908599615
Validation loss = 0.0012674229219555855
Validation loss = 0.0019920922350138426
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0013233090285211802
Validation loss = 0.0007775825797580183
Validation loss = 0.0006628393894061446
Validation loss = 0.0012539145536720753
Validation loss = 0.0015951709356158972
Validation loss = 0.0012461397564038634
Validation loss = 0.0016045207157731056
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001109811943024397
Validation loss = 0.0008764397352933884
Validation loss = 0.002783041214570403
Validation loss = 0.0012924124021083117
Validation loss = 0.001219776924699545
Validation loss = 0.0009116142755374312
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -25.4     |
| Iteration     | 38        |
| MaximumReturn | -0.000822 |
| MinimumReturn | -103      |
| TotalSamples  | 66640     |
-----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017329539405182004
Validation loss = 0.0019225319847464561
Validation loss = 0.0008622657624073327
Validation loss = 0.0011551303323358297
Validation loss = 0.0009069845546036959
Validation loss = 0.0017863407265394926
Validation loss = 0.0008056187652982771
Validation loss = 0.0008217930444516242
Validation loss = 0.0011601177975535393
Validation loss = 0.001021415344439447
Validation loss = 0.0013658811803907156
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004064369015395641
Validation loss = 0.0015165430959314108
Validation loss = 0.0008012878824956715
Validation loss = 0.0010731022339314222
Validation loss = 0.0008052075281739235
Validation loss = 0.00150197371840477
Validation loss = 0.000621130398940295
Validation loss = 0.0007286827312782407
Validation loss = 0.0015311520546674728
Validation loss = 0.0006787038873881102
Validation loss = 0.0020259064622223377
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0035769366659224033
Validation loss = 0.0017889213049784303
Validation loss = 0.0012841885909438133
Validation loss = 0.0018159284954890609
Validation loss = 0.0013172064209356904
Validation loss = 0.0015489031793549657
Validation loss = 0.0022777197882533073
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009208059636875987
Validation loss = 0.0006513883708976209
Validation loss = 0.0018532996764406562
Validation loss = 0.0008896566578187048
Validation loss = 0.001996984239667654
Validation loss = 0.0010984534164890647
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003143415553495288
Validation loss = 0.0008622889872640371
Validation loss = 0.0007471096469089389
Validation loss = 0.0008698539459146559
Validation loss = 0.0011334220180287957
Validation loss = 0.0015614187577739358
Validation loss = 0.0019262979039922357
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.68     |
| Iteration     | 39        |
| MaximumReturn | -0.000634 |
| MinimumReturn | -91.2     |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006607342511415482
Validation loss = 0.0010055680759251118
Validation loss = 0.001134546590037644
Validation loss = 0.001097679603844881
Validation loss = 0.0006486997008323669
Validation loss = 0.0005869238520972431
Validation loss = 0.0008127018227241933
Validation loss = 0.0016261201817542315
Validation loss = 0.0011801774380728602
Validation loss = 0.0014726584777235985
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001232239417731762
Validation loss = 0.0006547309458255768
Validation loss = 0.000946204352658242
Validation loss = 0.0011322092032060027
Validation loss = 0.0015140138566493988
Validation loss = 0.0005714209983125329
Validation loss = 0.0008270943653769791
Validation loss = 0.0013954875757917762
Validation loss = 0.0015339384553954005
Validation loss = 0.0008164852624759078
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016079144552350044
Validation loss = 0.0007305119070224464
Validation loss = 0.0008502314449287951
Validation loss = 0.001048763282597065
Validation loss = 0.0011465303832665086
Validation loss = 0.0008787253755144775
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008459101663902402
Validation loss = 0.0008723651408217847
Validation loss = 0.0016127391718328
Validation loss = 0.0013400816824287176
Validation loss = 0.0016725542955100536
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015044126193970442
Validation loss = 0.000661092228256166
Validation loss = 0.0008988560293801129
Validation loss = 0.002032841555774212
Validation loss = 0.002577481558546424
Validation loss = 0.0009079419542104006
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.38     |
| Iteration     | 40        |
| MaximumReturn | -0.000538 |
| MinimumReturn | -82.6     |
| TotalSamples  | 69972     |
-----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021366390865296125
Validation loss = 0.0009005216415971518
Validation loss = 0.001798319397494197
Validation loss = 0.0006897005368955433
Validation loss = 0.0013712022919207811
Validation loss = 0.0010455107549205422
Validation loss = 0.0016496002208441496
Validation loss = 0.001196283265016973
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009199908818118274
Validation loss = 0.0007421794580295682
Validation loss = 0.0008818447240628302
Validation loss = 0.0013997697969898582
Validation loss = 0.0012748558074235916
Validation loss = 0.003249767003580928
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000989031745120883
Validation loss = 0.0009634621674194932
Validation loss = 0.004602157510817051
Validation loss = 0.0005319183110259473
Validation loss = 0.002121506491675973
Validation loss = 0.001122953835874796
Validation loss = 0.0010040821507573128
Validation loss = 0.0010507672559469938
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0014497308293357491
Validation loss = 0.0011474243365228176
Validation loss = 0.0008925723377615213
Validation loss = 0.000664428633172065
Validation loss = 0.0007198550156317651
Validation loss = 0.0007770878146402538
Validation loss = 0.0017573487712070346
Validation loss = 0.000655311974696815
Validation loss = 0.0008607131312601268
Validation loss = 0.0011443204712122679
Validation loss = 0.0011032131733372808
Validation loss = 0.0012559904716908932
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015795674407854676
Validation loss = 0.0011530298506841063
Validation loss = 0.001110760378651321
Validation loss = 0.0009272591560147703
Validation loss = 0.0006641748477704823
Validation loss = 0.0014252288965508342
Validation loss = 0.0014357974287122488
Validation loss = 0.0010374453850090504
Validation loss = 0.001275476417504251
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -4.68     |
| Iteration     | 41        |
| MaximumReturn | -0.000573 |
| MinimumReturn | -87.7     |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0014394255122169852
Validation loss = 0.0009631310240365565
Validation loss = 0.00134961714502424
Validation loss = 0.0007248898036777973
Validation loss = 0.0016677042003720999
Validation loss = 0.0011725580552592874
Validation loss = 0.0008328508702106774
Validation loss = 0.0007418726454488933
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013373529072850943
Validation loss = 0.01260402426123619
Validation loss = 0.0009646670077927411
Validation loss = 0.0010475957533344626
Validation loss = 0.0010981005616486073
Validation loss = 0.0015469810459762812
Validation loss = 0.0006696402560919523
Validation loss = 0.0005987393669784069
Validation loss = 0.0011034958297386765
Validation loss = 0.0006817778339609504
Validation loss = 0.0008790812571533024
Validation loss = 0.003981107845902443
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0030428413301706314
Validation loss = 0.001421885215677321
Validation loss = 0.0012708688154816628
Validation loss = 0.0009966138750314713
Validation loss = 0.0011898778611794114
Validation loss = 0.000671768153551966
Validation loss = 0.0009390741470269859
Validation loss = 0.0008542885188944638
Validation loss = 0.0011396761983633041
Validation loss = 0.0010760331060737371
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010210175532847643
Validation loss = 0.001132603851146996
Validation loss = 0.0010192531626671553
Validation loss = 0.0011087983148172498
Validation loss = 0.0015789037570357323
Validation loss = 0.0009834031807258725
Validation loss = 0.0009976894361898303
Validation loss = 0.0007534163887612522
Validation loss = 0.0007291501387953758
Validation loss = 0.0015195069136098027
Validation loss = 0.0013676689704880118
Validation loss = 0.0014769058907404542
Validation loss = 0.002608822425827384
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010480599012225866
Validation loss = 0.0010850367834791541
Validation loss = 0.0025242038536816835
Validation loss = 0.000899375241715461
Validation loss = 0.0008464805432595313
Validation loss = 0.0009221059735864401
Validation loss = 0.0034955141600221395
Validation loss = 0.0010747170308604836
Validation loss = 0.0008563282899558544
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -15       |
| Iteration     | 42        |
| MaximumReturn | -0.000745 |
| MinimumReturn | -100      |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013717053225263953
Validation loss = 0.0009045781916938722
Validation loss = 0.0011372395092621446
Validation loss = 0.00173145008739084
Validation loss = 0.0010463036596775055
Validation loss = 0.0012621528003364801
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0013841280015185475
Validation loss = 0.0010277420515194535
Validation loss = 0.0010012664133682847
Validation loss = 0.000766809331253171
Validation loss = 0.0008334991871379316
Validation loss = 0.0008680617902427912
Validation loss = 0.0008066652808338404
Validation loss = 0.000732238928321749
Validation loss = 0.0013706076424568892
Validation loss = 0.0015732691390439868
Validation loss = 0.0007809560047462583
Validation loss = 0.0011917942902073264
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008542391587980092
Validation loss = 0.002514679916203022
Validation loss = 0.0008901812834665179
Validation loss = 0.0007585947751067579
Validation loss = 0.001257842406630516
Validation loss = 0.0011585620231926441
Validation loss = 0.001335473032668233
Validation loss = 0.000623519707005471
Validation loss = 0.0018539311131462455
Validation loss = 0.0017251177923753858
Validation loss = 0.0020543367136269808
Validation loss = 0.0013231753837317228
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001643274910748005
Validation loss = 0.0016279793344438076
Validation loss = 0.0007560967933386564
Validation loss = 0.0026582900900393724
Validation loss = 0.0013702866854146123
Validation loss = 0.0008118085679598153
Validation loss = 0.0021245491225272417
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015093835536390543
Validation loss = 0.0013059935299679637
Validation loss = 0.0013629418099299073
Validation loss = 0.0009042487945407629
Validation loss = 0.0009561209008097649
Validation loss = 0.0009544455679133534
Validation loss = 0.0009601977071724832
Validation loss = 0.0011046476429328322
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -57.9    |
| Iteration     | 43       |
| MaximumReturn | -0.0946  |
| MinimumReturn | -101     |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0068903155624866486
Validation loss = 0.0010101055959239602
Validation loss = 0.0008786468533799052
Validation loss = 0.0008566665346734226
Validation loss = 0.0008985365857370198
Validation loss = 0.0006561345071531832
Validation loss = 0.001168970251455903
Validation loss = 0.0005775182507932186
Validation loss = 0.0004956972552463412
Validation loss = 0.000999883166514337
Validation loss = 0.0009347092709504068
Validation loss = 0.0007119728252291679
Validation loss = 0.0006629227427765727
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004508510231971741
Validation loss = 0.0009398511610925198
Validation loss = 0.0018139103194698691
Validation loss = 0.0009320848621428013
Validation loss = 0.0011906117433682084
Validation loss = 0.0007710233330726624
Validation loss = 0.0006110816611908376
Validation loss = 0.0010401317849755287
Validation loss = 0.0009094116394408047
Validation loss = 0.0009064425830729306
Validation loss = 0.0005805606488138437
Validation loss = 0.003837833646684885
Validation loss = 0.0009338305098935962
Validation loss = 0.0008898312808014452
Validation loss = 0.0009486922062933445
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0026605615857988596
Validation loss = 0.0011501158587634563
Validation loss = 0.0011614388786256313
Validation loss = 0.0011669534724205732
Validation loss = 0.0014900289243087173
Validation loss = 0.0009587529930286109
Validation loss = 0.0009603204089216888
Validation loss = 0.0008272301056422293
Validation loss = 0.0011331916321069002
Validation loss = 0.0012774118222296238
Validation loss = 0.0012202400248497725
Validation loss = 0.0010117330821231008
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004179322626441717
Validation loss = 0.0013929903507232666
Validation loss = 0.002131120767444372
Validation loss = 0.0008091435302048922
Validation loss = 0.000892054580617696
Validation loss = 0.0008286930387839675
Validation loss = 0.0008751244749873877
Validation loss = 0.0008193554822355509
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004987530875951052
Validation loss = 0.0014203805476427078
Validation loss = 0.001384919392876327
Validation loss = 0.000881653802935034
Validation loss = 0.0016188631998375058
Validation loss = 0.0007931952713988721
Validation loss = 0.001584632322192192
Validation loss = 0.002311436226591468
Validation loss = 0.0012872501974925399
Validation loss = 0.0012834814842790365
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -24.2     |
| Iteration     | 44        |
| MaximumReturn | -0.000956 |
| MinimumReturn | -76.3     |
| TotalSamples  | 76636     |
-----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002784454496577382
Validation loss = 0.0011324674123898149
Validation loss = 0.001163568696938455
Validation loss = 0.0019809736404567957
Validation loss = 0.0009554551215842366
Validation loss = 0.0009278820944018662
Validation loss = 0.0008497590897604823
Validation loss = 0.0005745993694290519
Validation loss = 0.001187270157970488
Validation loss = 0.0006935626151971519
Validation loss = 0.0008603185415267944
Validation loss = 0.0005291158449836075
Validation loss = 0.0010052380384877324
Validation loss = 0.0005693552084267139
Validation loss = 0.0006213280721567571
Validation loss = 0.0008217875729314983
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003822687081992626
Validation loss = 0.0013963558012619615
Validation loss = 0.0018684216775000095
Validation loss = 0.000786690681707114
Validation loss = 0.0008354976307600737
Validation loss = 0.0009900399018079042
Validation loss = 0.001078536151908338
Validation loss = 0.0007220397819764912
Validation loss = 0.001214972697198391
Validation loss = 0.0006953775882720947
Validation loss = 0.001166288391686976
Validation loss = 0.0009418681729584932
Validation loss = 0.0008458969532512128
Validation loss = 0.0012693408643826842
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003050748025998473
Validation loss = 0.0008764029480516911
Validation loss = 0.0009044785983860493
Validation loss = 0.001353452098555863
Validation loss = 0.0008441381505690515
Validation loss = 0.000883055676240474
Validation loss = 0.0006978589808568358
Validation loss = 0.0010778172872960567
Validation loss = 0.000938256096560508
Validation loss = 0.0009072160464711487
Validation loss = 0.0012592923594638705
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002613218268379569
Validation loss = 0.0008558875997550786
Validation loss = 0.0008069253526628017
Validation loss = 0.001597579219378531
Validation loss = 0.0006909784860908985
Validation loss = 0.0012690236326307058
Validation loss = 0.000628382433205843
Validation loss = 0.0008766782702878118
Validation loss = 0.0015475646359845996
Validation loss = 0.0009237053454853594
Validation loss = 0.0008948014001362026
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002095245523378253
Validation loss = 0.0009237541817128658
Validation loss = 0.0018935613334178925
Validation loss = 0.000802170077804476
Validation loss = 0.0008099812548607588
Validation loss = 0.0006265894044190645
Validation loss = 0.0008021644316613674
Validation loss = 0.001025915495119989
Validation loss = 0.0009685968398116529
Validation loss = 0.001189886825159192
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -10.9     |
| Iteration     | 45        |
| MaximumReturn | -0.000803 |
| MinimumReturn | -55.1     |
| TotalSamples  | 78302     |
-----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008734920993447304
Validation loss = 0.0010994496988132596
Validation loss = 0.0005823734099976718
Validation loss = 0.001172813237644732
Validation loss = 0.0009690742590464652
Validation loss = 0.0005486946902237833
Validation loss = 0.0005784754757769406
Validation loss = 0.0007247737376019359
Validation loss = 0.0008688345551490784
Validation loss = 0.0008093154174275696
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010107753332704306
Validation loss = 0.0008026062278077006
Validation loss = 0.000893614545930177
Validation loss = 0.0008102393476292491
Validation loss = 0.0008660011226311326
Validation loss = 0.0008203360484912992
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008768081315793097
Validation loss = 0.0012793873902410269
Validation loss = 0.0016937508480623364
Validation loss = 0.0008726197411306202
Validation loss = 0.0005901196855120361
Validation loss = 0.0008471945184282959
Validation loss = 0.0009980278555303812
Validation loss = 0.0009568512323312461
Validation loss = 0.004492028616368771
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007346540223807096
Validation loss = 0.0007541092927567661
Validation loss = 0.0009162239730358124
Validation loss = 0.0006322739645838737
Validation loss = 0.0007378735463134944
Validation loss = 0.000822538451757282
Validation loss = 0.0010275592794641852
Validation loss = 0.0013423945056274533
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008365377434529364
Validation loss = 0.0009852080838754773
Validation loss = 0.0022873813286423683
Validation loss = 0.0012107545044273138
Validation loss = 0.0007974359905347228
Validation loss = 0.0014358964981511235
Validation loss = 0.0013141881208866835
Validation loss = 0.001011822372674942
Validation loss = 0.0008068613824434578
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.504    |
| Iteration     | 46        |
| MaximumReturn | -0.000538 |
| MinimumReturn | -12.5     |
| TotalSamples  | 79968     |
-----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013102056691423059
Validation loss = 0.0013416880974546075
Validation loss = 0.000526994583196938
Validation loss = 0.000800520007032901
Validation loss = 0.0011383008677512407
Validation loss = 0.0011407982092350721
Validation loss = 0.0006093000993132591
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017059193924069405
Validation loss = 0.0006243022507987916
Validation loss = 0.0006258807261474431
Validation loss = 0.0008153224480338395
Validation loss = 0.0013063906226307154
Validation loss = 0.0006223408272489905
Validation loss = 0.0007194061763584614
Validation loss = 0.0007287681801244617
Validation loss = 0.0017437757924199104
Validation loss = 0.0006591840647161007
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008004771661944687
Validation loss = 0.0010697992984205484
Validation loss = 0.001318588969297707
Validation loss = 0.0018366968724876642
Validation loss = 0.0006215990870259702
Validation loss = 0.0014740682672709227
Validation loss = 0.0006951362011022866
Validation loss = 0.0011026685824617743
Validation loss = 0.0007495783502236009
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010191265027970076
Validation loss = 0.001049750135280192
Validation loss = 0.0010187783045694232
Validation loss = 0.0008134270901791751
Validation loss = 0.0010295037645846605
Validation loss = 0.0009355574729852378
Validation loss = 0.0007314219838008285
Validation loss = 0.0007206353475339711
Validation loss = 0.0009248430724255741
Validation loss = 0.0007845631917007267
Validation loss = 0.0009911172091960907
Validation loss = 0.0008589419303461909
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007251561619341373
Validation loss = 0.0008494288777001202
Validation loss = 0.0011833326425403357
Validation loss = 0.0012706697452813387
Validation loss = 0.0009874885436147451
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.09     |
| Iteration     | 47        |
| MaximumReturn | -0.000655 |
| MinimumReturn | -43.2     |
| TotalSamples  | 81634     |
-----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008402559906244278
Validation loss = 0.000526677118614316
Validation loss = 0.0005998355336487293
Validation loss = 0.0006809149635955691
Validation loss = 0.0011288202367722988
Validation loss = 0.0005387685960158706
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009780677501112223
Validation loss = 0.0007036854512989521
Validation loss = 0.0027778842486441135
Validation loss = 0.0007303690654225647
Validation loss = 0.0009473810787312686
Validation loss = 0.001492519280873239
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006345176370814443
Validation loss = 0.0005549892666749656
Validation loss = 0.0007181176915764809
Validation loss = 0.0006467296625487506
Validation loss = 0.0011329611297696829
Validation loss = 0.001237890450283885
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009857381228357553
Validation loss = 0.0007079295464791358
Validation loss = 0.0007253675139509141
Validation loss = 0.0009407534962520003
Validation loss = 0.0010115902405232191
Validation loss = 0.0008041366818360984
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010452000424265862
Validation loss = 0.0009744515409693122
Validation loss = 0.0014739376492798328
Validation loss = 0.0009354948997497559
Validation loss = 0.000841896515339613
Validation loss = 0.0014546802267432213
Validation loss = 0.0013066389365121722
Validation loss = 0.0007244894513860345
Validation loss = 0.0016237434465438128
Validation loss = 0.0007374515407718718
Validation loss = 0.0007101214723661542
Validation loss = 0.0005771695869043469
Validation loss = 0.0022425937931984663
Validation loss = 0.0010365095222368836
Validation loss = 0.0009109621751122177
Validation loss = 0.0015218723565340042
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.88     |
| Iteration     | 48        |
| MaximumReturn | -0.000568 |
| MinimumReturn | -46.7     |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003186994232237339
Validation loss = 0.0012033495586365461
Validation loss = 0.0010137432254850864
Validation loss = 0.0004563676775433123
Validation loss = 0.0007096437038853765
Validation loss = 0.000761552422773093
Validation loss = 0.0006319030071608722
Validation loss = 0.002955315401777625
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010445797815918922
Validation loss = 0.0008622989407740533
Validation loss = 0.0008834417094476521
Validation loss = 0.0009096479043364525
Validation loss = 0.000890097813680768
Validation loss = 0.0008682469488121569
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006432324880734086
Validation loss = 0.0007209965260699391
Validation loss = 0.0011393543099984527
Validation loss = 0.0014015979832038283
Validation loss = 0.001696186256594956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007771218079142272
Validation loss = 0.0009854297386482358
Validation loss = 0.0010499297641217709
Validation loss = 0.001305800979025662
Validation loss = 0.0007372227846644819
Validation loss = 0.0013850732939317822
Validation loss = 0.000778636836912483
Validation loss = 0.0006912049720995128
Validation loss = 0.001218310440890491
Validation loss = 0.001740829087793827
Validation loss = 0.0006939625018276274
Validation loss = 0.0011966051533818245
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011134055675938725
Validation loss = 0.003576674498617649
Validation loss = 0.0010113157331943512
Validation loss = 0.0018870763015002012
Validation loss = 0.0012687538983300328
Validation loss = 0.0007108693243935704
Validation loss = 0.0009278372745029628
Validation loss = 0.001306062564253807
Validation loss = 0.0010649936739355326
Validation loss = 0.0016192418988794088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -4.23     |
| Iteration     | 49        |
| MaximumReturn | -0.000576 |
| MinimumReturn | -87.2     |
| TotalSamples  | 84966     |
-----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015419501578435302
Validation loss = 0.0006966976798139513
Validation loss = 0.0014919398818165064
Validation loss = 0.0009088384103961289
Validation loss = 0.0005645495257340372
Validation loss = 0.0009674386237747967
Validation loss = 0.0007708985358476639
Validation loss = 0.000814806146081537
Validation loss = 0.002011550823226571
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007256152457557619
Validation loss = 0.0008561089634895325
Validation loss = 0.0005386973498389125
Validation loss = 0.0007736415136605501
Validation loss = 0.0009576579905115068
Validation loss = 0.0012884554453194141
Validation loss = 0.000690220040269196
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002016911981627345
Validation loss = 0.0017225270858034492
Validation loss = 0.0008021456305868924
Validation loss = 0.0011709054233506322
Validation loss = 0.0014280756004154682
Validation loss = 0.002771336119621992
Validation loss = 0.0007601065444760025
Validation loss = 0.0005284007056616247
Validation loss = 0.0007201003609225154
Validation loss = 0.0011287187226116657
Validation loss = 0.0010235413210466504
Validation loss = 0.0009136841399595141
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001213448471389711
Validation loss = 0.0007690149359405041
Validation loss = 0.0010354568948969245
Validation loss = 0.0014473007759079337
Validation loss = 0.0010441126069054008
Validation loss = 0.0005306547973304987
Validation loss = 0.0007447480456903577
Validation loss = 0.0010619012173265219
Validation loss = 0.0006590401171706617
Validation loss = 0.0010005732765421271
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0011757255997508764
Validation loss = 0.0016961114015430212
Validation loss = 0.0010086910333484411
Validation loss = 0.003035137429833412
Validation loss = 0.0009368832688778639
Validation loss = 0.001149105024524033
Validation loss = 0.0007325117476284504
Validation loss = 0.0007520934450440109
Validation loss = 0.0007959915674291551
Validation loss = 0.001162945874966681
Validation loss = 0.0010043730726465583
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.54    |
| Iteration     | 50       |
| MaximumReturn | -0.00065 |
| MinimumReturn | -97      |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0015006782487034798
Validation loss = 0.0006505794590339065
Validation loss = 0.001608095015399158
Validation loss = 0.0018503721803426743
Validation loss = 0.0005964056472294033
Validation loss = 0.0005822714883834124
Validation loss = 0.000825889001134783
Validation loss = 0.0014059507520869374
Validation loss = 0.0005651923129335046
Validation loss = 0.0011178032727912068
Validation loss = 0.0006883537280373275
Validation loss = 0.0008801969233900309
Validation loss = 0.0007836096337996423
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007259665289893746
Validation loss = 0.0006125564686954021
Validation loss = 0.0006287764408625662
Validation loss = 0.0006010510842315853
Validation loss = 0.0007775789708830416
Validation loss = 0.000620996521320194
Validation loss = 0.0014120697742328048
Validation loss = 0.0009954468114301562
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006437718402594328
Validation loss = 0.0021946735214442015
Validation loss = 0.0008135024108923972
Validation loss = 0.0013147036079317331
Validation loss = 0.0006829177727922797
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007226385059766471
Validation loss = 0.000768434489145875
Validation loss = 0.0008776563918218017
Validation loss = 0.0005310161504894495
Validation loss = 0.0006169531843625009
Validation loss = 0.0024676048196852207
Validation loss = 0.000778543297201395
Validation loss = 0.0010771978413686156
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009307131986133754
Validation loss = 0.0011523111024871469
Validation loss = 0.0012329184683039784
Validation loss = 0.0021144323982298374
Validation loss = 0.0006520181777887046
Validation loss = 0.0007797739817760885
Validation loss = 0.0007071351865306497
Validation loss = 0.0010682527208700776
Validation loss = 0.000896267534699291
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.24    |
| Iteration     | 51       |
| MaximumReturn | -0.00053 |
| MinimumReturn | -55.8    |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008482238044962287
Validation loss = 0.000806452299002558
Validation loss = 0.0010737949050962925
Validation loss = 0.0014453823678195477
Validation loss = 0.0006058302242308855
Validation loss = 0.0009029111242853105
Validation loss = 0.0005741761415265501
Validation loss = 0.0007989087607711554
Validation loss = 0.001152103883214295
Validation loss = 0.0009944797493517399
Validation loss = 0.000854914600495249
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001099735964089632
Validation loss = 0.0007657516398467124
Validation loss = 0.00136307324282825
Validation loss = 0.0010145739652216434
Validation loss = 0.0006161517230793834
Validation loss = 0.0005745272501371801
Validation loss = 0.0010077746119350195
Validation loss = 0.000870831310749054
Validation loss = 0.0010205890284851193
Validation loss = 0.0007866050582379103
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000801372283603996
Validation loss = 0.0009694444597698748
Validation loss = 0.000758355890866369
Validation loss = 0.001590317813679576
Validation loss = 0.0007914361194707453
Validation loss = 0.0010685649467632174
Validation loss = 0.0009926572674885392
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005682564224116504
Validation loss = 0.001171932090073824
Validation loss = 0.0009423411102034152
Validation loss = 0.0008403686806559563
Validation loss = 0.0008234548731707036
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008386012632399797
Validation loss = 0.0009737398358993232
Validation loss = 0.0009746818104758859
Validation loss = 0.0016341524897143245
Validation loss = 0.000543489761184901
Validation loss = 0.001321419607847929
Validation loss = 0.0016132317250594497
Validation loss = 0.0011895138304680586
Validation loss = 0.00098216044716537
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -8.1      |
| Iteration     | 52        |
| MaximumReturn | -0.000555 |
| MinimumReturn | -96.7     |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007161237299442291
Validation loss = 0.0009368038736283779
Validation loss = 0.0009300563251599669
Validation loss = 0.0006134204450063407
Validation loss = 0.0016717624384909868
Validation loss = 0.0009770834585651755
Validation loss = 0.0011610997607931495
Validation loss = 0.0006411218200810254
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010441066697239876
Validation loss = 0.0007074325694702566
Validation loss = 0.001750718685798347
Validation loss = 0.0005172226228751242
Validation loss = 0.0007995205814950168
Validation loss = 0.0009219018975272775
Validation loss = 0.0012051856610924006
Validation loss = 0.0005866658175364137
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006622575456276536
Validation loss = 0.0006247159326449037
Validation loss = 0.002383682643994689
Validation loss = 0.0008875484927557409
Validation loss = 0.0007603639387525618
Validation loss = 0.0010292204096913338
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010310234501957893
Validation loss = 0.0007267604232765734
Validation loss = 0.005185091868042946
Validation loss = 0.0016092172591015697
Validation loss = 0.0007067801197990775
Validation loss = 0.0005868321168236434
Validation loss = 0.002768077189102769
Validation loss = 0.0011954494984820485
Validation loss = 0.0006394486990757287
Validation loss = 0.0013755252584815025
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007048939587548375
Validation loss = 0.0010126902488991618
Validation loss = 0.001567975152283907
Validation loss = 0.0005344382952898741
Validation loss = 0.0007896550232544541
Validation loss = 0.0015052470844238997
Validation loss = 0.0009090198436751962
Validation loss = 0.0009427511831745505
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -10.6     |
| Iteration     | 53        |
| MaximumReturn | -0.000577 |
| MinimumReturn | -110      |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008081380510702729
Validation loss = 0.0005601809243671596
Validation loss = 0.0008911992190405726
Validation loss = 0.0007024187943898141
Validation loss = 0.0006593260332010686
Validation loss = 0.0007643833523616195
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007463969523087144
Validation loss = 0.0006498234579339623
Validation loss = 0.000605174747761339
Validation loss = 0.0007589696324430406
Validation loss = 0.0008936061058193445
Validation loss = 0.0010982922976836562
Validation loss = 0.000692671281285584
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007381049217656255
Validation loss = 0.0005354172899387777
Validation loss = 0.0005597626441158354
Validation loss = 0.001158050145022571
Validation loss = 0.0008438805816695094
Validation loss = 0.0010177860967814922
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002288969699293375
Validation loss = 0.002025504829362035
Validation loss = 0.0006390378694050014
Validation loss = 0.000643405131995678
Validation loss = 0.000813060556538403
Validation loss = 0.0006800701376050711
Validation loss = 0.0010859515750780702
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008494789944961667
Validation loss = 0.0005272883572615683
Validation loss = 0.0009593471768312156
Validation loss = 0.0012164165964350104
Validation loss = 0.001591350301168859
Validation loss = 0.0009619336342439055
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -8.47     |
| Iteration     | 54        |
| MaximumReturn | -0.000623 |
| MinimumReturn | -116      |
| TotalSamples  | 93296     |
-----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000905013584997505
Validation loss = 0.0011890993919223547
Validation loss = 0.0006391290226019919
Validation loss = 0.0008840268710628152
Validation loss = 0.0012291530147194862
Validation loss = 0.0008714023279026151
Validation loss = 0.00107772892806679
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009895024122670293
Validation loss = 0.0006777406670153141
Validation loss = 0.0007961681694723666
Validation loss = 0.000808234210126102
Validation loss = 0.0019089769339188933
Validation loss = 0.0010724776657298207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001179391867481172
Validation loss = 0.0008201715536415577
Validation loss = 0.0008647143258713186
Validation loss = 0.0010720440186560154
Validation loss = 0.0006478190771304071
Validation loss = 0.0032144542783498764
Validation loss = 0.0011001573875546455
Validation loss = 0.0006344334688037634
Validation loss = 0.0020218973513692617
Validation loss = 0.00105242186691612
Validation loss = 0.0009828394977375865
Validation loss = 0.0010382678592577577
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011600059224292636
Validation loss = 0.0010972290765494108
Validation loss = 0.0009918451542034745
Validation loss = 0.0008174957474693656
Validation loss = 0.0007635570364072919
Validation loss = 0.0006231226143427193
Validation loss = 0.0009410280617885292
Validation loss = 0.001918541151098907
Validation loss = 0.000967576343100518
Validation loss = 0.0005644669872708619
Validation loss = 0.0008585932664573193
Validation loss = 0.0006377456593327224
Validation loss = 0.0014101145789027214
Validation loss = 0.0011102427961304784
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000770744401961565
Validation loss = 0.001697438070550561
Validation loss = 0.0033896921668201685
Validation loss = 0.0008127791224978864
Validation loss = 0.0006953570409677923
Validation loss = 0.0008472370100207627
Validation loss = 0.0005661812028847635
Validation loss = 0.0006681583472527564
Validation loss = 0.0008608806529082358
Validation loss = 0.0007871937123127282
Validation loss = 0.0006913364632055163
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -9.57     |
| Iteration     | 55        |
| MaximumReturn | -0.000635 |
| MinimumReturn | -106      |
| TotalSamples  | 94962     |
-----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006884793983772397
Validation loss = 0.000898548518307507
Validation loss = 0.0012292086612433195
Validation loss = 0.0012452169321477413
Validation loss = 0.000981967314146459
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007236512028612196
Validation loss = 0.002778527559712529
Validation loss = 0.0008188275387510657
Validation loss = 0.0006772939232178032
Validation loss = 0.0010326729388907552
Validation loss = 0.0013116981135681272
Validation loss = 0.002822853159159422
Validation loss = 0.0005904967547394335
Validation loss = 0.0006986725493334234
Validation loss = 0.0008237613365054131
Validation loss = 0.0015349213499575853
Validation loss = 0.0007060531643219292
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007124724797904491
Validation loss = 0.0006521863979287446
Validation loss = 0.0008490526815876365
Validation loss = 0.0007364466437138617
Validation loss = 0.0019565438851714134
Validation loss = 0.0007115029147826135
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000714110501576215
Validation loss = 0.0007927246042527258
Validation loss = 0.0022979965433478355
Validation loss = 0.0012373776407912374
Validation loss = 0.0007374997367151082
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000623955624178052
Validation loss = 0.0007014245493337512
Validation loss = 0.0008434195769950747
Validation loss = 0.0011516697704792023
Validation loss = 0.0008145335596054792
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.4    |
| Iteration     | 56       |
| MaximumReturn | -0.00056 |
| MinimumReturn | -112     |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010090566938742995
Validation loss = 0.0007761630113236606
Validation loss = 0.0014399656793102622
Validation loss = 0.0007493147277273238
Validation loss = 0.0009146493976004422
Validation loss = 0.0008893597987480462
Validation loss = 0.0006052001263014972
Validation loss = 0.0006436755647882819
Validation loss = 0.0016434568678960204
Validation loss = 0.0005761344800703228
Validation loss = 0.0005753121804445982
Validation loss = 0.0008102731662802398
Validation loss = 0.0006172198918648064
Validation loss = 0.0021485534962266684
Validation loss = 0.0006930843810550869
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006613580626435578
Validation loss = 0.0006839819834567606
Validation loss = 0.0017834472237154841
Validation loss = 0.0005094152293168008
Validation loss = 0.0010011820122599602
Validation loss = 0.000678496842738241
Validation loss = 0.0009322368423454463
Validation loss = 0.0006697847857140005
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005834568873979151
Validation loss = 0.0007334158290177584
Validation loss = 0.0009653273154981434
Validation loss = 0.0011637673014774919
Validation loss = 0.0008273015264421701
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006532257539220154
Validation loss = 0.0012592821149155498
Validation loss = 0.0015836401144042611
Validation loss = 0.0007258951663970947
Validation loss = 0.0007200166583061218
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000849870324600488
Validation loss = 0.000890990428160876
Validation loss = 0.0007821358158253133
Validation loss = 0.0010096547193825245
Validation loss = 0.000953430775552988
Validation loss = 0.0008648639195598662
Validation loss = 0.0005673744599334896
Validation loss = 0.0009602142381481826
Validation loss = 0.0015060851583257318
Validation loss = 0.0006919112056493759
Validation loss = 0.0015302192186936736
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -5.22     |
| Iteration     | 57        |
| MaximumReturn | -0.000468 |
| MinimumReturn | -92.4     |
| TotalSamples  | 98294     |
-----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006564465584233403
Validation loss = 0.0004863436333835125
Validation loss = 0.0007672767387703061
Validation loss = 0.0009583599166944623
Validation loss = 0.0008209641673602164
Validation loss = 0.00484367273747921
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007678454276174307
Validation loss = 0.000601245672442019
Validation loss = 0.0005717331659980118
Validation loss = 0.001158312545157969
Validation loss = 0.0009064846090041101
Validation loss = 0.0007534387987107038
Validation loss = 0.0006035785190761089
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009821937419474125
Validation loss = 0.0014407073613256216
Validation loss = 0.0011405200930312276
Validation loss = 0.000548836775124073
Validation loss = 0.0012628516415134072
Validation loss = 0.0006188483675941825
Validation loss = 0.001019268180243671
Validation loss = 0.0006760589312762022
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005995681276544929
Validation loss = 0.0010110546136274934
Validation loss = 0.0012445765314623713
Validation loss = 0.0006315952632576227
Validation loss = 0.0006529461243189871
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010964851826429367
Validation loss = 0.0007641618140041828
Validation loss = 0.0007772770477458835
Validation loss = 0.001477726735174656
Validation loss = 0.0008584274910390377
Validation loss = 0.0007916314643807709
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -30.8    |
| Iteration     | 58       |
| MaximumReturn | -0.00114 |
| MinimumReturn | -92.7    |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0010530035942792892
Validation loss = 0.0006450656801462173
Validation loss = 0.0013836320722475648
Validation loss = 0.0006870593060739338
Validation loss = 0.0007135661435313523
Validation loss = 0.0006869222270324826
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014535580994561315
Validation loss = 0.0006704338011331856
Validation loss = 0.0007311084773391485
Validation loss = 0.0007868127431720495
Validation loss = 0.0006204663659445941
Validation loss = 0.0006101930048316717
Validation loss = 0.0005411535385064781
Validation loss = 0.0006580259068869054
Validation loss = 0.001393432728946209
Validation loss = 0.0007784648332744837
Validation loss = 0.0008562406874261796
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011307141976431012
Validation loss = 0.000770328042563051
Validation loss = 0.0011698462767526507
Validation loss = 0.0008884056587703526
Validation loss = 0.0008256176952272654
Validation loss = 0.003182437038049102
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006247575511224568
Validation loss = 0.001009022700600326
Validation loss = 0.001342574367299676
Validation loss = 0.0006398013210855424
Validation loss = 0.0005116929532960057
Validation loss = 0.0007603939156979322
Validation loss = 0.0013099814532324672
Validation loss = 0.0006967313820496202
Validation loss = 0.0008739313343539834
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006867714109830558
Validation loss = 0.0008691801340319216
Validation loss = 0.0008144010789692402
Validation loss = 0.0022528180852532387
Validation loss = 0.0013700644485652447
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.204   |
| Iteration     | 59       |
| MaximumReturn | -0.126   |
| MinimumReturn | -0.32    |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005136777181178331
Validation loss = 0.0009567109518684447
Validation loss = 0.0005787099944427609
Validation loss = 0.0014891650062054396
Validation loss = 0.0005490639596246183
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00044793213601224124
Validation loss = 0.0006687862332910299
Validation loss = 0.0006847237818874419
Validation loss = 0.0007084895623847842
Validation loss = 0.0010156187927350402
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000925027416087687
Validation loss = 0.000672205991577357
Validation loss = 0.0006497135036624968
Validation loss = 0.0006811818457208574
Validation loss = 0.0005206080386415124
Validation loss = 0.0006848552729934454
Validation loss = 0.0007597875082865357
Validation loss = 0.0014720349572598934
Validation loss = 0.0008889071177691221
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008099116967059672
Validation loss = 0.0007934991735965014
Validation loss = 0.000803536269813776
Validation loss = 0.0004506315744947642
Validation loss = 0.0007507154950872064
Validation loss = 0.0009417877299711108
Validation loss = 0.0008383248932659626
Validation loss = 0.0005792161682620645
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010660291882231832
Validation loss = 0.0008718254393897951
Validation loss = 0.0009985187789425254
Validation loss = 0.0005568036576732993
Validation loss = 0.0014938035747036338
Validation loss = 0.0008034942438825965
Validation loss = 0.000650920148473233
Validation loss = 0.0011162605369463563
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0977  |
| Iteration     | 60       |
| MaximumReturn | -0.0526  |
| MinimumReturn | -0.165   |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006966008804738522
Validation loss = 0.0006082171457819641
Validation loss = 0.0008785854442976415
Validation loss = 0.0009115816210396588
Validation loss = 0.0005060194525867701
Validation loss = 0.0005539133562706411
Validation loss = 0.0004477027687244117
Validation loss = 0.0006909334915690124
Validation loss = 0.0006814189255237579
Validation loss = 0.0011148073244839907
Validation loss = 0.000647131004370749
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000512290804181248
Validation loss = 0.0007167280418798327
Validation loss = 0.0007516896585002542
Validation loss = 0.0005597599665634334
Validation loss = 0.0005231443792581558
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005413276958279312
Validation loss = 0.0007466984097845852
Validation loss = 0.0005065283621661365
Validation loss = 0.0006170379347167909
Validation loss = 0.0004472025902941823
Validation loss = 0.0014248640509322286
Validation loss = 0.0009759727981872857
Validation loss = 0.0005251787370070815
Validation loss = 0.000707585655618459
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006571409176103771
Validation loss = 0.0006031604134477675
Validation loss = 0.0006968430243432522
Validation loss = 0.000593906152062118
Validation loss = 0.0006580690387636423
Validation loss = 0.0006788764731027186
Validation loss = 0.0010562855750322342
Validation loss = 0.0010534841567277908
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008545328746549785
Validation loss = 0.0006845293682999909
Validation loss = 0.0006887891795486212
Validation loss = 0.0012907958589494228
Validation loss = 0.0006317886291071773
Validation loss = 0.0006330207688733935
Validation loss = 0.0006847796030342579
Validation loss = 0.0007150121964514256
Validation loss = 0.0007477728649973869
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0101  |
| Iteration     | 61       |
| MaximumReturn | -0.00073 |
| MinimumReturn | -0.114   |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007064748206175864
Validation loss = 0.0005313795409165323
Validation loss = 0.0008200535085052252
Validation loss = 0.0009262400562874973
Validation loss = 0.0005818715435452759
Validation loss = 0.0005887154839001596
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011129823978990316
Validation loss = 0.001354103907942772
Validation loss = 0.0007203839486464858
Validation loss = 0.0006563258939422667
Validation loss = 0.0010247216559946537
Validation loss = 0.0011418992653489113
Validation loss = 0.0006844509625807405
Validation loss = 0.0008113714284263551
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004949263529852033
Validation loss = 0.0006247779820114374
Validation loss = 0.0005448026349768043
Validation loss = 0.0006899353465996683
Validation loss = 0.0004696698160842061
Validation loss = 0.0012804883299395442
Validation loss = 0.00115393556188792
Validation loss = 0.0016416156431660056
Validation loss = 0.0004919638740830123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008385672117583454
Validation loss = 0.0015288919676095247
Validation loss = 0.0006936202989891171
Validation loss = 0.0010555651970207691
Validation loss = 0.0007499298662878573
Validation loss = 0.0007840770413167775
Validation loss = 0.0010496772592887282
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006585278897546232
Validation loss = 0.000625110580585897
Validation loss = 0.0012235374888405204
Validation loss = 0.0006029355572536588
Validation loss = 0.0006507557700388134
Validation loss = 0.000645263644400984
Validation loss = 0.0011877786600962281
Validation loss = 0.0006870729266665876
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0138   |
| Iteration     | 62        |
| MaximumReturn | -0.000704 |
| MinimumReturn | -0.136    |
| TotalSamples  | 106624    |
-----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001074621919542551
Validation loss = 0.0007636537775397301
Validation loss = 0.0005159801803529263
Validation loss = 0.0009935550624504685
Validation loss = 0.0005857307114638388
Validation loss = 0.0007327074417844415
Validation loss = 0.0005327115068212152
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007576317293569446
Validation loss = 0.0008030224707908928
Validation loss = 0.000678730255458504
Validation loss = 0.0006121874321252108
Validation loss = 0.0006183819496072829
Validation loss = 0.0005229149828664958
Validation loss = 0.0006183276418596506
Validation loss = 0.0006767470040358603
Validation loss = 0.0010079840430989861
Validation loss = 0.0005229298840276897
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017662318423390388
Validation loss = 0.0018163612112402916
Validation loss = 0.0006186742684803903
Validation loss = 0.0007205044385045767
Validation loss = 0.0009363035787828267
Validation loss = 0.0011673789704218507
Validation loss = 0.0007500476785935462
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010242937132716179
Validation loss = 0.0007337587885558605
Validation loss = 0.0011960383271798491
Validation loss = 0.0008933168137446046
Validation loss = 0.0017715555150061846
Validation loss = 0.0010070023126900196
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005089170881547034
Validation loss = 0.0006049851072020829
Validation loss = 0.0019004333298653364
Validation loss = 0.0006647300906479359
Validation loss = 0.0006090994575060904
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0572   |
| Iteration     | 63        |
| MaximumReturn | -0.000606 |
| MinimumReturn | -1.33     |
| TotalSamples  | 108290    |
-----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001144637819379568
Validation loss = 0.0006376698729582131
Validation loss = 0.00043946405639871955
Validation loss = 0.0006606150418519974
Validation loss = 0.0013969667488709092
Validation loss = 0.0008700020262040198
Validation loss = 0.0008726667147129774
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007116062915883958
Validation loss = 0.001005745492875576
Validation loss = 0.000802374561317265
Validation loss = 0.000728112121578306
Validation loss = 0.0012216466711834073
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005333281005732715
Validation loss = 0.0008169062784872949
Validation loss = 0.000811662117484957
Validation loss = 0.0011301488848403096
Validation loss = 0.0005769261624664068
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00135516538284719
Validation loss = 0.0008245767676271498
Validation loss = 0.000474285741802305
Validation loss = 0.0012471856316551566
Validation loss = 0.0015008486807346344
Validation loss = 0.0004848548851441592
Validation loss = 0.00046336345258168876
Validation loss = 0.0008044585702009499
Validation loss = 0.0006409893976524472
Validation loss = 0.0009412627550773323
Validation loss = 0.0008147963089868426
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004945610417053103
Validation loss = 0.0010293241357430816
Validation loss = 0.0006202108925208449
Validation loss = 0.0009521979372948408
Validation loss = 0.0008748196414671838
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0155  |
| Iteration     | 64       |
| MaximumReturn | -0.00058 |
| MinimumReturn | -0.0871  |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005850574234500527
Validation loss = 0.0006639182684011757
Validation loss = 0.0016127063427120447
Validation loss = 0.000791907194070518
Validation loss = 0.001062049763277173
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027240354102104902
Validation loss = 0.0030116329435259104
Validation loss = 0.0006273738108575344
Validation loss = 0.0010130099253728986
Validation loss = 0.0008046103757806122
Validation loss = 0.0006224563694559038
Validation loss = 0.0007882038480602205
Validation loss = 0.0011527439346536994
Validation loss = 0.0007778405561111867
Validation loss = 0.0007620126125402749
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011537723476067185
Validation loss = 0.0006025217589922249
Validation loss = 0.0026573834475129843
Validation loss = 0.0007958625792525709
Validation loss = 0.0022456871811300516
Validation loss = 0.0006818404071964324
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007979574729688466
Validation loss = 0.0005930414772592485
Validation loss = 0.0011826958507299423
Validation loss = 0.001485840417444706
Validation loss = 0.0006792641361244023
Validation loss = 0.0009279652731493115
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007104583200998604
Validation loss = 0.0007986915879882872
Validation loss = 0.0009437631233595312
Validation loss = 0.0011444637784734368
Validation loss = 0.0007622882258147001
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.45     |
| Iteration     | 65        |
| MaximumReturn | -0.000671 |
| MinimumReturn | -61       |
| TotalSamples  | 111622    |
-----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0020091217011213303
Validation loss = 0.0013327659107744694
Validation loss = 0.0011096411617472768
Validation loss = 0.0007203322020359337
Validation loss = 0.0012947634095326066
Validation loss = 0.0014702860498800874
Validation loss = 0.0005127302720211446
Validation loss = 0.0007606309955008328
Validation loss = 0.0005302878562361002
Validation loss = 0.0013264872832223773
Validation loss = 0.0006164102233015001
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008297580643557012
Validation loss = 0.0010715113021433353
Validation loss = 0.0009254874312318861
Validation loss = 0.0006501997704617679
Validation loss = 0.0007969336002133787
Validation loss = 0.0006553764105774462
Validation loss = 0.001142017776146531
Validation loss = 0.0007802289328537881
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005129683995619416
Validation loss = 0.001417905674315989
Validation loss = 0.0012567871017381549
Validation loss = 0.001012504450045526
Validation loss = 0.0005874518537893891
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000637779594399035
Validation loss = 0.003078020643442869
Validation loss = 0.0007003014907240868
Validation loss = 0.0008620219887234271
Validation loss = 0.010442933067679405
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009464519680477679
Validation loss = 0.0005890274187549949
Validation loss = 0.0007746650953777134
Validation loss = 0.0008116735843941569
Validation loss = 0.0009160375338979065
Validation loss = 0.0013719223206862807
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.97    |
| Iteration     | 66       |
| MaximumReturn | -0.239   |
| MinimumReturn | -13.5    |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005844206316396594
Validation loss = 0.0004281271540094167
Validation loss = 0.0008217831491492689
Validation loss = 0.001034036511555314
Validation loss = 0.0006014257669448853
Validation loss = 0.0006748455343768001
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007859034813009202
Validation loss = 0.0006503424374386668
Validation loss = 0.0005802444065921009
Validation loss = 0.0017447349382564425
Validation loss = 0.0006285382551141083
Validation loss = 0.0005329218693077564
Validation loss = 0.000594437588006258
Validation loss = 0.0005964585579931736
Validation loss = 0.0013640770921483636
Validation loss = 0.0005677883746102452
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006393436342477798
Validation loss = 0.000998915289528668
Validation loss = 0.00197235238738358
Validation loss = 0.0006470078369602561
Validation loss = 0.0009783523855730891
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016234322683885694
Validation loss = 0.0008942759013734758
Validation loss = 0.000454546621767804
Validation loss = 0.0009237469057552516
Validation loss = 0.0007565362611785531
Validation loss = 0.0033947229385375977
Validation loss = 0.0005833362811245024
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012171814450994134
Validation loss = 0.0005843156832270324
Validation loss = 0.000862402725033462
Validation loss = 0.0009511540411040187
Validation loss = 0.0007530407165177166
Validation loss = 0.0006823165458627045
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0666   |
| Iteration     | 67        |
| MaximumReturn | -0.000769 |
| MinimumReturn | -0.229    |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008620527805760503
Validation loss = 0.0017909486778080463
Validation loss = 0.0007427563541568816
Validation loss = 0.0005814144387841225
Validation loss = 0.00093374791322276
Validation loss = 0.0010517156915739179
Validation loss = 0.0008157272823154926
Validation loss = 0.0007506791152991354
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009167736861854792
Validation loss = 0.007647394202649593
Validation loss = 0.0006559374742209911
Validation loss = 0.001916311215609312
Validation loss = 0.0006748205632902682
Validation loss = 0.0007059323252178729
Validation loss = 0.0012405830202624202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009703200776129961
Validation loss = 0.0016594503540545702
Validation loss = 0.0006764967110939324
Validation loss = 0.0006745760329067707
Validation loss = 0.0011815991019830108
Validation loss = 0.0006729838787578046
Validation loss = 0.0009534315904602408
Validation loss = 0.0009035014663822949
Validation loss = 0.000786357675679028
Validation loss = 0.0005987356998957694
Validation loss = 0.0006533798296004534
Validation loss = 0.0005886125727556646
Validation loss = 0.001512379152700305
Validation loss = 0.0007095691980794072
Validation loss = 0.0006269140285439789
Validation loss = 0.0007539876387454569
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009191973367705941
Validation loss = 0.0009285425767302513
Validation loss = 0.0010047584073618054
Validation loss = 0.0008051074109971523
Validation loss = 0.0010396330617368221
Validation loss = 0.0008779196068644524
Validation loss = 0.0007880287012085319
Validation loss = 0.0014949371106922626
Validation loss = 0.0010405154898762703
Validation loss = 0.0005662424373440444
Validation loss = 0.0005073569482192397
Validation loss = 0.000916958088055253
Validation loss = 0.0007475310121662915
Validation loss = 0.0007173640187829733
Validation loss = 0.0014078665990382433
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007485368405468762
Validation loss = 0.0006493936525657773
Validation loss = 0.0023060436360538006
Validation loss = 0.0010323678143322468
Validation loss = 0.0008484006393700838
Validation loss = 0.001421585911884904
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00729  |
| Iteration     | 68        |
| MaximumReturn | -0.000592 |
| MinimumReturn | -0.164    |
| TotalSamples  | 116620    |
-----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0009430460049770772
Validation loss = 0.0009461370063945651
Validation loss = 0.0008693277486599982
Validation loss = 0.0007105569820851088
Validation loss = 0.0013735494576394558
Validation loss = 0.0008289435063488781
Validation loss = 0.0006383551517501473
Validation loss = 0.0010583903640508652
Validation loss = 0.0009129899553954601
Validation loss = 0.0007064248784445226
Validation loss = 0.001014860812574625
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010278720874339342
Validation loss = 0.001288951956667006
Validation loss = 0.0007655465160496533
Validation loss = 0.0009627790423110127
Validation loss = 0.0009571898262947798
Validation loss = 0.0009548363741487265
Validation loss = 0.0009450647630728781
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.004937662277370691
Validation loss = 0.0029164906591176987
Validation loss = 0.0007445774390362203
Validation loss = 0.0010700967395678163
Validation loss = 0.0007944491226226091
Validation loss = 0.0005684354691766202
Validation loss = 0.0009065999183803797
Validation loss = 0.0006172404973767698
Validation loss = 0.0007831029361113906
Validation loss = 0.0013340640580281615
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0015472166705876589
Validation loss = 0.0007689968915656209
Validation loss = 0.0009484770707786083
Validation loss = 0.0006434372044168413
Validation loss = 0.0018191167619079351
Validation loss = 0.0006725843413732946
Validation loss = 0.0007350601954385638
Validation loss = 0.0007083105156198144
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016579648945480585
Validation loss = 0.0006778898532502353
Validation loss = 0.0008139768033288419
Validation loss = 0.0012146621011197567
Validation loss = 0.0006147256353870034
Validation loss = 0.0005737858591601253
Validation loss = 0.0005834894836880267
Validation loss = 0.0006869425415061414
Validation loss = 0.0013558091595768929
Validation loss = 0.0008497502421960235
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.133    |
| Iteration     | 69        |
| MaximumReturn | -0.000818 |
| MinimumReturn | -2.73     |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005861542304046452
Validation loss = 0.0010486100800335407
Validation loss = 0.002061479026451707
Validation loss = 0.0006316988728940487
Validation loss = 0.0007250627968460321
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012820024276152253
Validation loss = 0.002260419772937894
Validation loss = 0.0006017406703904271
Validation loss = 0.0017459348309785128
Validation loss = 0.0008254801505245268
Validation loss = 0.0007955235196277499
Validation loss = 0.0008297811145894229
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022453460842370987
Validation loss = 0.0015903472667559981
Validation loss = 0.001229386660270393
Validation loss = 0.0006011476507410407
Validation loss = 0.0006732637411914766
Validation loss = 0.0009445947362110019
Validation loss = 0.0015345325227826834
Validation loss = 0.0010659509571269155
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005562191945500672
Validation loss = 0.0005624279729090631
Validation loss = 0.0007827647496014833
Validation loss = 0.0008099337574094534
Validation loss = 0.0010766348568722606
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007889763219282031
Validation loss = 0.0008232799591496587
Validation loss = 0.0008253000560216606
Validation loss = 0.0006048996583558619
Validation loss = 0.0010199886746704578
Validation loss = 0.0007218140526674688
Validation loss = 0.0012608208926394582
Validation loss = 0.0008056620135903358
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.254    |
| Iteration     | 70        |
| MaximumReturn | -0.000689 |
| MinimumReturn | -5.79     |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008539203554391861
Validation loss = 0.001799107762053609
Validation loss = 0.0011276435106992722
Validation loss = 0.0007169954478740692
Validation loss = 0.00088127312483266
Validation loss = 0.0009931775275617838
Validation loss = 0.0010941069340333343
Validation loss = 0.000591436808463186
Validation loss = 0.0024148032534867525
Validation loss = 0.0009122290066443384
Validation loss = 0.0009761312394402921
Validation loss = 0.0006689645233564079
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008005759445950389
Validation loss = 0.0007848347304388881
Validation loss = 0.000786657736171037
Validation loss = 0.0014867468271404505
Validation loss = 0.0007852179696783423
Validation loss = 0.0007789908559061587
Validation loss = 0.0006685752887278795
Validation loss = 0.0014147572219371796
Validation loss = 0.0007769250660203397
Validation loss = 0.0005701143527403474
Validation loss = 0.0010939178755506873
Validation loss = 0.0007355475099757314
Validation loss = 0.0006719103548675776
Validation loss = 0.0005175898550078273
Validation loss = 0.0013244983274489641
Validation loss = 0.0007911732536740601
Validation loss = 0.0009691059240140021
Validation loss = 0.000756056746467948
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010904090013355017
Validation loss = 0.00080962321953848
Validation loss = 0.0006614142330363393
Validation loss = 0.0013887342065572739
Validation loss = 0.0006856729742139578
Validation loss = 0.0010328919161111116
Validation loss = 0.0007211013580672443
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001230717869475484
Validation loss = 0.0004939454956911504
Validation loss = 0.0012269704602658749
Validation loss = 0.0006898960564285517
Validation loss = 0.0008152618538588285
Validation loss = 0.000814317143522203
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007613167399540544
Validation loss = 0.0006529506063088775
Validation loss = 0.0014423239044845104
Validation loss = 0.000673017289955169
Validation loss = 0.0006543048075400293
Validation loss = 0.0009629143169149756
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0265   |
| Iteration     | 71        |
| MaximumReturn | -0.000658 |
| MinimumReturn | -0.138    |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007871947600506246
Validation loss = 0.0005797669873572886
Validation loss = 0.0014081773115321994
Validation loss = 0.000856083061080426
Validation loss = 0.0011527740862220526
Validation loss = 0.0005541321006603539
Validation loss = 0.0006429932545870543
Validation loss = 0.0008979907142929733
Validation loss = 0.0006123812636360526
Validation loss = 0.0012063792673870921
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006572791608050466
Validation loss = 0.0009248183341696858
Validation loss = 0.0007917797775007784
Validation loss = 0.001203377963975072
Validation loss = 0.0009527214569970965
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0011292077833786607
Validation loss = 0.0008316970197483897
Validation loss = 0.0012900222791358829
Validation loss = 0.000833233236335218
Validation loss = 0.0008649455849081278
Validation loss = 0.0013679706025868654
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008050007163546979
Validation loss = 0.0006445505423471332
Validation loss = 0.0005359702627174556
Validation loss = 0.0005057235830463469
Validation loss = 0.001254400354810059
Validation loss = 0.0006382570136338472
Validation loss = 0.0008336066384799778
Validation loss = 0.0008354916935786605
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0006268952274695039
Validation loss = 0.0005443600821308792
Validation loss = 0.0008960720151662827
Validation loss = 0.0013769742799922824
Validation loss = 0.0007824702188372612
Validation loss = 0.002624951768666506
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.73     |
| Iteration     | 72        |
| MaximumReturn | -0.000648 |
| MinimumReturn | -35.8     |
| TotalSamples  | 123284    |
-----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000697940937243402
Validation loss = 0.0014931379118934274
Validation loss = 0.0008071293123066425
Validation loss = 0.000672236958052963
Validation loss = 0.0009749550954438746
Validation loss = 0.0007536027114838362
Validation loss = 0.0009640157804824412
Validation loss = 0.0006868193158879876
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0040125236846506596
Validation loss = 0.0010967586422339082
Validation loss = 0.0006950779934413731
Validation loss = 0.0006642651278525591
Validation loss = 0.0008902703411877155
Validation loss = 0.0006093370611779392
Validation loss = 0.0016740650171414018
Validation loss = 0.0010139241348952055
Validation loss = 0.0005959660629741848
Validation loss = 0.0008087728638201952
Validation loss = 0.000852526689413935
Validation loss = 0.0009874022798612714
Validation loss = 0.0006738629890605807
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007758077699691057
Validation loss = 0.0005907093873247504
Validation loss = 0.0006417303811758757
Validation loss = 0.002237367909401655
Validation loss = 0.0006662590312771499
Validation loss = 0.001193447969853878
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006293925689533353
Validation loss = 0.0008465749560855329
Validation loss = 0.0013592888135463
Validation loss = 0.0008256510482169688
Validation loss = 0.0006683237734250724
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012348059099167585
Validation loss = 0.0013504322851076722
Validation loss = 0.0013645007275044918
Validation loss = 0.000625806103926152
Validation loss = 0.000665709376335144
Validation loss = 0.0006834387895651162
Validation loss = 0.0006550332764163613
Validation loss = 0.0005642096512019634
Validation loss = 0.0009552377741783857
Validation loss = 0.0006107795052230358
Validation loss = 0.0007211330230347812
Validation loss = 0.0006268966244533658
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.02     |
| Iteration     | 73        |
| MaximumReturn | -0.000614 |
| MinimumReturn | -40       |
| TotalSamples  | 124950    |
-----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013291030190885067
Validation loss = 0.0008148607448674738
Validation loss = 0.0008846282144077122
Validation loss = 0.0007557921926490963
Validation loss = 0.001349926576949656
Validation loss = 0.0009442325681447983
Validation loss = 0.000723053643014282
Validation loss = 0.000679748656693846
Validation loss = 0.0007639122195541859
Validation loss = 0.0006986020598560572
Validation loss = 0.0009144170326180756
Validation loss = 0.0009851831709966063
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0009741841349750757
Validation loss = 0.0006401600548997521
Validation loss = 0.000599290186073631
Validation loss = 0.0007668199250474572
Validation loss = 0.0007384797791019082
Validation loss = 0.0010199208045378327
Validation loss = 0.0008335600141435862
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008626078488305211
Validation loss = 0.0007539240759797394
Validation loss = 0.001132059027440846
Validation loss = 0.0012142147170379758
Validation loss = 0.000708133855368942
Validation loss = 0.0011168270139023662
Validation loss = 0.0007028277032077312
Validation loss = 0.0016463013598695397
Validation loss = 0.00101608841214329
Validation loss = 0.0011261245235800743
Validation loss = 0.0007523406529799104
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008036283543333411
Validation loss = 0.0021681804209947586
Validation loss = 0.005258422344923019
Validation loss = 0.0005399155779741704
Validation loss = 0.0006347448215819895
Validation loss = 0.0014280980685725808
Validation loss = 0.0030167631339281797
Validation loss = 0.000971309607848525
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007722808513790369
Validation loss = 0.0006296434439718723
Validation loss = 0.001199778402224183
Validation loss = 0.0007319168071262538
Validation loss = 0.00138879066798836
Validation loss = 0.0008898644591681659
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3        |
| Iteration     | 74        |
| MaximumReturn | -0.000699 |
| MinimumReturn | -40.5     |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013603402767330408
Validation loss = 0.0009189851698465645
Validation loss = 0.0015447060577571392
Validation loss = 0.0019166681449860334
Validation loss = 0.000623218365944922
Validation loss = 0.0013550397707149386
Validation loss = 0.0016082803485915065
Validation loss = 0.0008254752028733492
Validation loss = 0.0006408868357539177
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006314161000773311
Validation loss = 0.0007605300634168088
Validation loss = 0.0011978315887972713
Validation loss = 0.0005773631855845451
Validation loss = 0.0008622645982541144
Validation loss = 0.0011037270305678248
Validation loss = 0.0008222528267651796
Validation loss = 0.0007447378011420369
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0013453583233058453
Validation loss = 0.0009448332712054253
Validation loss = 0.0009655779576860368
Validation loss = 0.0007576190400868654
Validation loss = 0.0006283804541453719
Validation loss = 0.0015588520327582955
Validation loss = 0.0009508999646641314
Validation loss = 0.0012036010157316923
Validation loss = 0.0008268944220617414
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005772790173068643
Validation loss = 0.0007642547716386616
Validation loss = 0.0008088148315437138
Validation loss = 0.0009147923556156456
Validation loss = 0.0007262052386067808
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007400420727208257
Validation loss = 0.001272725872695446
Validation loss = 0.0008617398561909795
Validation loss = 0.0007113668834790587
Validation loss = 0.001039000111632049
Validation loss = 0.0010481744538992643
Validation loss = 0.0011882619000971317
Validation loss = 0.000691970984917134
Validation loss = 0.0008377120248042047
Validation loss = 0.0008420568192377687
Validation loss = 0.0009676993358880281
Validation loss = 0.0012741709360852838
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.28     |
| Iteration     | 75        |
| MaximumReturn | -0.000736 |
| MinimumReturn | -20.8     |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.002387434709817171
Validation loss = 0.0009737831423990428
Validation loss = 0.000714867259375751
Validation loss = 0.0008483658311888576
Validation loss = 0.0010099823120981455
Validation loss = 0.0009618443436920643
Validation loss = 0.0009555208962410688
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0014639173168689013
Validation loss = 0.0009557752055115998
Validation loss = 0.001156320096924901
Validation loss = 0.0009769478347152472
Validation loss = 0.0007243999280035496
Validation loss = 0.0008528012549504638
Validation loss = 0.0010438826866447926
Validation loss = 0.0014075415674597025
Validation loss = 0.0007365947240032256
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006415569223463535
Validation loss = 0.0008824450196698308
Validation loss = 0.0007390258251689374
Validation loss = 0.0005737945903092623
Validation loss = 0.0005723483627662063
Validation loss = 0.0005783134256489575
Validation loss = 0.0007569213630631566
Validation loss = 0.0006757356459274888
Validation loss = 0.0009748192387633026
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008606929914094508
Validation loss = 0.0007052490836940706
Validation loss = 0.0007746042683720589
Validation loss = 0.0006614111480303109
Validation loss = 0.0004959965590387583
Validation loss = 0.0007578677032142878
Validation loss = 0.0008053444908000529
Validation loss = 0.0006595357554033399
Validation loss = 0.0008503893623128533
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0012405095621943474
Validation loss = 0.0006167386891320348
Validation loss = 0.0007649062899872661
Validation loss = 0.0009380422998219728
Validation loss = 0.0007339194999076426
Validation loss = 0.0006450057262554765
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -62      |
| Iteration     | 76       |
| MaximumReturn | -1.11    |
| MinimumReturn | -94      |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0013946775579825044
Validation loss = 0.0007347348728217185
Validation loss = 0.0006803041324019432
Validation loss = 0.0006799162947572768
Validation loss = 0.001024511526338756
Validation loss = 0.0006267280550673604
Validation loss = 0.0011910207103937864
Validation loss = 0.0008678847807459533
Validation loss = 0.0022478667087852955
Validation loss = 0.0012352579506114125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002509908052161336
Validation loss = 0.0007002177881076932
Validation loss = 0.0008711363770999014
Validation loss = 0.0007561423117294908
Validation loss = 0.0013960382202640176
Validation loss = 0.0007151555037125945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0016347297932952642
Validation loss = 0.0009280819213017821
Validation loss = 0.000547255331184715
Validation loss = 0.0005023836274631321
Validation loss = 0.0007131511811167002
Validation loss = 0.000962429097853601
Validation loss = 0.0009197738254442811
Validation loss = 0.0012882285518571734
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002376917749643326
Validation loss = 0.0008119971607811749
Validation loss = 0.0006298143998719752
Validation loss = 0.000547515694051981
Validation loss = 0.0007584126433357596
Validation loss = 0.000693736772518605
Validation loss = 0.0007486742688342929
Validation loss = 0.0007623155834153295
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0018238320481032133
Validation loss = 0.0006932862452231348
Validation loss = 0.0006967092049308121
Validation loss = 0.0005406958516687155
Validation loss = 0.0008630387019366026
Validation loss = 0.0016325024189427495
Validation loss = 0.0007157829822972417
Validation loss = 0.0008383585955016315
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.89     |
| Iteration     | 77        |
| MaximumReturn | -0.000731 |
| MinimumReturn | -90.4     |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0008458892116323113
Validation loss = 0.0012288581347092986
Validation loss = 0.0007507186383008957
Validation loss = 0.0008493512868881226
Validation loss = 0.0008294416475109756
Validation loss = 0.000660025340039283
Validation loss = 0.0008913655765354633
Validation loss = 0.0007313008536584675
Validation loss = 0.0005371490260586143
Validation loss = 0.0006888317293487489
Validation loss = 0.0007561278180219233
Validation loss = 0.0006959580932743847
Validation loss = 0.0007830391405150294
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006647247355431318
Validation loss = 0.0008895245264284313
Validation loss = 0.0005764696979895234
Validation loss = 0.0006720618112012744
Validation loss = 0.0009103539050556719
Validation loss = 0.0010016540763899684
Validation loss = 0.0007490183343179524
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008890948374755681
Validation loss = 0.0008579834247939289
Validation loss = 0.0010588833829388022
Validation loss = 0.002134892623871565
Validation loss = 0.0010072434088215232
Validation loss = 0.0006053581018932164
Validation loss = 0.0006526634679175913
Validation loss = 0.0010913369478657842
Validation loss = 0.0006701750098727643
Validation loss = 0.001206163433380425
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005569392815232277
Validation loss = 0.001443942659534514
Validation loss = 0.0009641706710681319
Validation loss = 0.0009465098846703768
Validation loss = 0.000996357761323452
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001226515625603497
Validation loss = 0.0008272818522527814
Validation loss = 0.0006071076495572925
Validation loss = 0.0032824401278048754
Validation loss = 0.002148758852854371
Validation loss = 0.0008964553126133978
Validation loss = 0.000656316289678216
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.29     |
| Iteration     | 78        |
| MaximumReturn | -0.000811 |
| MinimumReturn | -29       |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000737043796107173
Validation loss = 0.0005972690414637327
Validation loss = 0.0008444240083917975
Validation loss = 0.0005708365351893008
Validation loss = 0.0010564235271885991
Validation loss = 0.0005292444257065654
Validation loss = 0.0009276712080463767
Validation loss = 0.0006278237560763955
Validation loss = 0.0011357073672115803
Validation loss = 0.000652081798762083
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008939637918956578
Validation loss = 0.0008913152851164341
Validation loss = 0.0009614297887310386
Validation loss = 0.0009488819050602615
Validation loss = 0.0009037599666044116
Validation loss = 0.000728683837223798
Validation loss = 0.0008547807228751481
Validation loss = 0.0017113452777266502
Validation loss = 0.0007117271889001131
Validation loss = 0.0008385988185182214
Validation loss = 0.0007149021839722991
Validation loss = 0.0005851248279213905
Validation loss = 0.000710334861651063
Validation loss = 0.0006966405780985951
Validation loss = 0.0007527597481384873
Validation loss = 0.0025504380464553833
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009517936268821359
Validation loss = 0.0007137643406167626
Validation loss = 0.00044300578883849084
Validation loss = 0.00047967510181479156
Validation loss = 0.0019306280883029103
Validation loss = 0.0008946505258791149
Validation loss = 0.0006288599106483161
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010531714651733637
Validation loss = 0.0006302989204414189
Validation loss = 0.0012737498618662357
Validation loss = 0.0008427458233200014
Validation loss = 0.0006450021755881608
Validation loss = 0.00048369631986133754
Validation loss = 0.0009714073967188597
Validation loss = 0.0006014025420881808
Validation loss = 0.0006838228437118232
Validation loss = 0.0008402960957027972
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008400968508794904
Validation loss = 0.0008297959575429559
Validation loss = 0.0006129643297754228
Validation loss = 0.0006833598599769175
Validation loss = 0.000709808780811727
Validation loss = 0.0012141346232965589
Validation loss = 0.0007237179088406265
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0611   |
| Iteration     | 79        |
| MaximumReturn | -0.000635 |
| MinimumReturn | -1.43     |
| TotalSamples  | 134946    |
-----------------------------
