Logging to experiments/invertedPendulum/test-exp-dir/test-exp_seed2231
Print configuration .....
{'env_name': 'invertedPendulum', 'random_seeds': [3214, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/invertedPendulum_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 80, 'num_path_random': 25, 'num_path_onpol': 25, 'env_horizon': 100, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7342149615287781
Validation loss = 0.37105947732925415
Validation loss = 0.3284894526004791
Validation loss = 0.28501662611961365
Validation loss = 0.26530057191848755
Validation loss = 0.2550528645515442
Validation loss = 0.24273721873760223
Validation loss = 0.2287118136882782
Validation loss = 0.21068258583545685
Validation loss = 0.21236148476600647
Validation loss = 0.1953432410955429
Validation loss = 0.19555385410785675
Validation loss = 0.20351962745189667
Validation loss = 0.1790979504585266
Validation loss = 0.18387272953987122
Validation loss = 0.17487187683582306
Validation loss = 0.21403717994689941
Validation loss = 0.18108534812927246
Validation loss = 0.18321262300014496
Validation loss = 0.16660632193088531
Validation loss = 0.1607959121465683
Validation loss = 0.15827129781246185
Validation loss = 0.15798990428447723
Validation loss = 0.16643208265304565
Validation loss = 0.16341041028499603
Validation loss = 0.16090361773967743
Validation loss = 0.16471432149410248
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7296577095985413
Validation loss = 0.35833850502967834
Validation loss = 0.3257156014442444
Validation loss = 0.2852441370487213
Validation loss = 0.2777444124221802
Validation loss = 0.26214027404785156
Validation loss = 0.26090651750564575
Validation loss = 0.2315889149904251
Validation loss = 0.2234160453081131
Validation loss = 0.20362716913223267
Validation loss = 0.19815799593925476
Validation loss = 0.18642449378967285
Validation loss = 0.18768882751464844
Validation loss = 0.18822018802165985
Validation loss = 0.16633474826812744
Validation loss = 0.1759604811668396
Validation loss = 0.1664304882287979
Validation loss = 0.17711974680423737
Validation loss = 0.16202901303768158
Validation loss = 0.1744423359632492
Validation loss = 0.1911962330341339
Validation loss = 0.20367714762687683
Validation loss = 0.17364725470542908
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7281891703605652
Validation loss = 0.3942612409591675
Validation loss = 0.3091355562210083
Validation loss = 0.295173704624176
Validation loss = 0.2731490433216095
Validation loss = 0.26230236887931824
Validation loss = 0.2480851113796234
Validation loss = 0.2240409553050995
Validation loss = 0.21045345067977905
Validation loss = 0.21286067366600037
Validation loss = 0.2103327065706253
Validation loss = 0.19960665702819824
Validation loss = 0.1761493682861328
Validation loss = 0.187539741396904
Validation loss = 0.1909438520669937
Validation loss = 0.21829670667648315
Validation loss = 0.1660841852426529
Validation loss = 0.16738709807395935
Validation loss = 0.1720530390739441
Validation loss = 0.17057490348815918
Validation loss = 0.15304189920425415
Validation loss = 0.16795091331005096
Validation loss = 0.16366520524024963
Validation loss = 0.14907966554164886
Validation loss = 0.15230348706245422
Validation loss = 0.13242825865745544
Validation loss = 0.14595471322536469
Validation loss = 0.14850851893424988
Validation loss = 0.14197899401187897
Validation loss = 0.13966625928878784
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7409963607788086
Validation loss = 0.39890316128730774
Validation loss = 0.3324911594390869
Validation loss = 0.30477747321128845
Validation loss = 0.27928951382637024
Validation loss = 0.2673308849334717
Validation loss = 0.25993382930755615
Validation loss = 0.23744578659534454
Validation loss = 0.22341306507587433
Validation loss = 0.2091343253850937
Validation loss = 0.2073960155248642
Validation loss = 0.18746715784072876
Validation loss = 0.19181901216506958
Validation loss = 0.18180929124355316
Validation loss = 0.1775248646736145
Validation loss = 0.17867335677146912
Validation loss = 0.17262329161167145
Validation loss = 0.17510642111301422
Validation loss = 0.16628216207027435
Validation loss = 0.15178035199642181
Validation loss = 0.18114487826824188
Validation loss = 0.15301041305065155
Validation loss = 0.16051563620567322
Validation loss = 0.14204041659832
Validation loss = 0.18140564858913422
Validation loss = 0.16859377920627594
Validation loss = 0.16389265656471252
Validation loss = 0.16580411791801453
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7263537645339966
Validation loss = 0.3432144820690155
Validation loss = 0.3285623788833618
Validation loss = 0.2825000584125519
Validation loss = 0.2797519564628601
Validation loss = 0.2617719769477844
Validation loss = 0.24086759984493256
Validation loss = 0.22304761409759521
Validation loss = 0.2239747792482376
Validation loss = 0.2148323357105255
Validation loss = 0.2591637372970581
Validation loss = 0.20602232217788696
Validation loss = 0.19157065451145172
Validation loss = 0.1898624747991562
Validation loss = 0.18148282170295715
Validation loss = 0.17840324342250824
Validation loss = 0.1908077597618103
Validation loss = 0.17355167865753174
Validation loss = 0.17575059831142426
Validation loss = 0.1666467785835266
Validation loss = 0.17109082639217377
Validation loss = 0.16784799098968506
Validation loss = 0.16614489257335663
Validation loss = 0.16238582134246826
Validation loss = 0.17285075783729553
Validation loss = 0.15968967974185944
Validation loss = 0.18119841814041138
Validation loss = 0.17524082958698273
Validation loss = 0.16427990794181824
Validation loss = 0.14777547121047974
Validation loss = 0.17436446249485016
Validation loss = 0.1475735455751419
Validation loss = 0.1596306562423706
Validation loss = 0.1536460667848587
Validation loss = 0.14475341141223907
Validation loss = 0.14284780621528625
Validation loss = 0.1424044370651245
Validation loss = 0.14080911874771118
Validation loss = 0.1412656307220459
Validation loss = 0.15292271971702576
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.136   |
| Iteration     | 0        |
| MaximumReturn | -0.0348  |
| MinimumReturn | -1.25    |
| TotalSamples  | 3332     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.26042211055755615
Validation loss = 0.13752304017543793
Validation loss = 0.11879512667655945
Validation loss = 0.11580409109592438
Validation loss = 0.13068503141403198
Validation loss = 0.1109352856874466
Validation loss = 0.1144125908613205
Validation loss = 0.10728389024734497
Validation loss = 0.1022324338555336
Validation loss = 0.10236205905675888
Validation loss = 0.09775444865226746
Validation loss = 0.0895530954003334
Validation loss = 0.10144093632698059
Validation loss = 0.10544092208147049
Validation loss = 0.09479786455631256
Validation loss = 0.08839535713195801
Validation loss = 0.08365213871002197
Validation loss = 0.08488765358924866
Validation loss = 0.08060967922210693
Validation loss = 0.07747463136911392
Validation loss = 0.07856187224388123
Validation loss = 0.07707107067108154
Validation loss = 0.10702896863222122
Validation loss = 0.07801111042499542
Validation loss = 0.08201818913221359
Validation loss = 0.09165817499160767
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.21984301507472992
Validation loss = 0.1288384646177292
Validation loss = 0.12039121985435486
Validation loss = 0.11845790594816208
Validation loss = 0.1028585210442543
Validation loss = 0.1081787571310997
Validation loss = 0.12132232636213303
Validation loss = 0.10136309266090393
Validation loss = 0.10069330781698227
Validation loss = 0.10190801322460175
Validation loss = 0.11248185485601425
Validation loss = 0.09906672686338425
Validation loss = 0.09259463101625443
Validation loss = 0.0921756699681282
Validation loss = 0.09356671571731567
Validation loss = 0.08903869986534119
Validation loss = 0.09004580974578857
Validation loss = 0.08996930718421936
Validation loss = 0.08548054844141006
Validation loss = 0.07818306237459183
Validation loss = 0.08044417202472687
Validation loss = 0.07618536800146103
Validation loss = 0.07713573426008224
Validation loss = 0.08827979117631912
Validation loss = 0.08541212975978851
Validation loss = 0.09438512474298477
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.23748087882995605
Validation loss = 0.13639657199382782
Validation loss = 0.10931616276502609
Validation loss = 0.10834828019142151
Validation loss = 0.10132738947868347
Validation loss = 0.10202883929014206
Validation loss = 0.09887780249118805
Validation loss = 0.09070531278848648
Validation loss = 0.09158952534198761
Validation loss = 0.09341995418071747
Validation loss = 0.08793998509645462
Validation loss = 0.09157472103834152
Validation loss = 0.09216395020484924
Validation loss = 0.09673409163951874
Validation loss = 0.08720632642507553
Validation loss = 0.08233616501092911
Validation loss = 0.08084109425544739
Validation loss = 0.0741962417960167
Validation loss = 0.08857513964176178
Validation loss = 0.07608712464570999
Validation loss = 0.07809021323919296
Validation loss = 0.0809420645236969
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2679087817668915
Validation loss = 0.14321273565292358
Validation loss = 0.11487782746553421
Validation loss = 0.11199425905942917
Validation loss = 0.10660935938358307
Validation loss = 0.10243570059537888
Validation loss = 0.09932245314121246
Validation loss = 0.1046023964881897
Validation loss = 0.09004616737365723
Validation loss = 0.0943683534860611
Validation loss = 0.08840430527925491
Validation loss = 0.08982507139444351
Validation loss = 0.09209784120321274
Validation loss = 0.08476301282644272
Validation loss = 0.09249700605869293
Validation loss = 0.09112785756587982
Validation loss = 0.0931282564997673
Validation loss = 0.08729813247919083
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2956393361091614
Validation loss = 0.149021714925766
Validation loss = 0.13004326820373535
Validation loss = 0.12308373302221298
Validation loss = 0.12049675732851028
Validation loss = 0.11096518486738205
Validation loss = 0.10848081856966019
Validation loss = 0.10755110532045364
Validation loss = 0.09707293659448624
Validation loss = 0.0969381034374237
Validation loss = 0.09535975754261017
Validation loss = 0.09232864528894424
Validation loss = 0.09126055985689163
Validation loss = 0.09323322772979736
Validation loss = 0.09224656224250793
Validation loss = 0.09073668718338013
Validation loss = 0.09203553199768066
Validation loss = 0.08001361042261124
Validation loss = 0.08095886558294296
Validation loss = 0.08784576505422592
Validation loss = 0.07696565240621567
Validation loss = 0.08671286702156067
Validation loss = 0.07433358579874039
Validation loss = 0.07616367936134338
Validation loss = 0.07262717187404633
Validation loss = 0.06600175052881241
Validation loss = 0.07498592883348465
Validation loss = 0.07698681950569153
Validation loss = 0.0735180601477623
Validation loss = 0.08271430432796478
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0033  |
| Iteration     | 1        |
| MaximumReturn | -0.00263 |
| MinimumReturn | -0.00472 |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07071585208177567
Validation loss = 0.05353913828730583
Validation loss = 0.05594578757882118
Validation loss = 0.05479278787970543
Validation loss = 0.05043075978755951
Validation loss = 0.0479532890021801
Validation loss = 0.04570678621530533
Validation loss = 0.04129372164607048
Validation loss = 0.05112265422940254
Validation loss = 0.05296292528510094
Validation loss = 0.05414539948105812
Validation loss = 0.040513597428798676
Validation loss = 0.04304198548197746
Validation loss = 0.05338751897215843
Validation loss = 0.04729747399687767
Validation loss = 0.046962473541498184
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08526346832513809
Validation loss = 0.06073375046253204
Validation loss = 0.05759312957525253
Validation loss = 0.057451020926237106
Validation loss = 0.04910830408334732
Validation loss = 0.06882409751415253
Validation loss = 0.052648190408945084
Validation loss = 0.04885634779930115
Validation loss = 0.04313250631093979
Validation loss = 0.0492839515209198
Validation loss = 0.04847901687026024
Validation loss = 0.044526297599077225
Validation loss = 0.05808386951684952
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08740010112524033
Validation loss = 0.06935807317495346
Validation loss = 0.05687376484274864
Validation loss = 0.053300704807043076
Validation loss = 0.07071555405855179
Validation loss = 0.04509207233786583
Validation loss = 0.05073292925953865
Validation loss = 0.04815048351883888
Validation loss = 0.046495258808135986
Validation loss = 0.04699351638555527
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08659628033638
Validation loss = 0.05861179903149605
Validation loss = 0.0559685193002224
Validation loss = 0.05192117020487785
Validation loss = 0.05539830029010773
Validation loss = 0.06282363086938858
Validation loss = 0.05212811008095741
Validation loss = 0.051602087914943695
Validation loss = 0.05251426622271538
Validation loss = 0.052947793155908585
Validation loss = 0.05578089505434036
Validation loss = 0.0504271537065506
Validation loss = 0.04331018403172493
Validation loss = 0.05067000165581703
Validation loss = 0.05115289241075516
Validation loss = 0.04813817888498306
Validation loss = 0.054674502462148666
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11291629076004028
Validation loss = 0.07786568999290466
Validation loss = 0.06530313938856125
Validation loss = 0.0614437609910965
Validation loss = 0.05917967110872269
Validation loss = 0.049897875636816025
Validation loss = 0.05466332286596298
Validation loss = 0.05197786167263985
Validation loss = 0.04610519856214523
Validation loss = 0.048126354813575745
Validation loss = 0.0561685785651207
Validation loss = 0.05129169672727585
Validation loss = 0.04553681239485741
Validation loss = 0.0466696135699749
Validation loss = 0.04127867519855499
Validation loss = 0.04115447774529457
Validation loss = 0.04494217783212662
Validation loss = 0.03791983798146248
Validation loss = 0.04194563627243042
Validation loss = 0.04267839714884758
Validation loss = 0.0368163101375103
Validation loss = 0.03915997967123985
Validation loss = 0.04033597558736801
Validation loss = 0.03716089949011803
Validation loss = 0.03639247640967369
Validation loss = 0.04575084149837494
Validation loss = 0.03582455962896347
Validation loss = 0.0409160740673542
Validation loss = 0.04273194074630737
Validation loss = 0.040111735463142395
Validation loss = 0.05304985120892525
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000751 |
| Iteration     | 2         |
| MaximumReturn | -0.000568 |
| MinimumReturn | -0.00105  |
| TotalSamples  | 6664      |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04412241280078888
Validation loss = 0.06275004893541336
Validation loss = 0.0370476059615612
Validation loss = 0.04189566150307655
Validation loss = 0.05099276080727577
Validation loss = 0.0382542759180069
Validation loss = 0.03862375393509865
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05152624472975731
Validation loss = 0.04941273108124733
Validation loss = 0.055152539163827896
Validation loss = 0.04472668841481209
Validation loss = 0.03776629641652107
Validation loss = 0.03792118653655052
Validation loss = 0.044947024434804916
Validation loss = 0.03526226431131363
Validation loss = 0.044598087668418884
Validation loss = 0.041805852204561234
Validation loss = 0.04875917732715607
Validation loss = 0.04101209342479706
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.050063442438840866
Validation loss = 0.047059740871191025
Validation loss = 0.05605224892497063
Validation loss = 0.04826914891600609
Validation loss = 0.03348772972822189
Validation loss = 0.04283267259597778
Validation loss = 0.03845786675810814
Validation loss = 0.04252080246806145
Validation loss = 0.03328699991106987
Validation loss = 0.03514120727777481
Validation loss = 0.036249175667762756
Validation loss = 0.04247068241238594
Validation loss = 0.0357820950448513
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05563776195049286
Validation loss = 0.04737077280879021
Validation loss = 0.04935717582702637
Validation loss = 0.03767075389623642
Validation loss = 0.048426445573568344
Validation loss = 0.04873540997505188
Validation loss = 0.04100263491272926
Validation loss = 0.04210411012172699
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04088346287608147
Validation loss = 0.04192757233977318
Validation loss = 0.040830355137586594
Validation loss = 0.03912230581045151
Validation loss = 0.04185950756072998
Validation loss = 0.03655983880162239
Validation loss = 0.03412569314241409
Validation loss = 0.03134273737668991
Validation loss = 0.040870390832424164
Validation loss = 0.031348343938589096
Validation loss = 0.029878566041588783
Validation loss = 0.03512197360396385
Validation loss = 0.036198459565639496
Validation loss = 0.03335238993167877
Validation loss = 0.02841857075691223
Validation loss = 0.03591858968138695
Validation loss = 0.036016352474689484
Validation loss = 0.029853425920009613
Validation loss = 0.02983759343624115
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000802 |
| Iteration     | 3         |
| MaximumReturn | -0.000667 |
| MinimumReturn | -0.00117  |
| TotalSamples  | 8330      |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04282984137535095
Validation loss = 0.03602613881230354
Validation loss = 0.03772669658064842
Validation loss = 0.036168355494737625
Validation loss = 0.03205889090895653
Validation loss = 0.03398255258798599
Validation loss = 0.037688806653022766
Validation loss = 0.03756427764892578
Validation loss = 0.03547704964876175
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03543488308787346
Validation loss = 0.035048626363277435
Validation loss = 0.04260432720184326
Validation loss = 0.03695635870099068
Validation loss = 0.039528194814920425
Validation loss = 0.037678513675928116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03499956056475639
Validation loss = 0.04105355963110924
Validation loss = 0.041770901530981064
Validation loss = 0.04189801588654518
Validation loss = 0.04082037881016731
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.045211777091026306
Validation loss = 0.04085719585418701
Validation loss = 0.040549762547016144
Validation loss = 0.03706358000636101
Validation loss = 0.03761446475982666
Validation loss = 0.036290064454078674
Validation loss = 0.03591456264257431
Validation loss = 0.03243867680430412
Validation loss = 0.035533010959625244
Validation loss = 0.029517317190766335
Validation loss = 0.03869170323014259
Validation loss = 0.04146236181259155
Validation loss = 0.03917224332690239
Validation loss = 0.043848972767591476
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.043888334184885025
Validation loss = 0.029773667454719543
Validation loss = 0.02422400936484337
Validation loss = 0.03314300253987312
Validation loss = 0.03029664233326912
Validation loss = 0.03079313598573208
Validation loss = 0.033985208719968796
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000796 |
| Iteration     | 4         |
| MaximumReturn | -0.000545 |
| MinimumReturn | -0.000959 |
| TotalSamples  | 9996      |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03705931082367897
Validation loss = 0.0282670296728611
Validation loss = 0.032710231840610504
Validation loss = 0.032746389508247375
Validation loss = 0.03516212850809097
Validation loss = 0.03258921951055527
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.027691170573234558
Validation loss = 0.03178287670016289
Validation loss = 0.03364256024360657
Validation loss = 0.03495744615793228
Validation loss = 0.03752218931913376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03937467560172081
Validation loss = 0.03705713897943497
Validation loss = 0.030426567420363426
Validation loss = 0.028379743918776512
Validation loss = 0.032767441123723984
Validation loss = 0.026815373450517654
Validation loss = 0.0342857800424099
Validation loss = 0.029604798182845116
Validation loss = 0.03321428969502449
Validation loss = 0.029799258336424828
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.035420048981904984
Validation loss = 0.031542710959911346
Validation loss = 0.03123503364622593
Validation loss = 0.030644234269857407
Validation loss = 0.033069904893636703
Validation loss = 0.034616392105817795
Validation loss = 0.028699437156319618
Validation loss = 0.03347376361489296
Validation loss = 0.029935788363218307
Validation loss = 0.04468401521444321
Validation loss = 0.028959568589925766
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.031224045902490616
Validation loss = 0.030507612973451614
Validation loss = 0.033571433275938034
Validation loss = 0.026219438761472702
Validation loss = 0.037127625197172165
Validation loss = 0.02525787614285946
Validation loss = 0.032357390969991684
Validation loss = 0.0291384719312191
Validation loss = 0.02418552339076996
Validation loss = 0.02906441129744053
Validation loss = 0.02454475685954094
Validation loss = 0.025730956345796585
Validation loss = 0.026552055031061172
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000775 |
| Iteration     | 5         |
| MaximumReturn | -0.000532 |
| MinimumReturn | -0.00104  |
| TotalSamples  | 11662     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.037070706486701965
Validation loss = 0.033407554030418396
Validation loss = 0.03533902391791344
Validation loss = 0.03340868279337883
Validation loss = 0.03495769202709198
Validation loss = 0.03241824731230736
Validation loss = 0.03331746533513069
Validation loss = 0.030313437804579735
Validation loss = 0.032473694533109665
Validation loss = 0.03229431062936783
Validation loss = 0.0282176174223423
Validation loss = 0.03583201766014099
Validation loss = 0.03726261854171753
Validation loss = 0.026262987405061722
Validation loss = 0.03110680915415287
Validation loss = 0.032144416123628616
Validation loss = 0.02577732875943184
Validation loss = 0.02740158513188362
Validation loss = 0.026967257261276245
Validation loss = 0.027964001521468163
Validation loss = 0.03208030015230179
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03930702805519104
Validation loss = 0.03240625932812691
Validation loss = 0.03239336982369423
Validation loss = 0.03226418420672417
Validation loss = 0.0338490754365921
Validation loss = 0.02938065491616726
Validation loss = 0.030470341444015503
Validation loss = 0.035602081567049026
Validation loss = 0.03282344341278076
Validation loss = 0.026484068483114243
Validation loss = 0.04708521440625191
Validation loss = 0.03226379305124283
Validation loss = 0.03229624032974243
Validation loss = 0.04247560352087021
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.051505010575056076
Validation loss = 0.04295126348733902
Validation loss = 0.03548663482069969
Validation loss = 0.04332609102129936
Validation loss = 0.03167448565363884
Validation loss = 0.030184272676706314
Validation loss = 0.03201194852590561
Validation loss = 0.02802983857691288
Validation loss = 0.03040793538093567
Validation loss = 0.035090334713459015
Validation loss = 0.02542063221335411
Validation loss = 0.030303457751870155
Validation loss = 0.027676129713654518
Validation loss = 0.02524726092815399
Validation loss = 0.03339029848575592
Validation loss = 0.03723787143826485
Validation loss = 0.029038425534963608
Validation loss = 0.02971264347434044
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04386439174413681
Validation loss = 0.03934275358915329
Validation loss = 0.03260239213705063
Validation loss = 0.028339717537164688
Validation loss = 0.034771788865327835
Validation loss = 0.030031198635697365
Validation loss = 0.0393858402967453
Validation loss = 0.02986481413245201
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03235428407788277
Validation loss = 0.047803930938243866
Validation loss = 0.0297059528529644
Validation loss = 0.029233241453766823
Validation loss = 0.025155162438750267
Validation loss = 0.02706979215145111
Validation loss = 0.0333404578268528
Validation loss = 0.023955127224326134
Validation loss = 0.02608581818640232
Validation loss = 0.02514725551009178
Validation loss = 0.02782473899424076
Validation loss = 0.02718515135347843
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000732 |
| Iteration     | 6         |
| MaximumReturn | -0.000606 |
| MinimumReturn | -0.000948 |
| TotalSamples  | 13328     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02656463533639908
Validation loss = 0.02607179991900921
Validation loss = 0.030154312029480934
Validation loss = 0.027506330981850624
Validation loss = 0.03333408758044243
Validation loss = 0.02836756594479084
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03423944115638733
Validation loss = 0.026763305068016052
Validation loss = 0.03284018859267235
Validation loss = 0.029547831043601036
Validation loss = 0.03258748725056648
Validation loss = 0.030492836609482765
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02403489500284195
Validation loss = 0.02468353696167469
Validation loss = 0.026964006945490837
Validation loss = 0.02813963033258915
Validation loss = 0.033774588257074356
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02544829435646534
Validation loss = 0.03129313141107559
Validation loss = 0.02739744447171688
Validation loss = 0.0286394115537405
Validation loss = 0.0338076576590538
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03087124414741993
Validation loss = 0.030929652974009514
Validation loss = 0.028850628063082695
Validation loss = 0.03007938154041767
Validation loss = 0.021674811840057373
Validation loss = 0.02648475021123886
Validation loss = 0.02070636674761772
Validation loss = 0.023906735703349113
Validation loss = 0.023417172953486443
Validation loss = 0.02435162477195263
Validation loss = 0.025329137220978737
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000953 |
| Iteration     | 7         |
| MaximumReturn | -0.000496 |
| MinimumReturn | -0.00193  |
| TotalSamples  | 14994     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0271765124052763
Validation loss = 0.01874699257314205
Validation loss = 0.017910515889525414
Validation loss = 0.020298970863223076
Validation loss = 0.01650095358490944
Validation loss = 0.018275320529937744
Validation loss = 0.015628892928361893
Validation loss = 0.021613169461488724
Validation loss = 0.020842796191573143
Validation loss = 0.0169046763330698
Validation loss = 0.01719038560986519
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.034028131514787674
Validation loss = 0.018901841714978218
Validation loss = 0.0203534085303545
Validation loss = 0.029789907857775688
Validation loss = 0.017434101551771164
Validation loss = 0.025340799242258072
Validation loss = 0.022792953997850418
Validation loss = 0.021708888933062553
Validation loss = 0.01969236135482788
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021884065121412277
Validation loss = 0.022075330838561058
Validation loss = 0.017381969839334488
Validation loss = 0.018203942105174065
Validation loss = 0.025236783549189568
Validation loss = 0.019276371225714684
Validation loss = 0.018898827955126762
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024663755670189857
Validation loss = 0.02324102818965912
Validation loss = 0.01932893507182598
Validation loss = 0.020101487636566162
Validation loss = 0.019351283088326454
Validation loss = 0.017553796991705894
Validation loss = 0.016849370673298836
Validation loss = 0.025938568636775017
Validation loss = 0.018324101343750954
Validation loss = 0.01857677288353443
Validation loss = 0.019636468961834908
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026342319324612617
Validation loss = 0.022280070930719376
Validation loss = 0.016197357326745987
Validation loss = 0.016673702746629715
Validation loss = 0.0256185345351696
Validation loss = 0.016986852511763573
Validation loss = 0.020946724340319633
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000847 |
| Iteration     | 8         |
| MaximumReturn | -0.000571 |
| MinimumReturn | -0.00141  |
| TotalSamples  | 16660     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022734882310032845
Validation loss = 0.015531104989349842
Validation loss = 0.012844067998230457
Validation loss = 0.0122471172362566
Validation loss = 0.014144526794552803
Validation loss = 0.014616725035011768
Validation loss = 0.016384761780500412
Validation loss = 0.014228122308850288
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023704297840595245
Validation loss = 0.017398344352841377
Validation loss = 0.016869613900780678
Validation loss = 0.01478876918554306
Validation loss = 0.018430132418870926
Validation loss = 0.013277617283165455
Validation loss = 0.015122124925255775
Validation loss = 0.01354922167956829
Validation loss = 0.015787020325660706
Validation loss = 0.017827365547418594
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01784886047244072
Validation loss = 0.014538595452904701
Validation loss = 0.015367378480732441
Validation loss = 0.015878479927778244
Validation loss = 0.013593989424407482
Validation loss = 0.017169032245874405
Validation loss = 0.014583910815417767
Validation loss = 0.014864715747535229
Validation loss = 0.014969437383115292
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01723814569413662
Validation loss = 0.01633523404598236
Validation loss = 0.014066376723349094
Validation loss = 0.016098488122224808
Validation loss = 0.013663636520504951
Validation loss = 0.0140007846057415
Validation loss = 0.013823248445987701
Validation loss = 0.01379072479903698
Validation loss = 0.01584763079881668
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020295731723308563
Validation loss = 0.014400451444089413
Validation loss = 0.01398919802159071
Validation loss = 0.012806675396859646
Validation loss = 0.01506129652261734
Validation loss = 0.013397812843322754
Validation loss = 0.013847030699253082
Validation loss = 0.017582910135388374
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000935 |
| Iteration     | 9         |
| MaximumReturn | -0.000558 |
| MinimumReturn | -0.002    |
| TotalSamples  | 18326     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.015656234696507454
Validation loss = 0.015107009559869766
Validation loss = 0.011091139167547226
Validation loss = 0.015521343797445297
Validation loss = 0.013929064385592937
Validation loss = 0.018273353576660156
Validation loss = 0.014919989742338657
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025363270193338394
Validation loss = 0.017856216058135033
Validation loss = 0.014204048551619053
Validation loss = 0.012880455702543259
Validation loss = 0.015255576930940151
Validation loss = 0.012457791715860367
Validation loss = 0.014673239551484585
Validation loss = 0.019649669528007507
Validation loss = 0.012760916724801064
Validation loss = 0.01353330910205841
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01085695717483759
Validation loss = 0.013195394538342953
Validation loss = 0.013614144176244736
Validation loss = 0.013941341079771519
Validation loss = 0.011012055911123753
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014609245583415031
Validation loss = 0.017207540571689606
Validation loss = 0.013208604417741299
Validation loss = 0.015930475667119026
Validation loss = 0.019980885088443756
Validation loss = 0.014229727908968925
Validation loss = 0.01779111661016941
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0168134942650795
Validation loss = 0.014493238180875778
Validation loss = 0.011227449402213097
Validation loss = 0.011609400622546673
Validation loss = 0.011833492666482925
Validation loss = 0.015431789681315422
Validation loss = 0.009857140481472015
Validation loss = 0.016678664833307266
Validation loss = 0.011695310473442078
Validation loss = 0.013948481529951096
Validation loss = 0.012678753584623337
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00086  |
| Iteration     | 10        |
| MaximumReturn | -0.000555 |
| MinimumReturn | -0.00179  |
| TotalSamples  | 19992     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013708742335438728
Validation loss = 0.011683621443808079
Validation loss = 0.011064089834690094
Validation loss = 0.012676442973315716
Validation loss = 0.011527830734848976
Validation loss = 0.014387439005076885
Validation loss = 0.012956744059920311
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015098674222826958
Validation loss = 0.011584566906094551
Validation loss = 0.01296053547412157
Validation loss = 0.010804936289787292
Validation loss = 0.01610857993364334
Validation loss = 0.011111997067928314
Validation loss = 0.015491878613829613
Validation loss = 0.012309368699789047
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012116274796426296
Validation loss = 0.011441385373473167
Validation loss = 0.011674130335450172
Validation loss = 0.01664508692920208
Validation loss = 0.011930650100111961
Validation loss = 0.011596078053116798
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013451643288135529
Validation loss = 0.00926220789551735
Validation loss = 0.012953853234648705
Validation loss = 0.012621807865798473
Validation loss = 0.012445757165551186
Validation loss = 0.011573845520615578
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013477044180035591
Validation loss = 0.009250789880752563
Validation loss = 0.011830571107566357
Validation loss = 0.01226786244660616
Validation loss = 0.011229714378714561
Validation loss = 0.014251789078116417
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.000939 |
| Iteration     | 11        |
| MaximumReturn | -0.000573 |
| MinimumReturn | -0.00161  |
| TotalSamples  | 21658     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014318013563752174
Validation loss = 0.011318466626107693
Validation loss = 0.010670862160623074
Validation loss = 0.009310523048043251
Validation loss = 0.013475148007273674
Validation loss = 0.011223075911402702
Validation loss = 0.009141980670392513
Validation loss = 0.012001853436231613
Validation loss = 0.011953641660511494
Validation loss = 0.016398195177316666
Validation loss = 0.01303906925022602
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018865367397665977
Validation loss = 0.01083376444876194
Validation loss = 0.011123895645141602
Validation loss = 0.012553190812468529
Validation loss = 0.013170955702662468
Validation loss = 0.011416321620345116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.031412411481142044
Validation loss = 0.015770627185702324
Validation loss = 0.013712960295379162
Validation loss = 0.012723773717880249
Validation loss = 0.01303381659090519
Validation loss = 0.011596712283790112
Validation loss = 0.012650050222873688
Validation loss = 0.011149042285978794
Validation loss = 0.010260756127536297
Validation loss = 0.010451429523527622
Validation loss = 0.015199770219624043
Validation loss = 0.015416678972542286
Validation loss = 0.020040780305862427
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011574173346161842
Validation loss = 0.014214055612683296
Validation loss = 0.010814288631081581
Validation loss = 0.011071385815739632
Validation loss = 0.011832648888230324
Validation loss = 0.011989591643214226
Validation loss = 0.01222663838416338
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01258521992713213
Validation loss = 0.013276596553623676
Validation loss = 0.00950124766677618
Validation loss = 0.010061088018119335
Validation loss = 0.009910869412124157
Validation loss = 0.014228874817490578
Validation loss = 0.01104009710252285
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00103  |
| Iteration     | 12        |
| MaximumReturn | -0.000518 |
| MinimumReturn | -0.00207  |
| TotalSamples  | 23324     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010201776400208473
Validation loss = 0.008856677450239658
Validation loss = 0.008905407041311264
Validation loss = 0.009976718574762344
Validation loss = 0.009240983054041862
Validation loss = 0.009293686598539352
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010932966135442257
Validation loss = 0.009866255335509777
Validation loss = 0.010083680972456932
Validation loss = 0.01392698660492897
Validation loss = 0.012674177065491676
Validation loss = 0.012279992923140526
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016832910478115082
Validation loss = 0.011525124311447144
Validation loss = 0.014769017696380615
Validation loss = 0.010516596958041191
Validation loss = 0.01142637524753809
Validation loss = 0.009222287684679031
Validation loss = 0.01073768176138401
Validation loss = 0.011758043430745602
Validation loss = 0.011445802636444569
Validation loss = 0.011815202422440052
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012785536237061024
Validation loss = 0.010008892975747585
Validation loss = 0.012151777744293213
Validation loss = 0.010762135498225689
Validation loss = 0.009609115310013294
Validation loss = 0.009980118833482265
Validation loss = 0.010003379546105862
Validation loss = 0.011942988261580467
Validation loss = 0.00795317254960537
Validation loss = 0.010880060493946075
Validation loss = 0.010276439599692822
Validation loss = 0.00999057199805975
Validation loss = 0.009872227907180786
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016245519742369652
Validation loss = 0.012739748694002628
Validation loss = 0.009744449518620968
Validation loss = 0.011304330080747604
Validation loss = 0.011419680900871754
Validation loss = 0.011936672963202
Validation loss = 0.01122505497187376
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00112  |
| Iteration     | 13        |
| MaximumReturn | -0.000554 |
| MinimumReturn | -0.00411  |
| TotalSamples  | 24990     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012466725893318653
Validation loss = 0.012003645300865173
Validation loss = 0.009107884019613266
Validation loss = 0.011147822253406048
Validation loss = 0.009006115607917309
Validation loss = 0.008070573210716248
Validation loss = 0.016593964770436287
Validation loss = 0.011628393083810806
Validation loss = 0.008357026614248753
Validation loss = 0.00933996494859457
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014583089388906956
Validation loss = 0.008730107918381691
Validation loss = 0.009998920373618603
Validation loss = 0.010080446489155293
Validation loss = 0.011883110739290714
Validation loss = 0.010607671923935413
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009456162340939045
Validation loss = 0.01216487679630518
Validation loss = 0.010670941323041916
Validation loss = 0.008808866143226624
Validation loss = 0.010278148576617241
Validation loss = 0.009841914288699627
Validation loss = 0.011220626533031464
Validation loss = 0.009262166917324066
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009810833260416985
Validation loss = 0.009301440790295601
Validation loss = 0.009345990605652332
Validation loss = 0.00910460390150547
Validation loss = 0.008501715958118439
Validation loss = 0.010058951564133167
Validation loss = 0.009599551558494568
Validation loss = 0.010743610560894012
Validation loss = 0.008704517036676407
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010783913545310497
Validation loss = 0.009882869198918343
Validation loss = 0.010312912985682487
Validation loss = 0.011016390286386013
Validation loss = 0.012663464993238449
Validation loss = 0.010325628332793713
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00116  |
| Iteration     | 14        |
| MaximumReturn | -0.000484 |
| MinimumReturn | -0.00563  |
| TotalSamples  | 26656     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008988583460450172
Validation loss = 0.010961362160742283
Validation loss = 0.01137646846473217
Validation loss = 0.010606394149363041
Validation loss = 0.009906702674925327
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010949145071208477
Validation loss = 0.009603737853467464
Validation loss = 0.009542032144963741
Validation loss = 0.010369982570409775
Validation loss = 0.010121187195181847
Validation loss = 0.010219230316579342
Validation loss = 0.00906350091099739
Validation loss = 0.01110016368329525
Validation loss = 0.0091300830245018
Validation loss = 0.01029529981315136
Validation loss = 0.009110652841627598
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008899955078959465
Validation loss = 0.009780505672097206
Validation loss = 0.010596164502203465
Validation loss = 0.010762481950223446
Validation loss = 0.008288501761853695
Validation loss = 0.008674407377839088
Validation loss = 0.009284819476306438
Validation loss = 0.00843331590294838
Validation loss = 0.009004895575344563
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009468823671340942
Validation loss = 0.009997507557272911
Validation loss = 0.009374345652759075
Validation loss = 0.008580074645578861
Validation loss = 0.008052169345319271
Validation loss = 0.01033869944512844
Validation loss = 0.009234437718987465
Validation loss = 0.01026109978556633
Validation loss = 0.009915103204548359
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009990649297833443
Validation loss = 0.011890394613146782
Validation loss = 0.011858736164867878
Validation loss = 0.01038442738354206
Validation loss = 0.010137681849300861
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00376 |
| Iteration     | 15       |
| MaximumReturn | -0.00057 |
| MinimumReturn | -0.0249  |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01342330127954483
Validation loss = 0.010019931010901928
Validation loss = 0.008735290728509426
Validation loss = 0.006982861552387476
Validation loss = 0.008245198987424374
Validation loss = 0.011782367713749409
Validation loss = 0.008304459974169731
Validation loss = 0.009785046800971031
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01591835357248783
Validation loss = 0.011318345554172993
Validation loss = 0.010308935306966305
Validation loss = 0.009668990969657898
Validation loss = 0.00853811390697956
Validation loss = 0.00734097370877862
Validation loss = 0.011660630814731121
Validation loss = 0.007912268862128258
Validation loss = 0.008646884933114052
Validation loss = 0.006863925140351057
Validation loss = 0.007628596853464842
Validation loss = 0.009111893363296986
Validation loss = 0.008130259811878204
Validation loss = 0.008289024233818054
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007919969037175179
Validation loss = 0.007526727858930826
Validation loss = 0.01245211623609066
Validation loss = 0.007213904522359371
Validation loss = 0.007897039875388145
Validation loss = 0.00920411292463541
Validation loss = 0.01167672872543335
Validation loss = 0.008105299435555935
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009475779719650745
Validation loss = 0.009487022645771503
Validation loss = 0.00860926229506731
Validation loss = 0.008087306283414364
Validation loss = 0.013535511679947376
Validation loss = 0.009977133944630623
Validation loss = 0.007955599576234818
Validation loss = 0.00669434666633606
Validation loss = 0.012428410351276398
Validation loss = 0.010929632000625134
Validation loss = 0.007242756430059671
Validation loss = 0.007331184111535549
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.013996376655995846
Validation loss = 0.009212064556777477
Validation loss = 0.008863532915711403
Validation loss = 0.009466038085520267
Validation loss = 0.00889187678694725
Validation loss = 0.012601720169186592
Validation loss = 0.00739699462428689
Validation loss = 0.009454397484660149
Validation loss = 0.010281459428369999
Validation loss = 0.00786562543362379
Validation loss = 0.0075254375115036964
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00292  |
| Iteration     | 16        |
| MaximumReturn | -0.000586 |
| MinimumReturn | -0.0226   |
| TotalSamples  | 29988     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008858172222971916
Validation loss = 0.008113516494631767
Validation loss = 0.008288641460239887
Validation loss = 0.008091693744063377
Validation loss = 0.007286373060196638
Validation loss = 0.007829741574823856
Validation loss = 0.010295258834958076
Validation loss = 0.011086991056799889
Validation loss = 0.008469006977975368
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00986501295119524
Validation loss = 0.00784195214509964
Validation loss = 0.008198399096727371
Validation loss = 0.007751611061394215
Validation loss = 0.0067911166697740555
Validation loss = 0.009022099897265434
Validation loss = 0.008073495700955391
Validation loss = 0.007504743058234453
Validation loss = 0.008417634293437004
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007574076764285564
Validation loss = 0.00657452829182148
Validation loss = 0.008666183799505234
Validation loss = 0.009706861339509487
Validation loss = 0.007336972281336784
Validation loss = 0.008903143927454948
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012421812862157822
Validation loss = 0.012140636332333088
Validation loss = 0.007873671129345894
Validation loss = 0.008304963819682598
Validation loss = 0.007139082532376051
Validation loss = 0.01290963962674141
Validation loss = 0.007729037664830685
Validation loss = 0.00754391448572278
Validation loss = 0.010193338617682457
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007720686960965395
Validation loss = 0.00864486489444971
Validation loss = 0.0100203612819314
Validation loss = 0.00991748459637165
Validation loss = 0.008325256407260895
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00588  |
| Iteration     | 17        |
| MaximumReturn | -0.000551 |
| MinimumReturn | -0.0291   |
| TotalSamples  | 31654     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01101740077137947
Validation loss = 0.008530032820999622
Validation loss = 0.009201391600072384
Validation loss = 0.00854574516415596
Validation loss = 0.007185640744864941
Validation loss = 0.007921035401523113
Validation loss = 0.007284709718078375
Validation loss = 0.0065813553519546986
Validation loss = 0.00764124933630228
Validation loss = 0.008338247425854206
Validation loss = 0.007505548652261496
Validation loss = 0.008026141673326492
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008857887238264084
Validation loss = 0.008322215639054775
Validation loss = 0.008724690414965153
Validation loss = 0.008109962567687035
Validation loss = 0.008009537123143673
Validation loss = 0.013972226530313492
Validation loss = 0.008012785576283932
Validation loss = 0.013253691606223583
Validation loss = 0.008475969545543194
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009586864151060581
Validation loss = 0.009492209181189537
Validation loss = 0.0093860337510705
Validation loss = 0.00763664348050952
Validation loss = 0.010623779147863388
Validation loss = 0.008819660171866417
Validation loss = 0.00814048945903778
Validation loss = 0.00680089695379138
Validation loss = 0.00779891898855567
Validation loss = 0.016111556440591812
Validation loss = 0.010665268637239933
Validation loss = 0.007785116322338581
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008082670159637928
Validation loss = 0.007319862022995949
Validation loss = 0.008935647085309029
Validation loss = 0.008101671002805233
Validation loss = 0.007488495670258999
Validation loss = 0.01030606310814619
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008363749831914902
Validation loss = 0.010199946351349354
Validation loss = 0.007439256180077791
Validation loss = 0.011435986496508121
Validation loss = 0.007518922910094261
Validation loss = 0.0091927545145154
Validation loss = 0.008577486500144005
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0017   |
| Iteration     | 18        |
| MaximumReturn | -0.000555 |
| MinimumReturn | -0.015    |
| TotalSamples  | 33320     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008599176071584225
Validation loss = 0.010968350805342197
Validation loss = 0.007013809401541948
Validation loss = 0.008972203359007835
Validation loss = 0.00873564649373293
Validation loss = 0.00815792940557003
Validation loss = 0.008591189980506897
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007981630973517895
Validation loss = 0.008115557953715324
Validation loss = 0.00805752445012331
Validation loss = 0.009146093390882015
Validation loss = 0.007419227156788111
Validation loss = 0.0075750150717794895
Validation loss = 0.0072228750213980675
Validation loss = 0.00821937806904316
Validation loss = 0.00839853473007679
Validation loss = 0.007190243806689978
Validation loss = 0.007338232826441526
Validation loss = 0.012430732138454914
Validation loss = 0.008232973515987396
Validation loss = 0.008045193739235401
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008210796862840652
Validation loss = 0.008839896880090237
Validation loss = 0.009565910324454308
Validation loss = 0.013171977363526821
Validation loss = 0.010039281100034714
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008062776178121567
Validation loss = 0.00902769435197115
Validation loss = 0.00636841356754303
Validation loss = 0.006696820724755526
Validation loss = 0.008531813509762287
Validation loss = 0.008172657340765
Validation loss = 0.009592545218765736
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008249766193330288
Validation loss = 0.007403227966278791
Validation loss = 0.007569523062556982
Validation loss = 0.007675041910260916
Validation loss = 0.009932473301887512
Validation loss = 0.00789958517998457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00666  |
| Iteration     | 19        |
| MaximumReturn | -0.000583 |
| MinimumReturn | -0.0431   |
| TotalSamples  | 34986     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013311493210494518
Validation loss = 0.008598582819104195
Validation loss = 0.008423157967627048
Validation loss = 0.009479070082306862
Validation loss = 0.006986830849200487
Validation loss = 0.0076169478707015514
Validation loss = 0.00734685966745019
Validation loss = 0.007509466260671616
Validation loss = 0.00880180113017559
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00950336642563343
Validation loss = 0.007059410214424133
Validation loss = 0.007562477141618729
Validation loss = 0.007144422270357609
Validation loss = 0.006949711591005325
Validation loss = 0.00823405385017395
Validation loss = 0.007175021804869175
Validation loss = 0.008630814962089062
Validation loss = 0.007386497687548399
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008708721026778221
Validation loss = 0.007967237383127213
Validation loss = 0.0074109784327447414
Validation loss = 0.006266498938202858
Validation loss = 0.007838211953639984
Validation loss = 0.005742876790463924
Validation loss = 0.006588926073163748
Validation loss = 0.0062531884759664536
Validation loss = 0.009269238449633121
Validation loss = 0.007102855946868658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.014006449840962887
Validation loss = 0.009299814701080322
Validation loss = 0.009010311216115952
Validation loss = 0.006433945149183273
Validation loss = 0.007643118500709534
Validation loss = 0.006975822150707245
Validation loss = 0.008328912779688835
Validation loss = 0.008249188773334026
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01006581075489521
Validation loss = 0.007565355394035578
Validation loss = 0.008638384751975536
Validation loss = 0.009052864275872707
Validation loss = 0.0074344356544315815
Validation loss = 0.00742339389398694
Validation loss = 0.007912174798548222
Validation loss = 0.007171022240072489
Validation loss = 0.008180486969649792
Validation loss = 0.008653176948428154
Validation loss = 0.00789191760122776
Validation loss = 0.008211595937609673
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0168   |
| Iteration     | 20        |
| MaximumReturn | -0.000626 |
| MinimumReturn | -0.318    |
| TotalSamples  | 36652     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010259689763188362
Validation loss = 0.01122638862580061
Validation loss = 0.007741457782685757
Validation loss = 0.008577700704336166
Validation loss = 0.006545710377395153
Validation loss = 0.006681831553578377
Validation loss = 0.007210025563836098
Validation loss = 0.0056746224872767925
Validation loss = 0.007093904539942741
Validation loss = 0.0071820649318397045
Validation loss = 0.00714908866211772
Validation loss = 0.005724555347114801
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014293129555881023
Validation loss = 0.008429310284554958
Validation loss = 0.00660603167489171
Validation loss = 0.006587345618754625
Validation loss = 0.010652043856680393
Validation loss = 0.009085796773433685
Validation loss = 0.005511513911187649
Validation loss = 0.007008240558207035
Validation loss = 0.005505536682903767
Validation loss = 0.006320410408079624
Validation loss = 0.006396278738975525
Validation loss = 0.006575873587280512
Validation loss = 0.005793611519038677
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.012144524604082108
Validation loss = 0.008796794340014458
Validation loss = 0.00624843081459403
Validation loss = 0.005482343956828117
Validation loss = 0.0057116285897791386
Validation loss = 0.005486449226737022
Validation loss = 0.007838086225092411
Validation loss = 0.0060683549381792545
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008736521005630493
Validation loss = 0.00792139396071434
Validation loss = 0.007497310638427734
Validation loss = 0.008672095835208893
Validation loss = 0.006987778469920158
Validation loss = 0.006110740825533867
Validation loss = 0.008456145413219929
Validation loss = 0.009439320303499699
Validation loss = 0.009235905483365059
Validation loss = 0.006261968519538641
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01328261848539114
Validation loss = 0.011788678355515003
Validation loss = 0.008982798084616661
Validation loss = 0.0074413917027413845
Validation loss = 0.006966494023799896
Validation loss = 0.007820477709174156
Validation loss = 0.006831912323832512
Validation loss = 0.008299129083752632
Validation loss = 0.0062906015664339066
Validation loss = 0.006945777218788862
Validation loss = 0.0066169570200145245
Validation loss = 0.006986796855926514
Validation loss = 0.006724158767610788
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.98     |
| Iteration     | 21        |
| MaximumReturn | -0.000765 |
| MinimumReturn | -19.9     |
| TotalSamples  | 38318     |
-----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0076937065459787846
Validation loss = 0.008397591300308704
Validation loss = 0.007464444264769554
Validation loss = 0.009261209517717361
Validation loss = 0.007136316038668156
Validation loss = 0.00858478806912899
Validation loss = 0.007072331849485636
Validation loss = 0.006105277221649885
Validation loss = 0.007809213828295469
Validation loss = 0.006948449183255434
Validation loss = 0.007720146328210831
Validation loss = 0.006715396419167519
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013248122297227383
Validation loss = 0.010998005047440529
Validation loss = 0.006225449498742819
Validation loss = 0.007525285705924034
Validation loss = 0.007483239285647869
Validation loss = 0.006312953773885965
Validation loss = 0.006685047410428524
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011663297191262245
Validation loss = 0.007903012447059155
Validation loss = 0.009523052722215652
Validation loss = 0.00818830169737339
Validation loss = 0.00786379911005497
Validation loss = 0.007340583484619856
Validation loss = 0.009612789377570152
Validation loss = 0.008826829493045807
Validation loss = 0.00437005702406168
Validation loss = 0.007113583851605654
Validation loss = 0.006596034858375788
Validation loss = 0.007432489190250635
Validation loss = 0.00522927101701498
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012338320724666119
Validation loss = 0.007531268987804651
Validation loss = 0.007086261175572872
Validation loss = 0.008130759000778198
Validation loss = 0.007424037903547287
Validation loss = 0.008997911587357521
Validation loss = 0.009895962662994862
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01112421602010727
Validation loss = 0.007858437485992908
Validation loss = 0.008530966937541962
Validation loss = 0.009482749737799168
Validation loss = 0.00845868419855833
Validation loss = 0.00667875912040472
Validation loss = 0.009511920623481274
Validation loss = 0.006800348870456219
Validation loss = 0.008551514707505703
Validation loss = 0.016040971502661705
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -47      |
| Iteration     | 22       |
| MaximumReturn | -0.491   |
| MinimumReturn | -83.3    |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01263928972184658
Validation loss = 0.008875824511051178
Validation loss = 0.008754362352192402
Validation loss = 0.0070292046293616295
Validation loss = 0.01011447049677372
Validation loss = 0.007734826300293207
Validation loss = 0.005637056194245815
Validation loss = 0.006209357175976038
Validation loss = 0.008662390522658825
Validation loss = 0.006946180947124958
Validation loss = 0.005966356955468655
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014181053265929222
Validation loss = 0.0071745989844202995
Validation loss = 0.007335207425057888
Validation loss = 0.007397190667688847
Validation loss = 0.005378842353820801
Validation loss = 0.005874992348253727
Validation loss = 0.005979560315608978
Validation loss = 0.00620183814316988
Validation loss = 0.021201495081186295
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.015974629670381546
Validation loss = 0.007722546812146902
Validation loss = 0.007111733313649893
Validation loss = 0.007005131803452969
Validation loss = 0.00521373376250267
Validation loss = 0.011649960651993752
Validation loss = 0.0055636754259467125
Validation loss = 0.0060175033286213875
Validation loss = 0.009148148819804192
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.015235411003232002
Validation loss = 0.010784199461340904
Validation loss = 0.006133038550615311
Validation loss = 0.0067823207937181
Validation loss = 0.006999222096055746
Validation loss = 0.005523869302123785
Validation loss = 0.005767627619206905
Validation loss = 0.0077313766814768314
Validation loss = 0.007581852376461029
Validation loss = 0.005243572406470776
Validation loss = 0.005788666196167469
Validation loss = 0.004965840373188257
Validation loss = 0.004612681455910206
Validation loss = 0.006506744772195816
Validation loss = 0.007324114441871643
Validation loss = 0.004246272146701813
Validation loss = 0.007575432304292917
Validation loss = 0.004839252680540085
Validation loss = 0.005835074465721846
Validation loss = 0.006554494146257639
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015044013038277626
Validation loss = 0.01210814993828535
Validation loss = 0.011082981713116169
Validation loss = 0.010822413489222527
Validation loss = 0.00861221831291914
Validation loss = 0.006142732687294483
Validation loss = 0.007444107439368963
Validation loss = 0.007308134343475103
Validation loss = 0.006207679398357868
Validation loss = 0.008636372163891792
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0436  |
| Iteration     | 23       |
| MaximumReturn | -0.0316  |
| MinimumReturn | -0.0608  |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008696073666214943
Validation loss = 0.005262970458716154
Validation loss = 0.005708814598619938
Validation loss = 0.006581738591194153
Validation loss = 0.00692442711442709
Validation loss = 0.007175185717642307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.008916187100112438
Validation loss = 0.005266147665679455
Validation loss = 0.005091762635856867
Validation loss = 0.0055017294362187386
Validation loss = 0.005733863450586796
Validation loss = 0.005036991089582443
Validation loss = 0.005241671111434698
Validation loss = 0.0051180524751544
Validation loss = 0.004930573981255293
Validation loss = 0.00438143964856863
Validation loss = 0.0058005559258162975
Validation loss = 0.004217666573822498
Validation loss = 0.005595841445028782
Validation loss = 0.011083325371146202
Validation loss = 0.0066415732726454735
Validation loss = 0.005533422343432903
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007959720678627491
Validation loss = 0.006071079988032579
Validation loss = 0.004381419159471989
Validation loss = 0.004148419946432114
Validation loss = 0.004605915397405624
Validation loss = 0.004071524832397699
Validation loss = 0.0063193561509251595
Validation loss = 0.006677425000816584
Validation loss = 0.004717754200100899
Validation loss = 0.004131912253797054
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005575171206146479
Validation loss = 0.004259263630956411
Validation loss = 0.005378803238272667
Validation loss = 0.004669659771025181
Validation loss = 0.005375859327614307
Validation loss = 0.006481432821601629
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012793888337910175
Validation loss = 0.006285020615905523
Validation loss = 0.006130490452051163
Validation loss = 0.0054995655082166195
Validation loss = 0.005237242206931114
Validation loss = 0.006922728382050991
Validation loss = 0.0079079270362854
Validation loss = 0.004939692560583353
Validation loss = 0.004148658365011215
Validation loss = 0.006776459515094757
Validation loss = 0.00491506839171052
Validation loss = 0.005104313604533672
Validation loss = 0.006129276007413864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.01    |
| Iteration     | 24       |
| MaximumReturn | -0.0312  |
| MinimumReturn | -7.37    |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007698477245867252
Validation loss = 0.005253723822534084
Validation loss = 0.0063137030228972435
Validation loss = 0.004804374184459448
Validation loss = 0.004091063514351845
Validation loss = 0.004535696003586054
Validation loss = 0.005819716025143862
Validation loss = 0.004349360242486
Validation loss = 0.005629171151667833
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0058341664262115955
Validation loss = 0.0047067878767848015
Validation loss = 0.004390823654830456
Validation loss = 0.005189039744436741
Validation loss = 0.004750031046569347
Validation loss = 0.004308469593524933
Validation loss = 0.004229081328958273
Validation loss = 0.003985597286373377
Validation loss = 0.005361061543226242
Validation loss = 0.00560891954228282
Validation loss = 0.008044690825045109
Validation loss = 0.004013193305581808
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009567352943122387
Validation loss = 0.006847369018942118
Validation loss = 0.005939059425145388
Validation loss = 0.004536367021501064
Validation loss = 0.0047057075425982475
Validation loss = 0.004888763651251793
Validation loss = 0.006771049462258816
Validation loss = 0.004587809555232525
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007820946164429188
Validation loss = 0.005348697770386934
Validation loss = 0.004621393512934446
Validation loss = 0.004008748568594456
Validation loss = 0.00424406910315156
Validation loss = 0.00547934090718627
Validation loss = 0.004240368492901325
Validation loss = 0.004537420347332954
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008632244542241096
Validation loss = 0.006151962094008923
Validation loss = 0.006483405828475952
Validation loss = 0.004413100425153971
Validation loss = 0.00933895818889141
Validation loss = 0.006591165903955698
Validation loss = 0.0036963713355362415
Validation loss = 0.004813331179320812
Validation loss = 0.0036952116061002016
Validation loss = 0.00445706769824028
Validation loss = 0.008636536076664925
Validation loss = 0.0068841129541397095
Validation loss = 0.0033950130455195904
Validation loss = 0.009263540618121624
Validation loss = 0.0067277890630066395
Validation loss = 0.004011793062090874
Validation loss = 0.0052446043118834496
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.144   |
| Iteration     | 25       |
| MaximumReturn | -0.0901  |
| MinimumReturn | -0.218   |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005944942124187946
Validation loss = 0.004730579908937216
Validation loss = 0.004371763207018375
Validation loss = 0.004530859179794788
Validation loss = 0.004098362755030394
Validation loss = 0.0045999144203960896
Validation loss = 0.004374629817903042
Validation loss = 0.004233262967318296
Validation loss = 0.004839624743908644
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.004173362161964178
Validation loss = 0.0039379228837788105
Validation loss = 0.004215228371322155
Validation loss = 0.0037016896530985832
Validation loss = 0.004388507921248674
Validation loss = 0.004219292663037777
Validation loss = 0.004692308139055967
Validation loss = 0.0042208475060760975
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0038317861035466194
Validation loss = 0.00789432693272829
Validation loss = 0.004099701065570116
Validation loss = 0.006460766773670912
Validation loss = 0.004413757007569075
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003891386091709137
Validation loss = 0.005365506745874882
Validation loss = 0.004088735673576593
Validation loss = 0.0039308927953243256
Validation loss = 0.004303750116378069
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004625161178410053
Validation loss = 0.006473142188042402
Validation loss = 0.00480584567412734
Validation loss = 0.005156316794455051
Validation loss = 0.0037372158840298653
Validation loss = 0.0038377782329916954
Validation loss = 0.00473483931273222
Validation loss = 0.005055672023445368
Validation loss = 0.00801235530525446
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.138   |
| Iteration     | 26       |
| MaximumReturn | -0.0528  |
| MinimumReturn | -0.228   |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005058678798377514
Validation loss = 0.003488171147182584
Validation loss = 0.003514101728796959
Validation loss = 0.0036607258953154087
Validation loss = 0.0038684054743498564
Validation loss = 0.003975285217165947
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0038678788114339113
Validation loss = 0.003099964000284672
Validation loss = 0.003248829860240221
Validation loss = 0.002620759652927518
Validation loss = 0.003572325687855482
Validation loss = 0.004350921604782343
Validation loss = 0.0039865621365606785
Validation loss = 0.0031782356090843678
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0041319564916193485
Validation loss = 0.0032683699391782284
Validation loss = 0.003945800941437483
Validation loss = 0.0033400142565369606
Validation loss = 0.006066702771931887
Validation loss = 0.003627874655649066
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003880455857142806
Validation loss = 0.003319657640531659
Validation loss = 0.006067125592380762
Validation loss = 0.00338451168499887
Validation loss = 0.0037218930665403605
Validation loss = 0.0034912358969449997
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0038935435004532337
Validation loss = 0.0045240335166454315
Validation loss = 0.004859780427068472
Validation loss = 0.0034354703966528177
Validation loss = 0.003167938906699419
Validation loss = 0.003699029330164194
Validation loss = 0.00401287293061614
Validation loss = 0.003638704540207982
Validation loss = 0.0038026669062674046
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0502  |
| Iteration     | 27       |
| MaximumReturn | -0.036   |
| MinimumReturn | -0.0836  |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005685001611709595
Validation loss = 0.0076119122095406055
Validation loss = 0.003009168663993478
Validation loss = 0.005847762804478407
Validation loss = 0.005896504502743483
Validation loss = 0.0034949227701872587
Validation loss = 0.0038948983419686556
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0033049106132239103
Validation loss = 0.0024581162724643946
Validation loss = 0.002612616866827011
Validation loss = 0.0043176752515137196
Validation loss = 0.002756228670477867
Validation loss = 0.0034937437158077955
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.007316999137401581
Validation loss = 0.009762763045728207
Validation loss = 0.003467683680355549
Validation loss = 0.003300697775557637
Validation loss = 0.003711433382704854
Validation loss = 0.004493283107876778
Validation loss = 0.0028976507019251585
Validation loss = 0.0035681482404470444
Validation loss = 0.002493522362783551
Validation loss = 0.0045964340679347515
Validation loss = 0.003796672448515892
Validation loss = 0.004157402087002993
Validation loss = 0.0034768010955303907
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004225750919431448
Validation loss = 0.005321760196238756
Validation loss = 0.003664916381239891
Validation loss = 0.003663077251985669
Validation loss = 0.004276958294212818
Validation loss = 0.003457160433754325
Validation loss = 0.003171253716573119
Validation loss = 0.0036688789259642363
Validation loss = 0.0034170581493526697
Validation loss = 0.003305007703602314
Validation loss = 0.0036553575191646814
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00945428479462862
Validation loss = 0.0031655842904001474
Validation loss = 0.004658850375562906
Validation loss = 0.003130049677565694
Validation loss = 0.0028832571115344763
Validation loss = 0.0036715411115437746
Validation loss = 0.0033609559759497643
Validation loss = 0.005251547787338495
Validation loss = 0.006174356210976839
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.448   |
| Iteration     | 28       |
| MaximumReturn | -0.101   |
| MinimumReturn | -5.78    |
| TotalSamples  | 49980    |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0032770177349448204
Validation loss = 0.002937741344794631
Validation loss = 0.0035778041929006577
Validation loss = 0.002966015599668026
Validation loss = 0.0031051295809447765
Validation loss = 0.004277148749679327
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.003849615342915058
Validation loss = 0.004412089474499226
Validation loss = 0.002554318867623806
Validation loss = 0.0023026345297694206
Validation loss = 0.002659565769135952
Validation loss = 0.0036368402652442455
Validation loss = 0.002595566213130951
Validation loss = 0.002368505345657468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0031040662433952093
Validation loss = 0.003134946571663022
Validation loss = 0.0024992122780531645
Validation loss = 0.0026088175363838673
Validation loss = 0.003490408882498741
Validation loss = 0.0029840937349945307
Validation loss = 0.0026270654052495956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00275827432051301
Validation loss = 0.0024386467412114143
Validation loss = 0.0038590016774833202
Validation loss = 0.0021888758055865765
Validation loss = 0.004098398610949516
Validation loss = 0.002822417998686433
Validation loss = 0.0018060978036373854
Validation loss = 0.0035776072181761265
Validation loss = 0.003984019625931978
Validation loss = 0.003588956082239747
Validation loss = 0.00214100768789649
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0037964037619531155
Validation loss = 0.0022441032342612743
Validation loss = 0.00284199183806777
Validation loss = 0.0025181376840919256
Validation loss = 0.002299664542078972
Validation loss = 0.002277732128277421
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.159   |
| Iteration     | 29       |
| MaximumReturn | -0.0804  |
| MinimumReturn | -0.234   |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.003085982520133257
Validation loss = 0.002459293231368065
Validation loss = 0.002148662693798542
Validation loss = 0.003063952550292015
Validation loss = 0.0029272520914673805
Validation loss = 0.0019421592587605119
Validation loss = 0.0022282875142991543
Validation loss = 0.002716955030336976
Validation loss = 0.0019957898184657097
Validation loss = 0.002788633806630969
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0027957474812865257
Validation loss = 0.002535095438361168
Validation loss = 0.0027345374692231417
Validation loss = 0.002450943924486637
Validation loss = 0.002136950148269534
Validation loss = 0.0024784288834780455
Validation loss = 0.001503219362348318
Validation loss = 0.0030001052655279636
Validation loss = 0.0017233550315722823
Validation loss = 0.0029047245625406504
Validation loss = 0.002384698949754238
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002764264354482293
Validation loss = 0.0021407799795269966
Validation loss = 0.0024959833826869726
Validation loss = 0.002157727023586631
Validation loss = 0.0039209360256791115
Validation loss = 0.003249269677326083
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002524253446608782
Validation loss = 0.0018756515346467495
Validation loss = 0.0022972964216023684
Validation loss = 0.0018674463499337435
Validation loss = 0.0021517772693187
Validation loss = 0.0025158936623483896
Validation loss = 0.0025066635571420193
Validation loss = 0.002600855426862836
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002292697550728917
Validation loss = 0.002454958623275161
Validation loss = 0.002530017402023077
Validation loss = 0.002185254590585828
Validation loss = 0.002719731070101261
Validation loss = 0.0020641698502004147
Validation loss = 0.0015243670204654336
Validation loss = 0.0032361226622015238
Validation loss = 0.002371974987909198
Validation loss = 0.002134377835318446
Validation loss = 0.0021185465157032013
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.75    |
| Iteration     | 30       |
| MaximumReturn | -0.102   |
| MinimumReturn | -36      |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0043932171538472176
Validation loss = 0.0029457351192831993
Validation loss = 0.0024399380199611187
Validation loss = 0.003186397487297654
Validation loss = 0.00300655048340559
Validation loss = 0.0021879938431084156
Validation loss = 0.0024788803420960903
Validation loss = 0.002148466417565942
Validation loss = 0.001694030244834721
Validation loss = 0.001718025654554367
Validation loss = 0.0016556504415348172
Validation loss = 0.0015859821578487754
Validation loss = 0.0021576217841356993
Validation loss = 0.0018254183232784271
Validation loss = 0.002002055523917079
Validation loss = 0.001520458492450416
Validation loss = 0.003953941632062197
Validation loss = 0.004352552350610495
Validation loss = 0.002511114813387394
Validation loss = 0.0019481783965602517
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0029904060065746307
Validation loss = 0.0016908806283026934
Validation loss = 0.0018156780861318111
Validation loss = 0.001978297019377351
Validation loss = 0.0020574391819536686
Validation loss = 0.0019893525168299675
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003425732720643282
Validation loss = 0.002235625870525837
Validation loss = 0.002823112066835165
Validation loss = 0.0034079065080732107
Validation loss = 0.001904914854094386
Validation loss = 0.003292766399681568
Validation loss = 0.0025528150144964457
Validation loss = 0.0021133252885192633
Validation loss = 0.0017060788813978434
Validation loss = 0.0020649076905101538
Validation loss = 0.0020879716612398624
Validation loss = 0.0024838121607899666
Validation loss = 0.002730891341343522
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.004875150043517351
Validation loss = 0.0022843179758638144
Validation loss = 0.00282861664891243
Validation loss = 0.002627786248922348
Validation loss = 0.002533889142796397
Validation loss = 0.0016656429506838322
Validation loss = 0.002468246966600418
Validation loss = 0.0017993630608543754
Validation loss = 0.002386469626799226
Validation loss = 0.002047340851277113
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0028436072170734406
Validation loss = 0.0022245522122830153
Validation loss = 0.002152626868337393
Validation loss = 0.0018797616939991713
Validation loss = 0.015259905718266964
Validation loss = 0.0032726337667554617
Validation loss = 0.0017568351468071342
Validation loss = 0.0017760818591341376
Validation loss = 0.0024084029719233513
Validation loss = 0.003485338296741247
Validation loss = 0.0020239492878317833
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.314   |
| Iteration     | 31       |
| MaximumReturn | -0.189   |
| MinimumReturn | -0.717   |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0019496530294418335
Validation loss = 0.0016208606539294124
Validation loss = 0.0033388985320925713
Validation loss = 0.0011576310498639941
Validation loss = 0.001616329187527299
Validation loss = 0.0016490533016622066
Validation loss = 0.001414911588653922
Validation loss = 0.002043310087174177
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0017809681594371796
Validation loss = 0.0016177688958123326
Validation loss = 0.001367387012578547
Validation loss = 0.0014842440141364932
Validation loss = 0.0018317162757739425
Validation loss = 0.002582926070317626
Validation loss = 0.0012248150305822492
Validation loss = 0.004710567183792591
Validation loss = 0.0015298796351999044
Validation loss = 0.002332363510504365
Validation loss = 0.0020152528304606676
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.002551564248278737
Validation loss = 0.0016276242677122355
Validation loss = 0.0013195460196584463
Validation loss = 0.001360130961984396
Validation loss = 0.0015619610203430057
Validation loss = 0.0026293385308235884
Validation loss = 0.0022391288075596094
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0017993084620684385
Validation loss = 0.0014200439909473062
Validation loss = 0.0018035976681858301
Validation loss = 0.00251246546395123
Validation loss = 0.0022585068363696337
Validation loss = 0.0014216830022633076
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016860259929671884
Validation loss = 0.001687722047790885
Validation loss = 0.0015049456851556897
Validation loss = 0.0013232179917395115
Validation loss = 0.0014976292150095105
Validation loss = 0.0021417830139398575
Validation loss = 0.001155351521447301
Validation loss = 0.0023319264873862267
Validation loss = 0.0021187704987823963
Validation loss = 0.0013650719774886966
Validation loss = 0.0018148281378671527
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.29    |
| Iteration     | 32       |
| MaximumReturn | -0.00178 |
| MinimumReturn | -18.5    |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016196143114939332
Validation loss = 0.003855151357129216
Validation loss = 0.0013563001994043589
Validation loss = 0.0013261830899864435
Validation loss = 0.001781612285412848
Validation loss = 0.0015411613276228309
Validation loss = 0.0023180542048066854
Validation loss = 0.0013566504931077361
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0021267107222229242
Validation loss = 0.0016388624208047986
Validation loss = 0.0017840927466750145
Validation loss = 0.001407457864843309
Validation loss = 0.0013645404251292348
Validation loss = 0.001874576904810965
Validation loss = 0.001833744696341455
Validation loss = 0.0017363085644319654
Validation loss = 0.0014306893572211266
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0022784308530390263
Validation loss = 0.0013084195088595152
Validation loss = 0.0019304795423522592
Validation loss = 0.0012663508532568812
Validation loss = 0.0013550280127674341
Validation loss = 0.0014151058858260512
Validation loss = 0.0022182511165738106
Validation loss = 0.001507861539721489
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.002199795562773943
Validation loss = 0.0036423068959265947
Validation loss = 0.0014936106745153666
Validation loss = 0.002201830968260765
Validation loss = 0.00148971367161721
Validation loss = 0.0012598177418112755
Validation loss = 0.0017976075178012252
Validation loss = 0.0018434170633554459
Validation loss = 0.0015347470762208104
Validation loss = 0.0012513665715232491
Validation loss = 0.0016721503343433142
Validation loss = 0.0013280723942443728
Validation loss = 0.0012394038494676352
Validation loss = 0.001632443629205227
Validation loss = 0.0012346297735348344
Validation loss = 0.002155265072360635
Validation loss = 0.0011010168818756938
Validation loss = 0.001805132138542831
Validation loss = 0.002196050714701414
Validation loss = 0.0010727607877925038
Validation loss = 0.0011787355178967118
Validation loss = 0.0015113644767552614
Validation loss = 0.0016873758286237717
Validation loss = 0.0015538762090727687
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.004558056592941284
Validation loss = 0.0016046402743086219
Validation loss = 0.0012050627265125513
Validation loss = 0.0014302838826552033
Validation loss = 0.0020982439164072275
Validation loss = 0.0014315665466710925
Validation loss = 0.0018437904072925448
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.3    |
| Iteration     | 33       |
| MaximumReturn | -0.299   |
| MinimumReturn | -64.6    |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021633573342114687
Validation loss = 0.0008368238341063261
Validation loss = 0.0014141886495053768
Validation loss = 0.0009729698649607599
Validation loss = 0.0016891557024791837
Validation loss = 0.0009018866112455726
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.002594440244138241
Validation loss = 0.0019413672853261232
Validation loss = 0.0009686798439361155
Validation loss = 0.0012247206177562475
Validation loss = 0.0013201250694692135
Validation loss = 0.001115685678087175
Validation loss = 0.0012036763364449143
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017086892621591687
Validation loss = 0.001608923077583313
Validation loss = 0.0015031637158244848
Validation loss = 0.0008275723666884005
Validation loss = 0.001212647883221507
Validation loss = 0.0009680335060693324
Validation loss = 0.002296461258083582
Validation loss = 0.0013084127567708492
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003337640780955553
Validation loss = 0.001434764708392322
Validation loss = 0.0011096030939370394
Validation loss = 0.0008704936481080949
Validation loss = 0.0013183372793719172
Validation loss = 0.0014622202143073082
Validation loss = 0.0007094366592355072
Validation loss = 0.000806718016974628
Validation loss = 0.0018652320140972733
Validation loss = 0.0009895835537463427
Validation loss = 0.0008838563226163387
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0016445467481389642
Validation loss = 0.0013847367372363806
Validation loss = 0.0009310782188549638
Validation loss = 0.0012575198197737336
Validation loss = 0.001555633614771068
Validation loss = 0.0011284678475931287
Validation loss = 0.000994379515759647
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -42.2    |
| Iteration     | 34       |
| MaximumReturn | -0.0177  |
| MinimumReturn | -82.5    |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0021470605861395597
Validation loss = 0.0008462891564704478
Validation loss = 0.0009590551489964128
Validation loss = 0.0010584123665466905
Validation loss = 0.0014889329904690385
Validation loss = 0.0010125909466296434
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0018998448504135013
Validation loss = 0.001154314260929823
Validation loss = 0.0009922777535393834
Validation loss = 0.0016327222110703588
Validation loss = 0.0011972623178735375
Validation loss = 0.0016412846744060516
Validation loss = 0.0009702838724479079
Validation loss = 0.0013859312748536468
Validation loss = 0.0007878102478571236
Validation loss = 0.0013186740688979626
Validation loss = 0.0010564924450591207
Validation loss = 0.001211552764289081
Validation loss = 0.001253336202353239
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017716356087476015
Validation loss = 0.0010383558692410588
Validation loss = 0.0009973574196919799
Validation loss = 0.000702602497767657
Validation loss = 0.001346808159723878
Validation loss = 0.0017768378602340817
Validation loss = 0.0008634370169602334
Validation loss = 0.0017383727245032787
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024132607504725456
Validation loss = 0.0008596099796704948
Validation loss = 0.0010658538667485118
Validation loss = 0.001034810789860785
Validation loss = 0.0010295019019395113
Validation loss = 0.0016438915627077222
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0015605723019689322
Validation loss = 0.0007505139801651239
Validation loss = 0.0006430894718505442
Validation loss = 0.0008850812446326017
Validation loss = 0.0013839997118338943
Validation loss = 0.0013145803241059184
Validation loss = 0.0006949465023353696
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -6.42     |
| Iteration     | 35        |
| MaximumReturn | -0.000718 |
| MinimumReturn | -82.3     |
| TotalSamples  | 61642     |
-----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0017864853143692017
Validation loss = 0.0011063191341236234
Validation loss = 0.00087971321772784
Validation loss = 0.0007792225806042552
Validation loss = 0.001030478859320283
Validation loss = 0.0024522950407117605
Validation loss = 0.001136094331741333
Validation loss = 0.0011366910766810179
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0011281860060989857
Validation loss = 0.0019260153640061617
Validation loss = 0.0008300328627228737
Validation loss = 0.0017917925724759698
Validation loss = 0.0008297896711155772
Validation loss = 0.0007369788945652544
Validation loss = 0.0011607530759647489
Validation loss = 0.0010624813148751855
Validation loss = 0.000980118289589882
Validation loss = 0.0014414825709536672
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010969385039061308
Validation loss = 0.0008183136233128607
Validation loss = 0.0010980855440720916
Validation loss = 0.002608419628813863
Validation loss = 0.0007838309975340962
Validation loss = 0.0013908174587413669
Validation loss = 0.001019893097691238
Validation loss = 0.0013712940271943808
Validation loss = 0.0009576606098562479
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008860763045959175
Validation loss = 0.0009217924089170992
Validation loss = 0.0013235581573098898
Validation loss = 0.00120563677046448
Validation loss = 0.001478643505834043
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010938509367406368
Validation loss = 0.0025066116359084845
Validation loss = 0.0008686581277288496
Validation loss = 0.0006981612532399595
Validation loss = 0.0011316629825159907
Validation loss = 0.0008436479838564992
Validation loss = 0.0007323718164116144
Validation loss = 0.0010294526582583785
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -75.6    |
| Iteration     | 36       |
| MaximumReturn | -3.52    |
| MinimumReturn | -106     |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001850123517215252
Validation loss = 0.0006557811284437776
Validation loss = 0.0006263141985982656
Validation loss = 0.0009428415214642882
Validation loss = 0.0009073175024241209
Validation loss = 0.0015116524882614613
Validation loss = 0.001011338783428073
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.001952362130396068
Validation loss = 0.00098039663862437
Validation loss = 0.0008348057162947953
Validation loss = 0.0006675304030068219
Validation loss = 0.000678999989759177
Validation loss = 0.001057287910953164
Validation loss = 0.0007692190702073276
Validation loss = 0.0008903132984414697
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0031187008135020733
Validation loss = 0.0011238191509619355
Validation loss = 0.0006284923874773085
Validation loss = 0.0006135049043223262
Validation loss = 0.0006779326358810067
Validation loss = 0.000556624960154295
Validation loss = 0.0007596915820613503
Validation loss = 0.0007818732410669327
Validation loss = 0.0006278568180277944
Validation loss = 0.0010676532983779907
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0024963843170553446
Validation loss = 0.0008142123697325587
Validation loss = 0.000648113084025681
Validation loss = 0.001220092293806374
Validation loss = 0.0007994974148459733
Validation loss = 0.0009735168423503637
Validation loss = 0.0007243531290441751
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.002324172528460622
Validation loss = 0.0007745441980659962
Validation loss = 0.0015828714240342379
Validation loss = 0.0006568889948539436
Validation loss = 0.0008026580326259136
Validation loss = 0.0006553566199727356
Validation loss = 0.001228224253281951
Validation loss = 0.000952187750954181
Validation loss = 0.0008278526365756989
Validation loss = 0.0006728567532263696
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -92.7    |
| Iteration     | 37       |
| MaximumReturn | -40.4    |
| MinimumReturn | -119     |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00342115992680192
Validation loss = 0.0009212829172611237
Validation loss = 0.0006732956971973181
Validation loss = 0.0008106897585093975
Validation loss = 0.0010047275573015213
Validation loss = 0.0005701858317479491
Validation loss = 0.0007733585080131888
Validation loss = 0.000698340474627912
Validation loss = 0.0007843946223147213
Validation loss = 0.0012753801420331001
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0020904256962239742
Validation loss = 0.0007347584469243884
Validation loss = 0.000706805381923914
Validation loss = 0.0008270473917946219
Validation loss = 0.00074887799564749
Validation loss = 0.0009263904066756368
Validation loss = 0.0010219012619927526
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.003021255135536194
Validation loss = 0.0007668806938454509
Validation loss = 0.0006263392278924584
Validation loss = 0.0005695137660950422
Validation loss = 0.0012090422678738832
Validation loss = 0.0009896581759676337
Validation loss = 0.0006553546409122646
Validation loss = 0.0007856826996430755
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.003434482030570507
Validation loss = 0.0006882420857436955
Validation loss = 0.0011606805492192507
Validation loss = 0.0019267795141786337
Validation loss = 0.0006771326879970729
Validation loss = 0.0008229874074459076
Validation loss = 0.0012006604811176658
Validation loss = 0.00122141744941473
Validation loss = 0.0006614835583604872
Validation loss = 0.0008798738126643002
Validation loss = 0.000855403661262244
Validation loss = 0.0008161535952240229
Validation loss = 0.0006107031949795783
Validation loss = 0.00135678774677217
Validation loss = 0.0009240710642188787
Validation loss = 0.0007522590458393097
Validation loss = 0.0007120388327166438
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.003065949073061347
Validation loss = 0.0007370030507445335
Validation loss = 0.0007982349488884211
Validation loss = 0.0007212416967377067
Validation loss = 0.0007137794164009392
Validation loss = 0.000931215297896415
Validation loss = 0.0006201522191986442
Validation loss = 0.0009343603160232306
Validation loss = 0.0007764772162772715
Validation loss = 0.0008339377818629146
Validation loss = 0.0008438231307081878
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -13.5    |
| Iteration     | 38       |
| MaximumReturn | -0.303   |
| MinimumReturn | -53.2    |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0012770781759172678
Validation loss = 0.0010728839552029967
Validation loss = 0.0005919247050769627
Validation loss = 0.0007062184158712626
Validation loss = 0.0009608411928638816
Validation loss = 0.000657311175018549
Validation loss = 0.0005346548859961331
Validation loss = 0.0005846524145454168
Validation loss = 0.0010234620422124863
Validation loss = 0.0008804570534266531
Validation loss = 0.0009726604330353439
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0012971960240975022
Validation loss = 0.0009499360457994044
Validation loss = 0.0007265215390361845
Validation loss = 0.0007371767424046993
Validation loss = 0.0008233025437220931
Validation loss = 0.0008664483320899308
Validation loss = 0.0007125769625417888
Validation loss = 0.0008628955110907555
Validation loss = 0.0008343575755134225
Validation loss = 0.0009411490173079073
Validation loss = 0.00078853644663468
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010118725476786494
Validation loss = 0.0009654087480157614
Validation loss = 0.0008589149219915271
Validation loss = 0.0006495382403954864
Validation loss = 0.000658777542412281
Validation loss = 0.0011024861596524715
Validation loss = 0.0007903032819740474
Validation loss = 0.0007556126802228391
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001241314341314137
Validation loss = 0.0008225752972066402
Validation loss = 0.0007223004358820617
Validation loss = 0.0009439561981707811
Validation loss = 0.0005945804296061397
Validation loss = 0.001829426153562963
Validation loss = 0.0006793614593334496
Validation loss = 0.0007226828020066023
Validation loss = 0.0007646504091098905
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008058944949880242
Validation loss = 0.0009279281366616488
Validation loss = 0.0006373781361617148
Validation loss = 0.0007363418117165565
Validation loss = 0.0008219657465815544
Validation loss = 0.0009443980525247753
Validation loss = 0.0008003679104149342
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.02     |
| Iteration     | 39        |
| MaximumReturn | -0.000737 |
| MinimumReturn | -75       |
| TotalSamples  | 68306     |
-----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007411609403789043
Validation loss = 0.0008822869276627898
Validation loss = 0.0006094531854614615
Validation loss = 0.0011849843431264162
Validation loss = 0.0009761810069903731
Validation loss = 0.0007034391746856272
Validation loss = 0.0005978696863166988
Validation loss = 0.0009324784041382372
Validation loss = 0.0006567769451066852
Validation loss = 0.001116874860599637
Validation loss = 0.000808697659522295
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008981373393908143
Validation loss = 0.0008284372161142528
Validation loss = 0.0010223108110949397
Validation loss = 0.00102128554135561
Validation loss = 0.0007779684965498745
Validation loss = 0.0007656189845874906
Validation loss = 0.0006515515851788223
Validation loss = 0.0007398236193694174
Validation loss = 0.0009130454855039716
Validation loss = 0.0008038993692025542
Validation loss = 0.0008266721270047128
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0010788504732772708
Validation loss = 0.0007074961322359741
Validation loss = 0.0010493079898878932
Validation loss = 0.0011560838902369142
Validation loss = 0.0009068086510524154
Validation loss = 0.0006916748243384063
Validation loss = 0.0009933323599398136
Validation loss = 0.0007023746147751808
Validation loss = 0.0009678979986347258
Validation loss = 0.0015248225536197424
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006421881844289601
Validation loss = 0.000514865096192807
Validation loss = 0.0008885354036465287
Validation loss = 0.0009399958071298897
Validation loss = 0.0005445396527647972
Validation loss = 0.0006439178250730038
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008652278920635581
Validation loss = 0.0007433214923366904
Validation loss = 0.0008629443473182619
Validation loss = 0.0008821228984743357
Validation loss = 0.0006354419165290892
Validation loss = 0.0006071827374398708
Validation loss = 0.0009039056021720171
Validation loss = 0.0008053734200075269
Validation loss = 0.0006916336133144796
Validation loss = 0.0005151931545697153
Validation loss = 0.0005772236036136746
Validation loss = 0.0009512646938674152
Validation loss = 0.0007773786783218384
Validation loss = 0.0007856679148972034
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.9    |
| Iteration     | 40       |
| MaximumReturn | -0.00162 |
| MinimumReturn | -52.1    |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000784912146627903
Validation loss = 0.0017989788902923465
Validation loss = 0.0018179717008024454
Validation loss = 0.0007591719622723758
Validation loss = 0.0008615898550488055
Validation loss = 0.000814553874079138
Validation loss = 0.0009871885413303971
Validation loss = 0.000761393632274121
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007159491651691496
Validation loss = 0.0006569060496985912
Validation loss = 0.0009621288045309484
Validation loss = 0.0007553958566859365
Validation loss = 0.0007384326308965683
Validation loss = 0.0008156756521202624
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009331402834504843
Validation loss = 0.0006559382891282439
Validation loss = 0.001474403659813106
Validation loss = 0.0014039395609870553
Validation loss = 0.00047777328290976584
Validation loss = 0.0010426996741443872
Validation loss = 0.0013013093266636133
Validation loss = 0.0013707377947866917
Validation loss = 0.0006400502170436084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0016168593429028988
Validation loss = 0.0011541224084794521
Validation loss = 0.0006145780207589269
Validation loss = 0.0009637960465624928
Validation loss = 0.0010312923695892096
Validation loss = 0.0011024957057088614
Validation loss = 0.0007684719748795033
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0008125141030177474
Validation loss = 0.0006785166915506124
Validation loss = 0.0008629609365016222
Validation loss = 0.0006425042520277202
Validation loss = 0.0010395997669547796
Validation loss = 0.001262927777133882
Validation loss = 0.0011811776785179973
Validation loss = 0.0010179218370467424
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0245   |
| Iteration     | 41        |
| MaximumReturn | -0.000765 |
| MinimumReturn | -0.268    |
| TotalSamples  | 71638     |
-----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006801866693422198
Validation loss = 0.0009220523643307388
Validation loss = 0.0012254579924046993
Validation loss = 0.0009295549825765193
Validation loss = 0.0006139262113720179
Validation loss = 0.0006323932320810854
Validation loss = 0.0018071886152029037
Validation loss = 0.0006232081796042621
Validation loss = 0.0006735702627338469
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0008645066991448402
Validation loss = 0.0007833341369405389
Validation loss = 0.0009095301502384245
Validation loss = 0.0011697070440277457
Validation loss = 0.0010660778498277068
Validation loss = 0.0005756696918979287
Validation loss = 0.0006211278960108757
Validation loss = 0.0008494366775266826
Validation loss = 0.000907974666915834
Validation loss = 0.0006278310320340097
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006840176065452397
Validation loss = 0.0007181651308201253
Validation loss = 0.0011626629857346416
Validation loss = 0.0007877095486037433
Validation loss = 0.0006038509891368449
Validation loss = 0.00047342845937237144
Validation loss = 0.0007635755464434624
Validation loss = 0.00105107796844095
Validation loss = 0.0008317177416756749
Validation loss = 0.0010956861078739166
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0011239820159971714
Validation loss = 0.0009488426148891449
Validation loss = 0.0006172293797135353
Validation loss = 0.003349079517647624
Validation loss = 0.0008248960366472602
Validation loss = 0.0006799741531722248
Validation loss = 0.0013080312637612224
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0010075934696942568
Validation loss = 0.0007239969563670456
Validation loss = 0.0005503009888343513
Validation loss = 0.0009168976102955639
Validation loss = 0.000794931489508599
Validation loss = 0.0007204830180853605
Validation loss = 0.000796404667198658
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.00078  |
| Iteration     | 42        |
| MaximumReturn | -0.000586 |
| MinimumReturn | -0.000964 |
| TotalSamples  | 73304     |
-----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.001298127812333405
Validation loss = 0.0012649671407416463
Validation loss = 0.001142840483225882
Validation loss = 0.0008264483767561615
Validation loss = 0.0010514913592487574
Validation loss = 0.0009951074607670307
Validation loss = 0.0007969646831043065
Validation loss = 0.0007770418887957931
Validation loss = 0.0007048679981380701
Validation loss = 0.0007980954251252115
Validation loss = 0.000594435608945787
Validation loss = 0.0007712052320130169
Validation loss = 0.0005599955329671502
Validation loss = 0.0007312154048122466
Validation loss = 0.0005554974195547402
Validation loss = 0.0005066918674856424
Validation loss = 0.0006518491427414119
Validation loss = 0.0007959060021676123
Validation loss = 0.0009216966573148966
Validation loss = 0.0005996861727908254
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007982824463397264
Validation loss = 0.0009482651948928833
Validation loss = 0.0011958653340116143
Validation loss = 0.0006992045091465116
Validation loss = 0.0005419596564024687
Validation loss = 0.0010649197502061725
Validation loss = 0.0012529471423476934
Validation loss = 0.0010437017772346735
Validation loss = 0.0006637516198679805
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008339250925928354
Validation loss = 0.0007861294434405863
Validation loss = 0.0009079723386093974
Validation loss = 0.0013219048269093037
Validation loss = 0.0009531522518955171
Validation loss = 0.0008805254474282265
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006945405621081591
Validation loss = 0.0012028147466480732
Validation loss = 0.0009362401906400919
Validation loss = 0.0007393051637336612
Validation loss = 0.0008975432720035315
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0013597027864307165
Validation loss = 0.0006226917030289769
Validation loss = 0.000809497432783246
Validation loss = 0.0008354412275366485
Validation loss = 0.0009526985813863575
Validation loss = 0.0007123884279280901
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -72.8    |
| Iteration     | 43       |
| MaximumReturn | -1.1     |
| MinimumReturn | -190     |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0016823395853862166
Validation loss = 0.0005904181161895394
Validation loss = 0.0005293183494359255
Validation loss = 0.0007485674577765167
Validation loss = 0.001068156328983605
Validation loss = 0.0004334939003456384
Validation loss = 0.0004970373120158911
Validation loss = 0.000903539767023176
Validation loss = 0.0005243031773716211
Validation loss = 0.00046460566227324307
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010731177171692252
Validation loss = 0.0007495674071833491
Validation loss = 0.0006436228286474943
Validation loss = 0.0006391837378032506
Validation loss = 0.00047308654757216573
Validation loss = 0.0007421577465720475
Validation loss = 0.0005372681771405041
Validation loss = 0.0009087259531952441
Validation loss = 0.0007483272929675877
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0012643610825762153
Validation loss = 0.0007370954845100641
Validation loss = 0.0007088842103257775
Validation loss = 0.000719401694368571
Validation loss = 0.0005418580840341747
Validation loss = 0.0009784386493265629
Validation loss = 0.0006425625761039555
Validation loss = 0.0010345284827053547
Validation loss = 0.0014337538741528988
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.001955672400072217
Validation loss = 0.0008674012497067451
Validation loss = 0.0006154341390356421
Validation loss = 0.0006616800674237311
Validation loss = 0.0006088835070841014
Validation loss = 0.0009610297274775803
Validation loss = 0.0006135526928119361
Validation loss = 0.0008488932508043945
Validation loss = 0.000633229676168412
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.001092303078621626
Validation loss = 0.0005703052156604826
Validation loss = 0.0005819642683491111
Validation loss = 0.0005249844398349524
Validation loss = 0.0008874860941432416
Validation loss = 0.0003932300896849483
Validation loss = 0.0004342035681474954
Validation loss = 0.000543502508662641
Validation loss = 0.0005295486189424992
Validation loss = 0.0006501050665974617
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.28    |
| Iteration     | 44       |
| MaximumReturn | -0.783   |
| MinimumReturn | -1.71    |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0003809351474046707
Validation loss = 0.0005529799382202327
Validation loss = 0.0005407813005149364
Validation loss = 0.0003917414287570864
Validation loss = 0.0009804613655433059
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00046171125723049045
Validation loss = 0.0008710288675501943
Validation loss = 0.0007789157680235803
Validation loss = 0.0006166413077153265
Validation loss = 0.0015915876720100641
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006328693707473576
Validation loss = 0.00044091889867559075
Validation loss = 0.00047060841461643577
Validation loss = 0.0005055395304225385
Validation loss = 0.0005856704083271325
Validation loss = 0.0006001787260174751
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005120197893120348
Validation loss = 0.0005320578929968178
Validation loss = 0.0006100020254962146
Validation loss = 0.0006894008256494999
Validation loss = 0.0005346031975932419
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005699124303646386
Validation loss = 0.0007308688363991678
Validation loss = 0.0005130244535394013
Validation loss = 0.0005216493736952543
Validation loss = 0.0005284692742861807
Validation loss = 0.001056389301083982
Validation loss = 0.0006637829937972128
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.641   |
| Iteration     | 45       |
| MaximumReturn | -0.171   |
| MinimumReturn | -0.96    |
| TotalSamples  | 78302    |
----------------------------
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00043861495214514434
Validation loss = 0.0005602227174676955
Validation loss = 0.0012254909379407763
Validation loss = 0.00045742691145278513
Validation loss = 0.0006457588169723749
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000505810952745378
Validation loss = 0.0005803777603432536
Validation loss = 0.0005492654745467007
Validation loss = 0.0006415334646590054
Validation loss = 0.0004798525769729167
Validation loss = 0.0004613400960806757
Validation loss = 0.0005770624848082662
Validation loss = 0.0005059062968939543
Validation loss = 0.0006433632224798203
Validation loss = 0.0005902945995330811
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00041739331209100783
Validation loss = 0.0009081059251911938
Validation loss = 0.0006723599508404732
Validation loss = 0.00039749228744767606
Validation loss = 0.0007755248225294054
Validation loss = 0.0006327168084681034
Validation loss = 0.0005804132670164108
Validation loss = 0.0005359690985642374
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005337994080036879
Validation loss = 0.0004875632585026324
Validation loss = 0.0005433881306089461
Validation loss = 0.0005657745059579611
Validation loss = 0.0005340335192158818
Validation loss = 0.0007276547257788479
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004317224957048893
Validation loss = 0.0004676785902120173
Validation loss = 0.000595314777456224
Validation loss = 0.0005560250137932599
Validation loss = 0.0005584711907431483
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.635   |
| Iteration     | 46       |
| MaximumReturn | -0.405   |
| MinimumReturn | -0.938   |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00034626605338416994
Validation loss = 0.0004997570067644119
Validation loss = 0.0005513634532690048
Validation loss = 0.0008830221486277878
Validation loss = 0.0012112485710531473
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005751125863753259
Validation loss = 0.0007221923442557454
Validation loss = 0.0006122913327999413
Validation loss = 0.0004161841352470219
Validation loss = 0.0007079230854287744
Validation loss = 0.0004014834703411907
Validation loss = 0.0007038252661004663
Validation loss = 0.0004639570543076843
Validation loss = 0.0006225507822819054
Validation loss = 0.0004445974773261696
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0009638755582273006
Validation loss = 0.0005425691488198936
Validation loss = 0.0005116576794534922
Validation loss = 0.00077090720878914
Validation loss = 0.0005424236878752708
Validation loss = 0.0005976580432616174
Validation loss = 0.0005389591678977013
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007566158892586827
Validation loss = 0.0005408286233432591
Validation loss = 0.0008538604015484452
Validation loss = 0.000806861266028136
Validation loss = 0.0007198405219241977
Validation loss = 0.0005880278185941279
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0003491537063382566
Validation loss = 0.0004339214356150478
Validation loss = 0.0009080523741431534
Validation loss = 0.00038028403650969267
Validation loss = 0.0004238341934978962
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.438   |
| Iteration     | 47       |
| MaximumReturn | -0.00314 |
| MinimumReturn | -1.05    |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006097193108871579
Validation loss = 0.00040336005622521043
Validation loss = 0.0005475092912092805
Validation loss = 0.0003967932425439358
Validation loss = 0.0004698334669228643
Validation loss = 0.0008709030225872993
Validation loss = 0.000676961150020361
Validation loss = 0.0005745558300986886
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005810523289255798
Validation loss = 0.00042253470746800303
Validation loss = 0.00047765611088834703
Validation loss = 0.0006797049427405
Validation loss = 0.0005386628909036517
Validation loss = 0.0005960926064290106
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00042390794260427356
Validation loss = 0.0005486566224135458
Validation loss = 0.00045547710033133626
Validation loss = 0.0005805604159832001
Validation loss = 0.0006956380675546825
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005912608467042446
Validation loss = 0.0006171319400891662
Validation loss = 0.0003935088752768934
Validation loss = 0.0005106242606416345
Validation loss = 0.0004896632162854075
Validation loss = 0.0004596115031745285
Validation loss = 0.0007422942435368896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005129422643221915
Validation loss = 0.00046168771223165095
Validation loss = 0.00038835080340504646
Validation loss = 0.00040404018363915384
Validation loss = 0.0009887361666187644
Validation loss = 0.000761045899707824
Validation loss = 0.0006225125980563462
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0135   |
| Iteration     | 48        |
| MaximumReturn | -0.000857 |
| MinimumReturn | -0.31     |
| TotalSamples  | 83300     |
-----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007016955642029643
Validation loss = 0.000601168954744935
Validation loss = 0.0003788010508287698
Validation loss = 0.0007349271909333766
Validation loss = 0.0005915857036598027
Validation loss = 0.0005870878230780363
Validation loss = 0.0005081466515548527
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005782321095466614
Validation loss = 0.0003877707931678742
Validation loss = 0.0008600463042967021
Validation loss = 0.0005662829498760402
Validation loss = 0.0003798940742854029
Validation loss = 0.0004898448823951185
Validation loss = 0.00041167414747178555
Validation loss = 0.0005534695228561759
Validation loss = 0.00042873143684118986
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005945761222392321
Validation loss = 0.0009574475116096437
Validation loss = 0.0006617971230298281
Validation loss = 0.0006146736559458077
Validation loss = 0.0005918585229665041
Validation loss = 0.000704676378518343
Validation loss = 0.0003352571511641145
Validation loss = 0.0005053218919783831
Validation loss = 0.0003763380809687078
Validation loss = 0.0008965572342276573
Validation loss = 0.0005677618319168687
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005567101761698723
Validation loss = 0.0004629290197044611
Validation loss = 0.0009541556937620044
Validation loss = 0.0005437055951915681
Validation loss = 0.000610656919889152
Validation loss = 0.0004131146997679025
Validation loss = 0.0005787950358353555
Validation loss = 0.00047150033060461283
Validation loss = 0.0007240761769935489
Validation loss = 0.00045963528100401163
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004920236533507705
Validation loss = 0.0008899981039576232
Validation loss = 0.0006547606317326427
Validation loss = 0.0006306704599410295
Validation loss = 0.0004870526900049299
Validation loss = 0.0008022376568987966
Validation loss = 0.0006518633454106748
Validation loss = 0.0006696002674289048
Validation loss = 0.00034217507345601916
Validation loss = 0.0005678709130734205
Validation loss = 0.0005954079097136855
Validation loss = 0.0005138622946105897
Validation loss = 0.0003107127267867327
Validation loss = 0.0006549748359248042
Validation loss = 0.0007266331813298166
Validation loss = 0.0004041042411699891
Validation loss = 0.00039056825335137546
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -2.89    |
| Iteration     | 49       |
| MaximumReturn | -0.561   |
| MinimumReturn | -48.3    |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007492420263588428
Validation loss = 0.0002959039411507547
Validation loss = 0.000656491203699261
Validation loss = 0.0003139389446005225
Validation loss = 0.00036810850724577904
Validation loss = 0.0004703149606939405
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00043756116065196693
Validation loss = 0.00046168165863491595
Validation loss = 0.0005031901528127491
Validation loss = 0.0003724190464708954
Validation loss = 0.00043801855645142496
Validation loss = 0.0005310630658641458
Validation loss = 0.00034020200837403536
Validation loss = 0.000577841536141932
Validation loss = 0.0007284425082616508
Validation loss = 0.0004628892638720572
Validation loss = 0.0005509377806447446
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008234378765337169
Validation loss = 0.0005463759880512953
Validation loss = 0.000395672075683251
Validation loss = 0.00041787675581872463
Validation loss = 0.0005061236442998052
Validation loss = 0.000578755047172308
Validation loss = 0.00035408776602707803
Validation loss = 0.0005011592875234783
Validation loss = 0.00039056589594110847
Validation loss = 0.0003567903768271208
Validation loss = 0.00031555196619592607
Validation loss = 0.00045568819041363895
Validation loss = 0.0004979540244676173
Validation loss = 0.0003276059287600219
Validation loss = 0.0008366702822968364
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00031905443756841123
Validation loss = 0.00046963183558546007
Validation loss = 0.0005148365162312984
Validation loss = 0.0005059808026999235
Validation loss = 0.00043102685594931245
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004660754930227995
Validation loss = 0.00039329170249402523
Validation loss = 0.0007172268233262002
Validation loss = 0.00045514307566918433
Validation loss = 0.0003001236473210156
Validation loss = 0.0005652785766869783
Validation loss = 0.000383601087378338
Validation loss = 0.00035330659011378884
Validation loss = 0.0005415010382421315
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.27     |
| Iteration     | 50        |
| MaximumReturn | -0.000733 |
| MinimumReturn | -27.1     |
| TotalSamples  | 86632     |
-----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005054791690781713
Validation loss = 0.00036653850111179054
Validation loss = 0.00041474419413134456
Validation loss = 0.0002969367487821728
Validation loss = 0.0003837058902718127
Validation loss = 0.0005486994050443172
Validation loss = 0.0003952819388359785
Validation loss = 0.0005026428261771798
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00042532754014246166
Validation loss = 0.0007521677180193365
Validation loss = 0.0003733225166797638
Validation loss = 0.000544342037755996
Validation loss = 0.0004950262955389917
Validation loss = 0.0006629000417888165
Validation loss = 0.0004573770274873823
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00036468604230321944
Validation loss = 0.0004990787710994482
Validation loss = 0.0004089784051757306
Validation loss = 0.00033879585680551827
Validation loss = 0.0003493684343993664
Validation loss = 0.0005977179971523583
Validation loss = 0.0005910118925385177
Validation loss = 0.00045510896597988904
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004042811633553356
Validation loss = 0.0007231562631204724
Validation loss = 0.00030469719786196947
Validation loss = 0.0003985390067100525
Validation loss = 0.0004987968713976443
Validation loss = 0.0003107055090367794
Validation loss = 0.00036460827686823905
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0003200258652213961
Validation loss = 0.0005657460424117744
Validation loss = 0.00043917426955886185
Validation loss = 0.0004767164646182209
Validation loss = 0.0006073998520150781
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -12.8    |
| Iteration     | 51       |
| MaximumReturn | -0.138   |
| MinimumReturn | -88.8    |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007507345289923251
Validation loss = 0.0003998284228146076
Validation loss = 0.00039520979044027627
Validation loss = 0.00035814466536976397
Validation loss = 0.0004299589490983635
Validation loss = 0.0005793583113700151
Validation loss = 0.0004384091298561543
Validation loss = 0.0005676639848388731
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0016120151849463582
Validation loss = 0.0005106509197503328
Validation loss = 0.0006020672153681517
Validation loss = 0.0007507262635044754
Validation loss = 0.0005065797013230622
Validation loss = 0.0005403627292253077
Validation loss = 0.0003908045473508537
Validation loss = 0.0004865463124588132
Validation loss = 0.0005373565363697708
Validation loss = 0.00044922108645550907
Validation loss = 0.00044898290070705116
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007738273125141859
Validation loss = 0.00037821923615410924
Validation loss = 0.0005123118171468377
Validation loss = 0.0004795163113158196
Validation loss = 0.00048661595792509615
Validation loss = 0.00036468610051088035
Validation loss = 0.00046390105853788555
Validation loss = 0.0011603885795921087
Validation loss = 0.0003303975099697709
Validation loss = 0.0004469806153792888
Validation loss = 0.0005868145963177085
Validation loss = 0.00041045344551093876
Validation loss = 0.0004268987977411598
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0009477907442487776
Validation loss = 0.0004373016708996147
Validation loss = 0.00061568315140903
Validation loss = 0.00048151816008612514
Validation loss = 0.000580432009883225
Validation loss = 0.00034693029010668397
Validation loss = 0.0004084863467141986
Validation loss = 0.0003825054445769638
Validation loss = 0.00044420489575713873
Validation loss = 0.00039671416743658483
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007550838054157794
Validation loss = 0.0005831085727550089
Validation loss = 0.00038511891034431756
Validation loss = 0.0003274123009759933
Validation loss = 0.0007040394702926278
Validation loss = 0.0006176012684591115
Validation loss = 0.0005542500293813646
Validation loss = 0.0003729487652890384
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.884    |
| Iteration     | 52        |
| MaximumReturn | -0.000896 |
| MinimumReturn | -19.1     |
| TotalSamples  | 89964     |
-----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005725000519305468
Validation loss = 0.0005990398349240422
Validation loss = 0.00038697171839885414
Validation loss = 0.0004085399559698999
Validation loss = 0.0006377630634233356
Validation loss = 0.00042488123290240765
Validation loss = 0.0004644605505745858
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004134606570005417
Validation loss = 0.0003904695331584662
Validation loss = 0.0004739348660223186
Validation loss = 0.0004254801315255463
Validation loss = 0.00046476299758069217
Validation loss = 0.0008239583694376051
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0003511133254505694
Validation loss = 0.0005893068737350404
Validation loss = 0.00039668340468779206
Validation loss = 0.0004429333785083145
Validation loss = 0.00039278811891563237
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007372902473434806
Validation loss = 0.0004234430380165577
Validation loss = 0.0004844711802434176
Validation loss = 0.00048319014604203403
Validation loss = 0.0004764882614836097
Validation loss = 0.0005327780963853002
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00043492086115293205
Validation loss = 0.00046841782750561833
Validation loss = 0.0005532222567126155
Validation loss = 0.0004425973165780306
Validation loss = 0.0004579534870572388
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.24     |
| Iteration     | 53        |
| MaximumReturn | -0.000646 |
| MinimumReturn | -28       |
| TotalSamples  | 91630     |
-----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.000914194097276777
Validation loss = 0.0006738948286511004
Validation loss = 0.0007212628843262792
Validation loss = 0.0005256378790363669
Validation loss = 0.0004463639634195715
Validation loss = 0.0007809902308508754
Validation loss = 0.00042424912680871785
Validation loss = 0.0003569054533727467
Validation loss = 0.0004232611390762031
Validation loss = 0.0004401273326948285
Validation loss = 0.0003510143142193556
Validation loss = 0.0005927272723056376
Validation loss = 0.00048591403174214065
Validation loss = 0.000530846999026835
Validation loss = 0.00045918417163193226
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006834425148554146
Validation loss = 0.0005038617528043687
Validation loss = 0.0005668313824571669
Validation loss = 0.000615488039329648
Validation loss = 0.000607444264460355
Validation loss = 0.0008576432010158896
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00035665882751345634
Validation loss = 0.0005588068161159754
Validation loss = 0.00045120075810700655
Validation loss = 0.00047889919369481504
Validation loss = 0.00043343211291357875
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005413317703641951
Validation loss = 0.00045731005957350135
Validation loss = 0.00043228358845226467
Validation loss = 0.0005817323690280318
Validation loss = 0.0005244122003205121
Validation loss = 0.0006810460472479463
Validation loss = 0.0006441353471018374
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00042286640382371843
Validation loss = 0.0008851838065311313
Validation loss = 0.0004169782914686948
Validation loss = 0.00038555351784452796
Validation loss = 0.0003519663878250867
Validation loss = 0.00041542554390616715
Validation loss = 0.00045167290954850614
Validation loss = 0.0004400086763780564
Validation loss = 0.0005654083215631545
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -47.4    |
| Iteration     | 54       |
| MaximumReturn | -2.38    |
| MinimumReturn | -150     |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004949066205881536
Validation loss = 0.0006615396705456078
Validation loss = 0.00034732732456177473
Validation loss = 0.0004692313668783754
Validation loss = 0.00037747100577689707
Validation loss = 0.0018468693597242236
Validation loss = 0.0007021625642664731
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007842485210858285
Validation loss = 0.0003815616946667433
Validation loss = 0.0007186874863691628
Validation loss = 0.0006537410081364214
Validation loss = 0.00047732621897011995
Validation loss = 0.0003895880945492536
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0017282270127907395
Validation loss = 0.0004578112857416272
Validation loss = 0.0006141022895462811
Validation loss = 0.0005419827066361904
Validation loss = 0.0006528683588840067
Validation loss = 0.0004870942502748221
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0005864630802534521
Validation loss = 0.00034409677027724683
Validation loss = 0.0003782572166528553
Validation loss = 0.00042308546835556626
Validation loss = 0.0005034818896092474
Validation loss = 0.0005914884386584163
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007263619336299598
Validation loss = 0.0037975881714373827
Validation loss = 0.0004046381509397179
Validation loss = 0.00032838029437698424
Validation loss = 0.00041564347338862717
Validation loss = 0.0005233159754425287
Validation loss = 0.0003456638951320201
Validation loss = 0.00042610210948623717
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -47      |
| Iteration     | 55       |
| MaximumReturn | -1.37    |
| MinimumReturn | -135     |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0003876082773786038
Validation loss = 0.00040416562114842236
Validation loss = 0.00042789854342117906
Validation loss = 0.0005630252999253571
Validation loss = 0.00041428187978453934
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005798847414553165
Validation loss = 0.00036238919710740447
Validation loss = 0.00043416640255600214
Validation loss = 0.0006132901762612164
Validation loss = 0.0007784242043271661
Validation loss = 0.0010904156370088458
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.001485380227677524
Validation loss = 0.00036063254810869694
Validation loss = 0.0007315301918424666
Validation loss = 0.0003578650066629052
Validation loss = 0.0005361487856134772
Validation loss = 0.00031436956487596035
Validation loss = 0.0003480222949292511
Validation loss = 0.0006999560282565653
Validation loss = 0.0003535376163199544
Validation loss = 0.00039598235161975026
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010094488970935345
Validation loss = 0.0006422774167731404
Validation loss = 0.0006199133349582553
Validation loss = 0.0006056472193449736
Validation loss = 0.00082075857790187
Validation loss = 0.0004802522889804095
Validation loss = 0.0006671804585494101
Validation loss = 0.00046833980013616383
Validation loss = 0.00037019504816271365
Validation loss = 0.0003600649652071297
Validation loss = 0.00039023946737870574
Validation loss = 0.0006193199078552425
Validation loss = 0.00039356487104669213
Validation loss = 0.0004908241098746657
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004146586579736322
Validation loss = 0.0005383935640566051
Validation loss = 0.00038354875869117677
Validation loss = 0.00043378121335990727
Validation loss = 0.00037051213439553976
Validation loss = 0.0005144524620845914
Validation loss = 0.0005775741883553565
Validation loss = 0.0004984121187590063
Validation loss = 0.0006131724803708494
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -20.8    |
| Iteration     | 56       |
| MaximumReturn | -1.12    |
| MinimumReturn | -126     |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00045672283158637583
Validation loss = 0.00041864244849421084
Validation loss = 0.0010117533383890986
Validation loss = 0.0003708230797201395
Validation loss = 0.0004495570028666407
Validation loss = 0.0004013486613985151
Validation loss = 0.0004821572219952941
Validation loss = 0.0003946831857319921
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0010285477619618177
Validation loss = 0.0006040871376171708
Validation loss = 0.0005191985401324928
Validation loss = 0.0004874862788710743
Validation loss = 0.00040810706559568644
Validation loss = 0.0008333537844009697
Validation loss = 0.000571093987673521
Validation loss = 0.00031665689311921597
Validation loss = 0.00042200658936053514
Validation loss = 0.0004195758665446192
Validation loss = 0.00047717313282191753
Validation loss = 0.000446859747171402
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00037836466799490154
Validation loss = 0.0007493121665902436
Validation loss = 0.0006385287852026522
Validation loss = 0.00043880162411369383
Validation loss = 0.0003461028391029686
Validation loss = 0.00045299940393306315
Validation loss = 0.0003159306652378291
Validation loss = 0.0004283142916392535
Validation loss = 0.0004098128993064165
Validation loss = 0.00046697258949279785
Validation loss = 0.0004087558190803975
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006024951580911875
Validation loss = 0.0003319347742944956
Validation loss = 0.00044374054414220154
Validation loss = 0.0004324254987295717
Validation loss = 0.0003641083312686533
Validation loss = 0.0003892144013661891
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00048690257244743407
Validation loss = 0.0005845324485562742
Validation loss = 0.00043055566493421793
Validation loss = 0.00037192623130977154
Validation loss = 0.00036398274824023247
Validation loss = 0.00048680909094400704
Validation loss = 0.0003747262235265225
Validation loss = 0.0003962571208830923
Validation loss = 0.0003983517235610634
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.00954 |
| Iteration     | 57       |
| MaximumReturn | -0.00058 |
| MinimumReturn | -0.216   |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00041185779264196754
Validation loss = 0.00047257140977308154
Validation loss = 0.0005217999569140375
Validation loss = 0.000544683076441288
Validation loss = 0.0005757045582868159
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005134194507263601
Validation loss = 0.0004953451571054757
Validation loss = 0.00037486368091776967
Validation loss = 0.0005278422613628209
Validation loss = 0.00031156628392636776
Validation loss = 0.0004150520544499159
Validation loss = 0.0009517953149043024
Validation loss = 0.00043069294770248234
Validation loss = 0.0006070962990634143
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00041175223304890096
Validation loss = 0.00039924393058754504
Validation loss = 0.000380439538275823
Validation loss = 0.0004482413060031831
Validation loss = 0.0005270394030958414
Validation loss = 0.00044941992382518947
Validation loss = 0.0006514830747619271
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006890373188070953
Validation loss = 0.0005132983205839992
Validation loss = 0.0006258124485611916
Validation loss = 0.00038195677916519344
Validation loss = 0.0006962309707887471
Validation loss = 0.0003782684507314116
Validation loss = 0.0004262466391082853
Validation loss = 0.0003624869277700782
Validation loss = 0.0004214805085211992
Validation loss = 0.00046267174184322357
Validation loss = 0.0005192584358155727
Validation loss = 0.0002896933874581009
Validation loss = 0.00070751499151811
Validation loss = 0.0005033847410231829
Validation loss = 0.0006635002209804952
Validation loss = 0.0003756214282475412
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0003980725014116615
Validation loss = 0.0005597468698397279
Validation loss = 0.0004186913720332086
Validation loss = 0.00044327351497486234
Validation loss = 0.0007953204913064837
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.084    |
| Iteration     | 58        |
| MaximumReturn | -0.000573 |
| MinimumReturn | -2.08     |
| TotalSamples  | 99960     |
-----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004371550166979432
Validation loss = 0.00044878304470330477
Validation loss = 0.00044065117253921926
Validation loss = 0.000984728685580194
Validation loss = 0.0006577431922778487
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005323782097548246
Validation loss = 0.0006525953649543226
Validation loss = 0.00048571242950856686
Validation loss = 0.0003252045717090368
Validation loss = 0.0005124867893755436
Validation loss = 0.00038799725007265806
Validation loss = 0.0007006442756392062
Validation loss = 0.0004176985239610076
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006715087802149355
Validation loss = 0.0004446258826646954
Validation loss = 0.000701530952937901
Validation loss = 0.0004474493907764554
Validation loss = 0.0003612014988902956
Validation loss = 0.000694788119290024
Validation loss = 0.0003767079906538129
Validation loss = 0.00045795031473971903
Validation loss = 0.0007441669586114585
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0003877427370753139
Validation loss = 0.0008376064943149686
Validation loss = 0.0004844230134040117
Validation loss = 0.0010929788695648313
Validation loss = 0.00045262862113304436
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004011105338577181
Validation loss = 0.0006540063768625259
Validation loss = 0.0005587534396909177
Validation loss = 0.00035698243300430477
Validation loss = 0.0004199397808406502
Validation loss = 0.0005163453752174973
Validation loss = 0.00048533131484873593
Validation loss = 0.0004843877977691591
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.354    |
| Iteration     | 59        |
| MaximumReturn | -0.000635 |
| MinimumReturn | -2.04     |
| TotalSamples  | 101626    |
-----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004978643846698105
Validation loss = 0.00046765280421823263
Validation loss = 0.0003659471112769097
Validation loss = 0.00038371869595721364
Validation loss = 0.00043802321306429803
Validation loss = 0.0006663893000222743
Validation loss = 0.0005439735250547528
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00039394808118231595
Validation loss = 0.0008263791096396744
Validation loss = 0.0007385661592707038
Validation loss = 0.0004294788814149797
Validation loss = 0.00038770335959270597
Validation loss = 0.00030112641979940236
Validation loss = 0.0004897566395811737
Validation loss = 0.000488847668748349
Validation loss = 0.0004455270536709577
Validation loss = 0.0004682081053033471
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0008262529736384749
Validation loss = 0.0005973950610496104
Validation loss = 0.0005222848267294466
Validation loss = 0.0008860892266966403
Validation loss = 0.0004088417917955667
Validation loss = 0.00045117418630979955
Validation loss = 0.00046401546569541097
Validation loss = 0.0005053335335105658
Validation loss = 0.0008031540201045573
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000998693285509944
Validation loss = 0.0008356052567251027
Validation loss = 0.0005013521295040846
Validation loss = 0.00034741402487270534
Validation loss = 0.0004746801278088242
Validation loss = 0.001165392342954874
Validation loss = 0.00040528379031457007
Validation loss = 0.00042078085243701935
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004108489665668458
Validation loss = 0.0005700837355107069
Validation loss = 0.000441212032455951
Validation loss = 0.0003422663430683315
Validation loss = 0.00036157132126390934
Validation loss = 0.00039490312337875366
Validation loss = 0.00032056885538622737
Validation loss = 0.0005076120141893625
Validation loss = 0.0002944774751085788
Validation loss = 0.0002965548192150891
Validation loss = 0.00043127595563419163
Validation loss = 0.000725405290722847
Validation loss = 0.0003098167653661221
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.201    |
| Iteration     | 60        |
| MaximumReturn | -0.000591 |
| MinimumReturn | -2.59     |
| TotalSamples  | 103292    |
-----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00046112830750644207
Validation loss = 0.00043827694025821984
Validation loss = 0.0006679235957562923
Validation loss = 0.00045310184941627085
Validation loss = 0.000579951680265367
Validation loss = 0.000417048780946061
Validation loss = 0.0003919456503354013
Validation loss = 0.0004347574431449175
Validation loss = 0.0006287812138907611
Validation loss = 0.0005426577990874648
Validation loss = 0.0003944304189644754
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00044695063843391836
Validation loss = 0.00039776868652552366
Validation loss = 0.000582629581913352
Validation loss = 0.00047196028754115105
Validation loss = 0.000356459291651845
Validation loss = 0.00043814347009174526
Validation loss = 0.00043245850247330964
Validation loss = 0.0003942810872104019
Validation loss = 0.0005030632019042969
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006013612728565931
Validation loss = 0.00032381818164139986
Validation loss = 0.0006266745622269809
Validation loss = 0.0003374716325197369
Validation loss = 0.0005126355681568384
Validation loss = 0.0008700725738890469
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00033357570646330714
Validation loss = 0.0004763485922012478
Validation loss = 0.00038368228706531227
Validation loss = 0.0013809168012812734
Validation loss = 0.000807372503913939
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00036485306918621063
Validation loss = 0.0005019069067202508
Validation loss = 0.00033073441591113806
Validation loss = 0.00036478127003647387
Validation loss = 0.0012164919171482325
Validation loss = 0.0004874065052717924
Validation loss = 0.00037743410211987793
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0134   |
| Iteration     | 61        |
| MaximumReturn | -0.000673 |
| MinimumReturn | -0.316    |
| TotalSamples  | 104958    |
-----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007975393091328442
Validation loss = 0.00048050936311483383
Validation loss = 0.0003522007027640939
Validation loss = 0.0004507321282289922
Validation loss = 0.00035439120256341994
Validation loss = 0.0004809833480976522
Validation loss = 0.0007633549394086003
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0003955193387810141
Validation loss = 0.0003883177414536476
Validation loss = 0.0005693423445336521
Validation loss = 0.00037497736047953367
Validation loss = 0.0004656921373680234
Validation loss = 0.00032783360802568495
Validation loss = 0.00034249943564645946
Validation loss = 0.0003951548715122044
Validation loss = 0.0004970941226929426
Validation loss = 0.00036010597250424325
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0003722502733580768
Validation loss = 0.0003358983958605677
Validation loss = 0.0009291848400607705
Validation loss = 0.00046795184607617557
Validation loss = 0.0007022245554253459
Validation loss = 0.0004403635975904763
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006685638800263405
Validation loss = 0.0006866487092338502
Validation loss = 0.00046718440717086196
Validation loss = 0.0009258352220058441
Validation loss = 0.0006239371723495424
Validation loss = 0.00038360164035111666
Validation loss = 0.0006820813287049532
Validation loss = 0.0005544354789890349
Validation loss = 0.0004342910251580179
Validation loss = 0.00038916984340175986
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0009455007966607809
Validation loss = 0.0006526958895847201
Validation loss = 0.0005815998301841319
Validation loss = 0.0003695163468364626
Validation loss = 0.00041403918294236064
Validation loss = 0.0004107617132831365
Validation loss = 0.0004356520948931575
Validation loss = 0.00032137060770764947
Validation loss = 0.0005223274929448962
Validation loss = 0.0004298588028177619
Validation loss = 0.0003943585616070777
Validation loss = 0.0004017335013486445
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.392    |
| Iteration     | 62        |
| MaximumReturn | -0.000935 |
| MinimumReturn | -1.57     |
| TotalSamples  | 106624    |
-----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007169212913140655
Validation loss = 0.000341634702635929
Validation loss = 0.0006226752884685993
Validation loss = 0.00046800979180261493
Validation loss = 0.00039340759394690394
Validation loss = 0.0003843148588202894
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005986588075757027
Validation loss = 0.0005464688292704523
Validation loss = 0.0005843447288498282
Validation loss = 0.0004622760752681643
Validation loss = 0.0005292538553476334
Validation loss = 0.00036473365616984665
Validation loss = 0.0006136169540695846
Validation loss = 0.0005340030184015632
Validation loss = 0.0003720882232300937
Validation loss = 0.0004347498470451683
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006152522400952876
Validation loss = 0.00036297301994636655
Validation loss = 0.0004389882378745824
Validation loss = 0.0003466204216238111
Validation loss = 0.0007510147988796234
Validation loss = 0.00047450533020310104
Validation loss = 0.00047670776257291436
Validation loss = 0.0006567768868990242
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007135303458198905
Validation loss = 0.0004488205595407635
Validation loss = 0.000321236060699448
Validation loss = 0.000296552520012483
Validation loss = 0.0004652400966733694
Validation loss = 0.0004496471374295652
Validation loss = 0.000401228666305542
Validation loss = 0.0003659041249193251
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00041805862565524876
Validation loss = 0.00038523838156834245
Validation loss = 0.0005797620979137719
Validation loss = 0.0003518160374369472
Validation loss = 0.0004228395991958678
Validation loss = 0.00037218770012259483
Validation loss = 0.00053041847422719
Validation loss = 0.0004220905830152333
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.54    |
| Iteration     | 63       |
| MaximumReturn | -0.00291 |
| MinimumReturn | -53.5    |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006643769447691739
Validation loss = 0.0006019228603690863
Validation loss = 0.000365220766980201
Validation loss = 0.0003878550778608769
Validation loss = 0.000526504882145673
Validation loss = 0.00048511283239349723
Validation loss = 0.00043838800047524273
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005716216983273625
Validation loss = 0.00030273120501078665
Validation loss = 0.00036363216349855065
Validation loss = 0.00034583339584060013
Validation loss = 0.0004877846222370863
Validation loss = 0.0004902768414467573
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0007939895149320364
Validation loss = 0.0004125352716073394
Validation loss = 0.0003600159252528101
Validation loss = 0.0002613851975183934
Validation loss = 0.0005816229386255145
Validation loss = 0.00036318472120910883
Validation loss = 0.0007887815008871257
Validation loss = 0.00037276317016221583
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000307266105664894
Validation loss = 0.0007509023998863995
Validation loss = 0.0003280364035163075
Validation loss = 0.00028638617368415
Validation loss = 0.0004258133703842759
Validation loss = 0.00042711806599982083
Validation loss = 0.0011126649333164096
Validation loss = 0.00032277803984470665
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.000504295458085835
Validation loss = 0.0004347936774138361
Validation loss = 0.00031250104075297713
Validation loss = 0.0004194978391751647
Validation loss = 0.00036637368611991405
Validation loss = 0.00035818718606606126
Validation loss = 0.00035427490365691483
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -3.61     |
| Iteration     | 64        |
| MaximumReturn | -0.000624 |
| MinimumReturn | -86.8     |
| TotalSamples  | 109956    |
-----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007059653289616108
Validation loss = 0.0003768937021959573
Validation loss = 0.0005040405667386949
Validation loss = 0.0004543751711025834
Validation loss = 0.000426279817475006
Validation loss = 0.00044940263614989817
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005725488299503922
Validation loss = 0.0004576334613375366
Validation loss = 0.000453215092420578
Validation loss = 0.00036845667636953294
Validation loss = 0.0005492070340551436
Validation loss = 0.00035557133378461003
Validation loss = 0.0004271295329090208
Validation loss = 0.0005801859078928828
Validation loss = 0.0004936385666951537
Validation loss = 0.00044708375935442746
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0003575990558601916
Validation loss = 0.0003197917831130326
Validation loss = 0.0003307287988718599
Validation loss = 0.0003475536941550672
Validation loss = 0.0002809099678415805
Validation loss = 0.001201162813231349
Validation loss = 0.0003503931511659175
Validation loss = 0.0005150277283973992
Validation loss = 0.0005452809273265302
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00031531162676401436
Validation loss = 0.000757351575884968
Validation loss = 0.002857829676941037
Validation loss = 0.000505511648952961
Validation loss = 0.0003430504584684968
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0005322103970684111
Validation loss = 0.0002525266318116337
Validation loss = 0.0005347030819393694
Validation loss = 0.00040966071537695825
Validation loss = 0.00048286799574270844
Validation loss = 0.0004371587128844112
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.907   |
| Iteration     | 65       |
| MaximumReturn | -0.35    |
| MinimumReturn | -1.64    |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0005418712389655411
Validation loss = 0.0003589221742004156
Validation loss = 0.00037464257911778986
Validation loss = 0.000712605775333941
Validation loss = 0.0004200419643893838
Validation loss = 0.0004494749300647527
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0004405014624353498
Validation loss = 0.0004006845992989838
Validation loss = 0.000444737815996632
Validation loss = 0.00034132067230530083
Validation loss = 0.0003341650008223951
Validation loss = 0.0006017301348038018
Validation loss = 0.0004268771444913
Validation loss = 0.00029131933115422726
Validation loss = 0.0002922803396359086
Validation loss = 0.0008751325076445937
Validation loss = 0.00028861971804872155
Validation loss = 0.0003922921314369887
Validation loss = 0.00030179909663274884
Validation loss = 0.0003938856825698167
Validation loss = 0.0003757256199605763
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000322942592902109
Validation loss = 0.000351963157299906
Validation loss = 0.000319249666063115
Validation loss = 0.0003124985087197274
Validation loss = 0.00041810364928096533
Validation loss = 0.00043281257967464626
Validation loss = 0.0003996462619397789
Validation loss = 0.00037604919634759426
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0006530280807055533
Validation loss = 0.00036905481829307973
Validation loss = 0.00035763627965934575
Validation loss = 0.0005173793178983033
Validation loss = 0.00030868264730088413
Validation loss = 0.0006385005544871092
Validation loss = 0.00035097094951197505
Validation loss = 0.0004653948999475688
Validation loss = 0.0003362202551215887
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00026995735242962837
Validation loss = 0.0009004046441987157
Validation loss = 0.0004059447965119034
Validation loss = 0.0003072863328270614
Validation loss = 0.0003963752242270857
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0171   |
| Iteration     | 66        |
| MaximumReturn | -0.000623 |
| MinimumReturn | -0.29     |
| TotalSamples  | 113288    |
-----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0002981971192639321
Validation loss = 0.00044048254494555295
Validation loss = 0.0004505760152824223
Validation loss = 0.00041587610030546784
Validation loss = 0.00037435939884744585
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0007975263870321214
Validation loss = 0.0003366685996297747
Validation loss = 0.0012800757540389895
Validation loss = 0.00028932225541211665
Validation loss = 0.00036407235893420875
Validation loss = 0.00034853810211643577
Validation loss = 0.0002741310163401067
Validation loss = 0.00041560124373063445
Validation loss = 0.000361505284672603
Validation loss = 0.0005282104830257595
Validation loss = 0.00038781436160206795
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0002726537350099534
Validation loss = 0.0003511704853735864
Validation loss = 0.00028452620608732104
Validation loss = 0.00031334167579188943
Validation loss = 0.0004415777511894703
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0003388960030861199
Validation loss = 0.00046040138113312423
Validation loss = 0.00038805839722044766
Validation loss = 0.0003190790012013167
Validation loss = 0.0004788972728420049
Validation loss = 0.0009354402427561581
Validation loss = 0.0004262590955477208
Validation loss = 0.00040558361797593534
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0007489866693504155
Validation loss = 0.0005776287289336324
Validation loss = 0.0005239834426902235
Validation loss = 0.0004136763163842261
Validation loss = 0.00039933621883392334
Validation loss = 0.00029918315703980625
Validation loss = 0.0005142700974829495
Validation loss = 0.0005290113040246069
Validation loss = 0.0003627689147833735
Validation loss = 0.0002782711817417294
Validation loss = 0.00046098604798316956
Validation loss = 0.0003468083159532398
Validation loss = 0.00032578196260146797
Validation loss = 0.0003010753425769508
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0281   |
| Iteration     | 67        |
| MaximumReturn | -0.000578 |
| MinimumReturn | -0.524    |
| TotalSamples  | 114954    |
-----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00048319759662263095
Validation loss = 0.00037785933818668127
Validation loss = 0.0003882214950863272
Validation loss = 0.0003404220042284578
Validation loss = 0.0005063357530161738
Validation loss = 0.00045779315405525267
Validation loss = 0.0003829779161605984
Validation loss = 0.0003035016416106373
Validation loss = 0.0007286390173248947
Validation loss = 0.00041379083995707333
Validation loss = 0.0008775779278948903
Validation loss = 0.00043075132998637855
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00032942145480774343
Validation loss = 0.0005839575314894319
Validation loss = 0.0003382758586667478
Validation loss = 0.00041013697045855224
Validation loss = 0.0003531313268467784
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.000406062783440575
Validation loss = 0.00032308782101608813
Validation loss = 0.0005034347414039075
Validation loss = 0.0007674628868699074
Validation loss = 0.000515872030518949
Validation loss = 0.0008942433632910252
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0003023860335815698
Validation loss = 0.00037174540921114385
Validation loss = 0.000382230500690639
Validation loss = 0.0004047067486681044
Validation loss = 0.000519430439453572
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0002700012701097876
Validation loss = 0.0006944245542399585
Validation loss = 0.0002741119242273271
Validation loss = 0.0003426147159188986
Validation loss = 0.00033084015012718737
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0807  |
| Iteration     | 68       |
| MaximumReturn | -0.00106 |
| MinimumReturn | -0.553   |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00035756101715378463
Validation loss = 0.00039249853580258787
Validation loss = 0.0003738093946594745
Validation loss = 0.001087652170099318
Validation loss = 0.0004578086663968861
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00048521390999667346
Validation loss = 0.00031928622047416866
Validation loss = 0.0004245862946845591
Validation loss = 0.0004992559552192688
Validation loss = 0.0005242148763500154
Validation loss = 0.00031447765650227666
Validation loss = 0.00047966442070901394
Validation loss = 0.00035003258381038904
Validation loss = 0.0003944808559026569
Validation loss = 0.0003151778073515743
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00035753255360759795
Validation loss = 0.00035369538818486035
Validation loss = 0.000489921192638576
Validation loss = 0.0002903475542552769
Validation loss = 0.0004465311940293759
Validation loss = 0.00031044555362313986
Validation loss = 0.00031596352346241474
Validation loss = 0.0004303246387280524
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00035318982554599643
Validation loss = 0.00030170424724929035
Validation loss = 0.000414714973885566
Validation loss = 0.00035600989940576255
Validation loss = 0.00041029651765711606
Validation loss = 0.00044964143307879567
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00028968017431907356
Validation loss = 0.0002454515779390931
Validation loss = 0.0003734478377737105
Validation loss = 0.000450882944278419
Validation loss = 0.0002714680740609765
Validation loss = 0.00035964904236607254
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0309   |
| Iteration     | 69        |
| MaximumReturn | -0.000618 |
| MinimumReturn | -0.614    |
| TotalSamples  | 118286    |
-----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0003989271353930235
Validation loss = 0.0003633979940786958
Validation loss = 0.00041521183447912335
Validation loss = 0.00033323949901387095
Validation loss = 0.0003302199475001544
Validation loss = 0.00039677368476986885
Validation loss = 0.0002946190070360899
Validation loss = 0.00038503058021888137
Validation loss = 0.0006511352839879692
Validation loss = 0.0003688446304295212
Validation loss = 0.000352048926288262
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0005159333813935518
Validation loss = 0.00036509541678242385
Validation loss = 0.00037062817136757076
Validation loss = 0.0003233171592000872
Validation loss = 0.0004194898356217891
Validation loss = 0.00027517430135048926
Validation loss = 0.00032558973180130124
Validation loss = 0.0003162914654240012
Validation loss = 0.00041331269312649965
Validation loss = 0.00035785886575467885
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004709848726633936
Validation loss = 0.00027994110132567585
Validation loss = 0.0005432270700111985
Validation loss = 0.0003164718800690025
Validation loss = 0.0003315853246022016
Validation loss = 0.0003757452068384737
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0008142314036376774
Validation loss = 0.000314181437715888
Validation loss = 0.00028674182249233127
Validation loss = 0.000749293714761734
Validation loss = 0.0006754377391189337
Validation loss = 0.00028878191369585693
Validation loss = 0.0004593573685269803
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00034724094439297915
Validation loss = 0.0005069992621429265
Validation loss = 0.0005416971980594099
Validation loss = 0.0002905255532823503
Validation loss = 0.0004374446871224791
Validation loss = 0.0004787242505699396
Validation loss = 0.0006349915638566017
Validation loss = 0.00032442816882394254
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0324   |
| Iteration     | 70        |
| MaximumReturn | -0.000986 |
| MinimumReturn | -0.388    |
| TotalSamples  | 119952    |
-----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0007784462650306523
Validation loss = 0.00033606329816393554
Validation loss = 0.00039967367774806917
Validation loss = 0.00047906345571391284
Validation loss = 0.000522608810570091
Validation loss = 0.0003810521448031068
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000732873217202723
Validation loss = 0.000441221782239154
Validation loss = 0.00038843450602144003
Validation loss = 0.000504185154568404
Validation loss = 0.00030873980722390115
Validation loss = 0.00037952204002067447
Validation loss = 0.0005331767024472356
Validation loss = 0.00039164151530712843
Validation loss = 0.00029847040423192084
Validation loss = 0.0002625664637889713
Validation loss = 0.0008563523879274726
Validation loss = 0.0009422899456694722
Validation loss = 0.00029124828870408237
Validation loss = 0.00035923163522966206
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00031987184775061905
Validation loss = 0.00047533601173199713
Validation loss = 0.000299264705972746
Validation loss = 0.0006785838631913066
Validation loss = 0.0004526185803115368
Validation loss = 0.0004477655456867069
Validation loss = 0.00036368798464536667
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00046344322618097067
Validation loss = 0.0004355052369646728
Validation loss = 0.000332274183165282
Validation loss = 0.0003439791325945407
Validation loss = 0.000342434155754745
Validation loss = 0.00044557685032486916
Validation loss = 0.0004802362236659974
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00039967012708075345
Validation loss = 0.0005981400026939809
Validation loss = 0.0002568793424870819
Validation loss = 0.0004374176496639848
Validation loss = 0.0004740544536616653
Validation loss = 0.0005200878949835896
Validation loss = 0.00031095935264602304
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0246   |
| Iteration     | 71        |
| MaximumReturn | -0.000753 |
| MinimumReturn | -0.283    |
| TotalSamples  | 121618    |
-----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0006774124340154231
Validation loss = 0.0005171523662284017
Validation loss = 0.0004478342889342457
Validation loss = 0.00040978306788019836
Validation loss = 0.000393545109545812
Validation loss = 0.0003456219274085015
Validation loss = 0.0003908564685843885
Validation loss = 0.00039823082624934614
Validation loss = 0.00038219382986426353
Validation loss = 0.0003483313776087016
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00048551891813986003
Validation loss = 0.0002806875272653997
Validation loss = 0.00028179632499814034
Validation loss = 0.0003497032157611102
Validation loss = 0.0003848458582069725
Validation loss = 0.0002888966992031783
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004228620673529804
Validation loss = 0.0014614261453971267
Validation loss = 0.00037432933459058404
Validation loss = 0.0005411182646639645
Validation loss = 0.00043105060467496514
Validation loss = 0.0004302268789615482
Validation loss = 0.0003310080210212618
Validation loss = 0.0003247353306505829
Validation loss = 0.00033001977135427296
Validation loss = 0.0004846637020818889
Validation loss = 0.00032328563975170255
Validation loss = 0.00028526788810268044
Validation loss = 0.0005672246916219592
Validation loss = 0.0003406466275919229
Validation loss = 0.00034363241866230965
Validation loss = 0.0003351056657265872
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00031929631950333714
Validation loss = 0.0005278214812278748
Validation loss = 0.0009029129287227988
Validation loss = 0.0004556089697871357
Validation loss = 0.00043480595923028886
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0003064107440877706
Validation loss = 0.00028500903863459826
Validation loss = 0.0004166388534940779
Validation loss = 0.0012750825844705105
Validation loss = 0.0002664201019797474
Validation loss = 0.0005259629106149077
Validation loss = 0.0002945202577393502
Validation loss = 0.0003535390133038163
Validation loss = 0.0005291587440297008
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.171   |
| Iteration     | 72       |
| MaximumReturn | -0.00126 |
| MinimumReturn | -0.775   |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004621478437911719
Validation loss = 0.0005666902288794518
Validation loss = 0.00033292724401690066
Validation loss = 0.0004442389472387731
Validation loss = 0.0004009571566712111
Validation loss = 0.0003761042607948184
Validation loss = 0.00047062712837941945
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00031639455119147897
Validation loss = 0.00027744576800614595
Validation loss = 0.0003529430541675538
Validation loss = 0.0003417760308366269
Validation loss = 0.000343305611750111
Validation loss = 0.0003901633608620614
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0002837784413713962
Validation loss = 0.0003563017526175827
Validation loss = 0.0003676331543829292
Validation loss = 0.00040179246570914984
Validation loss = 0.0002970885834656656
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0004691306094173342
Validation loss = 0.0007226357120089233
Validation loss = 0.000382487487513572
Validation loss = 0.00033464806620031595
Validation loss = 0.0010612525511533022
Validation loss = 0.0003499173326417804
Validation loss = 0.0003084155323449522
Validation loss = 0.0005056255031377077
Validation loss = 0.0004719018761534244
Validation loss = 0.000338101846864447
Validation loss = 0.0009020666475407779
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00028809846844524145
Validation loss = 0.0004803840129170567
Validation loss = 0.0002980497956741601
Validation loss = 0.0003075517015531659
Validation loss = 0.0002813690807670355
Validation loss = 0.001079210196621716
Validation loss = 0.0002720642660278827
Validation loss = 0.00035196420503780246
Validation loss = 0.00033966152113862336
Validation loss = 0.0002918363024946302
Validation loss = 0.0004455906746443361
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0335   |
| Iteration     | 73        |
| MaximumReturn | -0.000598 |
| MinimumReturn | -0.456    |
| TotalSamples  | 124950    |
-----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00033742067171260715
Validation loss = 0.0004990781890228391
Validation loss = 0.00043616312905214727
Validation loss = 0.00046662703971378505
Validation loss = 0.00032563749118708074
Validation loss = 0.0004493524902500212
Validation loss = 0.0002977085532620549
Validation loss = 0.0004478332994040102
Validation loss = 0.0004272336373105645
Validation loss = 0.0005011961329728365
Validation loss = 0.0004532767634373158
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.000310238596284762
Validation loss = 0.00037438099388964474
Validation loss = 0.00036277875187806785
Validation loss = 0.000281100335996598
Validation loss = 0.0004889766569249332
Validation loss = 0.00042201150790788233
Validation loss = 0.000448487262474373
Validation loss = 0.0006970238755457103
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00041201143176294863
Validation loss = 0.0004538969951681793
Validation loss = 0.00038720693555660546
Validation loss = 0.0005561854341067374
Validation loss = 0.00030575081473216414
Validation loss = 0.00038842816138640046
Validation loss = 0.00031080839107744396
Validation loss = 0.00034307694295421243
Validation loss = 0.00037004705518484116
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0007490662974305451
Validation loss = 0.00034935487201437354
Validation loss = 0.0003545642248354852
Validation loss = 0.0002775880566332489
Validation loss = 0.0003836806572508067
Validation loss = 0.0003568266110960394
Validation loss = 0.00027881903224624693
Validation loss = 0.0002836305648088455
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00038524335832335055
Validation loss = 0.0002383324026595801
Validation loss = 0.00029699638253077865
Validation loss = 0.0003952775150537491
Validation loss = 0.0004999014199711382
Validation loss = 0.0003040555748157203
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.12     |
| Iteration     | 74        |
| MaximumReturn | -0.000785 |
| MinimumReturn | -0.536    |
| TotalSamples  | 126616    |
-----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0003812522627413273
Validation loss = 0.0003222859522793442
Validation loss = 0.0006303931586444378
Validation loss = 0.00046196740004234016
Validation loss = 0.00041125775896944106
Validation loss = 0.0005909218452870846
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0003679131914395839
Validation loss = 0.00027547814534045756
Validation loss = 0.00040893364348448813
Validation loss = 0.0005316737806424499
Validation loss = 0.00044604248250834644
Validation loss = 0.0003286819264758378
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0005950928316451609
Validation loss = 0.0003459218714851886
Validation loss = 0.0002913567877840251
Validation loss = 0.000304581131786108
Validation loss = 0.00027952686650678515
Validation loss = 0.0003004834579769522
Validation loss = 0.0005082165589556098
Validation loss = 0.0004537851782515645
Validation loss = 0.00036597970756702125
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.000675768475048244
Validation loss = 0.0006213117158040404
Validation loss = 0.00043516175355762243
Validation loss = 0.0005182512104511261
Validation loss = 0.000296874059131369
Validation loss = 0.00033627732773311436
Validation loss = 0.0007396270520985126
Validation loss = 0.0004293876700103283
Validation loss = 0.00031623319955542684
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00032733267289586365
Validation loss = 0.00024239555932581425
Validation loss = 0.0002901133266277611
Validation loss = 0.00040548518882133067
Validation loss = 0.000723678560461849
Validation loss = 0.00033085959148593247
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.106    |
| Iteration     | 75        |
| MaximumReturn | -0.000684 |
| MinimumReturn | -0.632    |
| TotalSamples  | 128282    |
-----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0003783821302931756
Validation loss = 0.0005395421758294106
Validation loss = 0.000840594875626266
Validation loss = 0.0004101033555343747
Validation loss = 0.0003430969372857362
Validation loss = 0.0003202367224730551
Validation loss = 0.00035117784864269197
Validation loss = 0.0004456715250853449
Validation loss = 0.0005808251444250345
Validation loss = 0.000344756874255836
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0003331004118081182
Validation loss = 0.00034541983040980995
Validation loss = 0.0005919607356190681
Validation loss = 0.000422046025050804
Validation loss = 0.0007836128352209926
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0003268287400715053
Validation loss = 0.0002783732197713107
Validation loss = 0.00027540011797100306
Validation loss = 0.0005639491137117147
Validation loss = 0.0004657910321839154
Validation loss = 0.0006972115952521563
Validation loss = 0.0005010842578485608
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0003290698805358261
Validation loss = 0.0002846544375643134
Validation loss = 0.00033195840660482645
Validation loss = 0.0002760424977168441
Validation loss = 0.00028705011936835945
Validation loss = 0.0003423814196139574
Validation loss = 0.00044211558997631073
Validation loss = 0.0003059947048313916
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00030341348610818386
Validation loss = 0.00035745042259804904
Validation loss = 0.0003957111621275544
Validation loss = 0.00033985692425630987
Validation loss = 0.00042439287062734365
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0734   |
| Iteration     | 76        |
| MaximumReturn | -0.000544 |
| MinimumReturn | -0.672    |
| TotalSamples  | 129948    |
-----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004527771379798651
Validation loss = 0.0003360904520377517
Validation loss = 0.0004323557950556278
Validation loss = 0.00041375838918611407
Validation loss = 0.00028425725759007037
Validation loss = 0.0005139097920618951
Validation loss = 0.000337554287398234
Validation loss = 0.00030598900048062205
Validation loss = 0.0012125569628551602
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00031343827140517533
Validation loss = 0.00034329129266552627
Validation loss = 0.0008991404320113361
Validation loss = 0.0004445154336281121
Validation loss = 0.0004949872964061797
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0003705839335452765
Validation loss = 0.00038816386950202286
Validation loss = 0.00033006496960297227
Validation loss = 0.0003427609917707741
Validation loss = 0.00032674154499545693
Validation loss = 0.0007540831575170159
Validation loss = 0.00034421164309605956
Validation loss = 0.0003135322476737201
Validation loss = 0.00032467758865095675
Validation loss = 0.0003837940748780966
Validation loss = 0.00039564885082654655
Validation loss = 0.000301969499560073
Validation loss = 0.00029859819915145636
Validation loss = 0.000735507404897362
Validation loss = 0.000695390859618783
Validation loss = 0.0007796426070854068
Validation loss = 0.0005535392556339502
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00030772050376981497
Validation loss = 0.00032448492129333317
Validation loss = 0.000386792205972597
Validation loss = 0.0005005746497772634
Validation loss = 0.0006396254757419229
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0004622253472916782
Validation loss = 0.00025559827918186784
Validation loss = 0.00042598205618560314
Validation loss = 0.00037929078098386526
Validation loss = 0.0003073459374718368
Validation loss = 0.000651621026918292
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0272   |
| Iteration     | 77        |
| MaximumReturn | -0.000542 |
| MinimumReturn | -0.662    |
| TotalSamples  | 131614    |
-----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00033089882344938815
Validation loss = 0.00031768158078193665
Validation loss = 0.0006221385556273162
Validation loss = 0.00044894692837260664
Validation loss = 0.0006253758911043406
Validation loss = 0.00039909331826493144
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0006940927705727518
Validation loss = 0.00035875823232345283
Validation loss = 0.0006224406533874571
Validation loss = 0.0004575212369672954
Validation loss = 0.00043896082206629217
Validation loss = 0.00034025200875476
Validation loss = 0.00039237659075297415
Validation loss = 0.0004472418804652989
Validation loss = 0.0003335100191179663
Validation loss = 0.00039228895911946893
Validation loss = 0.0005757727194577456
Validation loss = 0.0002662043261807412
Validation loss = 0.000633789342828095
Validation loss = 0.0005723471404053271
Validation loss = 0.0003155985032208264
Validation loss = 0.0005114852683618665
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0006024468457326293
Validation loss = 0.0003377825196366757
Validation loss = 0.0003002532757818699
Validation loss = 0.00036487288889475167
Validation loss = 0.0004199375107418746
Validation loss = 0.0006333308410830796
Validation loss = 0.00038588204188272357
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0010727450717240572
Validation loss = 0.0002982460136990994
Validation loss = 0.0003700483648572117
Validation loss = 0.000629023474175483
Validation loss = 0.00027707492699846625
Validation loss = 0.00028932373970746994
Validation loss = 0.00045240900362841785
Validation loss = 0.0004928288399241865
Validation loss = 0.00044957222416996956
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0003904970653820783
Validation loss = 0.0003054434491787106
Validation loss = 0.00030186906224116683
Validation loss = 0.00032645644387230277
Validation loss = 0.0003131062549073249
Validation loss = 0.00059271021746099
Validation loss = 0.0005757482722401619
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0357   |
| Iteration     | 78        |
| MaximumReturn | -0.000659 |
| MinimumReturn | -0.503    |
| TotalSamples  | 133280    |
-----------------------------
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0004237591347191483
Validation loss = 0.000384810526156798
Validation loss = 0.00031275692163035274
Validation loss = 0.0007045289967209101
Validation loss = 0.0003930909151677042
Validation loss = 0.0004456495516933501
Validation loss = 0.0004564347327686846
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0003070145030505955
Validation loss = 0.0005488723982125521
Validation loss = 0.0005610019434243441
Validation loss = 0.0005189475486986339
Validation loss = 0.00032192713115364313
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0004655900411307812
Validation loss = 0.00057620694860816
Validation loss = 0.00033635832369327545
Validation loss = 0.0005400793743319809
Validation loss = 0.0003985581861343235
Validation loss = 0.0004673277144320309
Validation loss = 0.00043786014430224895
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0002982248552143574
Validation loss = 0.0002777355839498341
Validation loss = 0.0003701102396007627
Validation loss = 0.0006559364264830947
Validation loss = 0.0005497554084286094
Validation loss = 0.00037138399784453213
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00031745751039125025
Validation loss = 0.00043542205821722746
Validation loss = 0.0006488353246822953
Validation loss = 0.0004013390280306339
Validation loss = 0.0003071039682254195
Validation loss = 0.0003172929282300174
Validation loss = 0.000560447049792856
Validation loss = 0.0003918457077816129
Validation loss = 0.00030871815397404134
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -0.0261   |
| Iteration     | 79        |
| MaximumReturn | -0.000584 |
| MinimumReturn | -0.633    |
| TotalSamples  | 134946    |
-----------------------------
