Logging to experiments/invertedPendulum/IPO01/Tue-01-Nov-2022-09-49-35-PM-CDT_invertedPendulum_trpo_iteration_20_seed2431
Printing configuration...
{'env_name': 'invertedPendulum',
 'random_seeds': [3214, 2431, 2531, 2231],
 'save_variables': False,
 'model_save_dir': '/tmp/invertedPendulum_models/',
 'restore_variables': False,
 'start_onpol_iter': 0,
 'onpol_iters': 80,
 'num_path_random': 25,
 'num_path_onpol': 25,
 'env_horizon': 100,
 'max_train_data': 200000,
 'max_val_data': 100000,
 'discard_ratio': 0.0,
 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20},
              'model': 'nn',
              'ensemble': True,
              'ensemble_model_count': 5,
              'enable_particle_ensemble': True,
              'particles': 5,
              'obs_var': 1.0,
              'intrinsic_reward_coeff': 1.0,
              'ita': 1.0,
              'mode': 'random',
              'val': True,
              'n_layers': 4,
              'hidden_size': 1000,
              'activation': 'relu',
              'batch_size': 1000,
              'learning_rate': 0.001,
              'reg_coeff': 0.0,
              'epochs': 200,
              'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9,
                              'kl_clip': 0.0001, 'cov_ema_decay': 0.99}},
 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh',
            'reinitialize_every_itr': False},
 'trpo': {'horizon': 100, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20,
          'batch_size': 50000, 'gae': 0.95, 'visualization': False,
          'visualize_iterations': [0]},
 'algo': 'trpo'}
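The printed dictionary drives the whole run. As a hedged sketch (not the project's actual loader), the stages below each read their own sub-dictionary from a nested config like this one; the key names are taken from the printout, everything else is illustrative.

    # Hedged sketch: each stage of the run pulls its hyperparameters out of a
    # nested config dict shaped like the one printed above.
    config = {
        'env_name': 'invertedPendulum',
        'num_path_random': 25,      # random paths per initial data collection
        'num_path_onpol': 25,       # on-policy paths per outer iteration
        'env_horizon': 100,         # steps per path
        'dynamics': {'ensemble_model_count': 5, 'n_layers': 4, 'hidden_size': 1000,
                     'activation': 'relu', 'batch_size': 1000,
                     'learning_rate': 0.001, 'epochs': 200},
        'trpo': {'iterations': 20, 'gamma': 0.99, 'gae': 0.95,
                 'step_size': 0.01, 'horizon': 100, 'batch_size': 50000},
    }

    dyn_cfg = config['dynamics']    # hyperparameters for the dynamics ensemble
    trpo_cfg = config['trpo']       # hyperparameters for the policy optimizer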
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating random rollouts.
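The 25 random paths of 100 steps each (2,500 transitions) match num_path_random: 25 and env_horizon: 100 from the config. A hedged sketch of such a random-rollout collector is below; a gym-style env API is assumed and the function name is illustrative, not from this codebase.

    # Hedged sketch of a random-rollout collector consistent with the log above:
    # 25 paths x 100 steps gathered with uniformly random actions.
    # A gym-style env API (4-tuple step return) is assumed.
    import numpy as np

    def sample_random_paths(env, num_paths=25, horizon=100):
        paths, total_timesteps = [], 0
        for i in range(num_paths):
            print(f"Path {i} | total_timesteps {total_timesteps}.")
            obs = env.reset()
            observations, actions, next_observations = [], [], []
            for _ in range(horizon):
                act = env.action_space.sample()           # uniform random action
                next_obs, reward, done, _ = env.step(act)
                observations.append(obs)
                actions.append(act)
                next_observations.append(next_obs)
                obs = next_obs
                total_timesteps += 1
                if done:
                    break
            paths.append({'observations': np.array(observations),
                          'actions': np.array(actions),
                          'next_observations': np.array(next_observations)})
        return paths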
Creating normalization for training data.
Done creating normalization for training data.
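The normalization step computes statistics over the random-rollout data so that dynamics-model inputs and targets are standardized. Exactly which quantities are normalized is not shown in the log; the sketch below assumes the usual choice of observations, actions, and state deltas.

    # Hedged sketch: mean/std statistics over the random-rollout data.
    # Normalizing obs, actions, and next-obs deltas is an assumption.
    import numpy as np

    def compute_normalization(paths, eps=1e-8):
        obs = np.concatenate([p['observations'] for p in paths])
        acts = np.concatenate([p['actions'] for p in paths])
        deltas = np.concatenate([p['next_observations'] - p['observations']
                                 for p in paths])
        stats = {}
        for name, x in [('obs', obs), ('acts', acts), ('deltas', deltas)]:
            stats[name] = (x.mean(axis=0), x.std(axis=0) + eps)
        return stats

    def normalize(x, mean_std):
        mean, std = mean_std
        return (x - mean) / std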
Particle ensemble enabled? True
An ensemble of 5 dynamics models (<class 'model.dynamics.NNDynamicsModel'>) initialized.
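Per the config, each ensemble member is a fully connected network with 4 layers of 1000 ReLU units. A hedged PyTorch-style sketch of building such an ensemble follows; the actual model.dynamics.NNDynamicsModel class may be structured differently.

    # Hedged sketch of the dynamics ensemble implied by the config
    # (ensemble_model_count=5, n_layers=4, hidden_size=1000, relu).
    # PyTorch is assumed; the real NNDynamicsModel class is not shown here.
    import torch.nn as nn

    def make_dynamics_net(obs_dim, act_dim, hidden_size=1000, n_layers=4):
        layers, in_dim = [], obs_dim + act_dim
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden_size), nn.ReLU()]
            in_dim = hidden_size
        layers.append(nn.Linear(in_dim, obs_dim))   # predicts the next-state delta
        return nn.Sequential(*layers)

    # obs_dim=4, act_dim=1 for a MuJoCo-style inverted pendulum (assumed).
    ensemble = [make_dynamics_net(obs_dim=4, act_dim=1) for _ in range(5)]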
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
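With pre_training.itr set to 0 the intrinsic-reward pre-training phase is a no-op, and training proceeds on the environment's external reward. For context, one common choice of intrinsic reward with an ensemble of dynamics models is the disagreement among the members' predictions; the sketch below illustrates that idea and is an assumption, not necessarily what this codebase computes.

    # Hedged sketch: intrinsic reward as ensemble disagreement (prediction
    # variance) for a single (state, action) pair. Whether this run uses this
    # exact formula is an assumption; intrinsic_reward_coeff=1.0 in the config.
    import numpy as np

    def ensemble_disagreement_reward(pred_next_obs, coeff=1.0):
        # pred_next_obs: shape (ensemble_size, obs_dim), one prediction per member.
        variance = pred_next_obs.var(axis=0)      # per-dimension disagreement
        return coeff * variance.mean()            # scalar intrinsic reward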
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8482309579849243
Validation loss = 0.7108039259910583
Validation loss = 0.6854939460754395
Validation loss = 0.6667653918266296
Validation loss = 0.636821985244751
Validation loss = 0.6130656003952026
Validation loss = 0.6043558120727539
Validation loss = 0.5870257616043091
Validation loss = 0.5762332677841187
Validation loss = 0.5709359049797058
Validation loss = 0.5714990496635437
Validation loss = 0.555000901222229
Validation loss = 0.5536320805549622
Validation loss = 0.5364888310432434
Validation loss = 0.5310564041137695
Validation loss = 0.5305683612823486
Validation loss = 0.5180668234825134
Validation loss = 0.517541229724884
Validation loss = 0.5148546099662781
Validation loss = 0.503847062587738
Validation loss = 0.5294886231422424
Validation loss = 0.5256014466285706
Validation loss = 0.5098759531974792
Validation loss = 0.5059050917625427
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.854190468788147
Validation loss = 0.733359694480896
Validation loss = 0.7033094167709351
Validation loss = 0.683576762676239
Validation loss = 0.6575327515602112
Validation loss = 0.6405389308929443
Validation loss = 0.619035542011261
Validation loss = 0.6065032482147217
Validation loss = 0.599444568157196
Validation loss = 0.5798618793487549
Validation loss = 0.5881297588348389
Validation loss = 0.5788115859031677
Validation loss = 0.557671070098877
Validation loss = 0.5625641345977783
Validation loss = 0.546927273273468
Validation loss = 0.5426999926567078
Validation loss = 0.5225391387939453
Validation loss = 0.5354076027870178
Validation loss = 0.5131523013114929
Validation loss = 0.5133833289146423
Validation loss = 0.49804821610450745
Validation loss = 0.5024245977401733
Validation loss = 0.5010352730751038
Validation loss = 0.5002476572990417
Validation loss = 0.49042385816574097
Validation loss = 0.48940587043762207
Validation loss = 0.4958185851573944
Validation loss = 0.4960951805114746
Validation loss = 0.4848259687423706
Validation loss = 0.48626768589019775
Validation loss = 0.4785960912704468
Validation loss = 0.5003995299339294
Validation loss = 0.48463618755340576
Validation loss = 0.47524189949035645
Validation loss = 0.4868955910205841
Validation loss = 0.47143247723579407
Validation loss = 0.48162123560905457
Validation loss = 0.5059405565261841
Validation loss = 0.559716522693634
Validation loss = 0.4829505681991577
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.847901463508606
Validation loss = 0.7229897975921631
Validation loss = 0.7300328016281128
Validation loss = 0.6804119348526001
Validation loss = 0.6776112914085388
Validation loss = 0.6626352667808533
Validation loss = 0.640181303024292
Validation loss = 0.6177157163619995
Validation loss = 0.6058737635612488
Validation loss = 0.5954217910766602
Validation loss = 0.5916430950164795
Validation loss = 0.5877816677093506
Validation loss = 0.5796712040901184
Validation loss = 0.5712401866912842
Validation loss = 0.5633822083473206
Validation loss = 0.5457390546798706
Validation loss = 0.546506941318512
Validation loss = 0.5359284281730652
Validation loss = 0.5363417267799377
Validation loss = 0.5401369333267212
Validation loss = 0.5277650356292725
Validation loss = 0.5188068747520447
Validation loss = 0.515214204788208
Validation loss = 0.510906457901001
Validation loss = 0.5096026062965393
Validation loss = 0.4906756281852722
Validation loss = 0.49527087807655334
Validation loss = 0.4827672243118286
Validation loss = 0.48973512649536133
Validation loss = 0.487761527299881
Validation loss = 0.5175986886024475
Validation loss = 0.4927316904067993
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.8476393222808838
Validation loss = 0.7223178744316101
Validation loss = 0.6794779896736145
Validation loss = 0.6713844537734985
Validation loss = 0.6543524861335754
Validation loss = 0.6247931122779846
Validation loss = 0.6095468997955322
Validation loss = 0.6079208254814148
Validation loss = 0.5872660279273987
Validation loss = 0.5794782638549805
Validation loss = 0.5615754127502441
Validation loss = 0.5585480332374573
Validation loss = 0.5641573071479797
Validation loss = 0.5439960360527039
Validation loss = 0.5350909233093262
Validation loss = 0.5328493714332581
Validation loss = 0.5201011896133423
Validation loss = 0.5200183391571045
Validation loss = 0.5146245956420898
Validation loss = 0.5059735774993896
Validation loss = 0.49890372157096863
Validation loss = 0.5070383548736572
Validation loss = 0.49657735228538513
Validation loss = 0.49470824003219604
Validation loss = 0.5006866455078125
Validation loss = 0.5051030516624451
Validation loss = 0.4940618872642517
Validation loss = 0.4909001886844635
Validation loss = 0.49713632464408875
Validation loss = 0.47917431592941284
Validation loss = 0.48840248584747314
Validation loss = 0.47728782892227173
Validation loss = 0.4807544946670532
Validation loss = 0.4927222430706024
Validation loss = 0.49587303400039673
Validation loss = 0.49881353974342346
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.8532900810241699
Validation loss = 0.7146385312080383
Validation loss = 0.6763830184936523
Validation loss = 0.6577805876731873
Validation loss = 0.6370466351509094
Validation loss = 0.6185880303382874
Validation loss = 0.5961897969245911
Validation loss = 0.5828967690467834
Validation loss = 0.5669405460357666
Validation loss = 0.5612782835960388
Validation loss = 0.5540727376937866
Validation loss = 0.5550629496574402
Validation loss = 0.5507063269615173
Validation loss = 0.5426334142684937
Validation loss = 0.5256528258323669
Validation loss = 0.5382091999053955
Validation loss = 0.5424349308013916
Validation loss = 0.533525288105011
Validation loss = 0.5152571201324463
Validation loss = 0.5074292421340942
Validation loss = 0.49899858236312866
Validation loss = 0.504623532295227
Validation loss = 0.4924314618110657
Validation loss = 0.509468674659729
Validation loss = 0.5080766677856445
Validation loss = 0.4975263476371765
Validation loss = 0.49777457118034363
Done fitting dynamics.
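Each ensemble member is fitted separately on the normalized data with a held-out validation split (val: True); the "Validation loss" lines above are per-epoch validation errors. The varying number of epochs per model suggests some form of early stopping, which is an assumption here. A hedged sketch of such a per-model fit loop:

    # Hedged sketch of a per-model fit loop consistent with the log: minibatch
    # MSE training (batch_size=1000, lr=0.001, up to 200 epochs per the config),
    # per-epoch validation printing, and patience-based early stopping (the
    # stopping rule is assumed). PyTorch is assumed.
    import torch
    import torch.nn.functional as F

    def fit_dynamics(model, train_x, train_y, val_x, val_y,
                     epochs=200, batch_size=1000, lr=1e-3, patience=5):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        best, bad_epochs = float('inf'), 0
        for epoch in range(epochs):
            perm = torch.randperm(train_x.shape[0])
            for start in range(0, train_x.shape[0], batch_size):
                idx = perm[start:start + batch_size]
                loss = F.mse_loss(model(train_x[idx]), train_y[idx])
                opt.zero_grad()
                loss.backward()
                opt.step()
            with torch.no_grad():
                val_loss = F.mse_loss(model(val_x), val_y).item()
            print(f"Validation loss = {val_loss}")
            if val_loss < best:
                best, bad_epochs = val_loss, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break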
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
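The policy is trained with TRPO for 20 inner iterations (trpo.iterations), and the samples for those iterations are presumably generated by rolling the current policy out in the learned dynamics ensemble rather than the real environment; that is an assumption about this pipeline. The advantage estimation implied by the config (gamma=0.99, gae=0.95) is standard GAE, sketched below; the KL-constrained natural-gradient step itself is omitted.

    # Hedged sketch of GAE advantage estimation with the config's settings
    # (gamma=0.99, lambda=0.95). This is the standard recursion; the TRPO
    # natural-gradient update that consumes these advantages is not shown.
    import numpy as np

    def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
        # rewards: (T,) rewards along one path; values: (T+1,) value estimates,
        # including the bootstrap value for the state after the final step.
        T = len(rewards)
        advantages = np.zeros(T)
        last = 0.0
        for t in reversed(range(T)):
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            last = delta + gamma * lam * last
            advantages[t] = last
        returns = advantages + values[:-1]
        return advantages, returns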
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -53.5    |
| Iteration     | 0        |
| MaximumReturn | -0.117   |
| MinimumReturn | -117     |
| TotalSamples  | 3332     |
----------------------------
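Each outer iteration ends with this small tabular summary: average, maximum, and minimum return over the 25 evaluation paths, plus the cumulative real-environment sample count. A hedged sketch that reproduces the layout (the project's actual logger is not shown):

    # Hedged sketch of the per-iteration tabular summary printed above;
    # the real logger is assumed, this only reproduces the layout.
    import numpy as np

    def print_iteration_stats(iteration, returns, total_samples):
        rows = [("AverageReturn", float(np.mean(returns))),
                ("Iteration", iteration),
                ("MaximumReturn", float(np.max(returns))),
                ("MinimumReturn", float(np.min(returns))),
                ("TotalSamples", total_samples)]
        width = max(len(k) for k, _ in rows)
        print("-" * 28)
        for key, value in rows:
            print(f"| {key:<{width}} | {value:<8.3g} |" if isinstance(value, float)
                  else f"| {key:<{width}} | {value:<8} |")
        print("-" * 28)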
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7189881801605225
Validation loss = 0.5521572232246399
Validation loss = 0.5202386975288391
Validation loss = 0.49625712633132935
Validation loss = 0.4846300780773163
Validation loss = 0.47599494457244873
Validation loss = 0.4596750736236572
Validation loss = 0.4605734050273895
Validation loss = 0.44904381036758423
Validation loss = 0.45011287927627563
Validation loss = 0.44124099612236023
Validation loss = 0.442681223154068
Validation loss = 0.4306791424751282
Validation loss = 0.433726966381073
Validation loss = 0.4405575096607208
Validation loss = 0.42551785707473755
Validation loss = 0.44226494431495667
Validation loss = 0.436188668012619
Validation loss = 0.42212799191474915
Validation loss = 0.4331684708595276
Validation loss = 0.4291297197341919
Validation loss = 0.4165061116218567
Validation loss = 0.41848626732826233
Validation loss = 0.41301342844963074
Validation loss = 0.41346412897109985
Validation loss = 0.40239596366882324
Validation loss = 0.4259405732154846
Validation loss = 0.41211727261543274
Validation loss = 0.4109772741794586
Validation loss = 0.4218714237213135
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7740128040313721
Validation loss = 0.5705754160881042
Validation loss = 0.5424174070358276
Validation loss = 0.5206382274627686
Validation loss = 0.5012312531471252
Validation loss = 0.4897734522819519
Validation loss = 0.47541552782058716
Validation loss = 0.4730052649974823
Validation loss = 0.4704032242298126
Validation loss = 0.45369279384613037
Validation loss = 0.4461976885795593
Validation loss = 0.4500655233860016
Validation loss = 0.43538522720336914
Validation loss = 0.4339406192302704
Validation loss = 0.4399305284023285
Validation loss = 0.43591490387916565
Validation loss = 0.43090856075286865
Validation loss = 0.4319447875022888
Validation loss = 0.4453781545162201
Validation loss = 0.4242154657840729
Validation loss = 0.4290279448032379
Validation loss = 0.43379315733909607
Validation loss = 0.43673020601272583
Validation loss = 0.42584410309791565
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.7558469176292419
Validation loss = 0.5577004551887512
Validation loss = 0.5191259980201721
Validation loss = 0.49980810284614563
Validation loss = 0.48942482471466064
Validation loss = 0.47826048731803894
Validation loss = 0.4669591188430786
Validation loss = 0.4629754424095154
Validation loss = 0.45217061042785645
Validation loss = 0.44016993045806885
Validation loss = 0.4390902519226074
Validation loss = 0.4408320188522339
Validation loss = 0.4286883771419525
Validation loss = 0.42828771471977234
Validation loss = 0.4272608160972595
Validation loss = 0.42074817419052124
Validation loss = 0.4169457256793976
Validation loss = 0.4154646396636963
Validation loss = 0.42744335532188416
Validation loss = 0.40605512261390686
Validation loss = 0.4098239243030548
Validation loss = 0.41653579473495483
Validation loss = 0.4143332839012146
Validation loss = 0.4209419786930084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7840054035186768
Validation loss = 0.5647113919258118
Validation loss = 0.5353121161460876
Validation loss = 0.51297527551651
Validation loss = 0.5001650452613831
Validation loss = 0.4929870069026947
Validation loss = 0.47080209851264954
Validation loss = 0.472897469997406
Validation loss = 0.4650852382183075
Validation loss = 0.4492592215538025
Validation loss = 0.451029896736145
Validation loss = 0.4393363296985626
Validation loss = 0.43881818652153015
Validation loss = 0.4283168911933899
Validation loss = 0.43201494216918945
Validation loss = 0.42233511805534363
Validation loss = 0.43414419889450073
Validation loss = 0.43872302770614624
Validation loss = 0.4281632900238037
Validation loss = 0.42664119601249695
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7090843319892883
Validation loss = 0.5549654364585876
Validation loss = 0.5287898182868958
Validation loss = 0.49834585189819336
Validation loss = 0.48126256465911865
Validation loss = 0.4875198304653168
Validation loss = 0.4704414904117584
Validation loss = 0.462921679019928
Validation loss = 0.4490485191345215
Validation loss = 0.4451346695423126
Validation loss = 0.4430865943431854
Validation loss = 0.4383324682712555
Validation loss = 0.43820029497146606
Validation loss = 0.4342368245124817
Validation loss = 0.4300856292247772
Validation loss = 0.42003342509269714
Validation loss = 0.427084356546402
Validation loss = 0.42464354634284973
Validation loss = 0.43280351161956787
Validation loss = 0.41354843974113464
Validation loss = 0.40814319252967834
Validation loss = 0.44060230255126953
Validation loss = 0.41331127285957336
Validation loss = 0.40459370613098145
Validation loss = 0.4185190200805664
Validation loss = 0.40595361590385437
Validation loss = 0.4009079039096832
Validation loss = 0.4138471484184265
Validation loss = 0.4151691198348999
Validation loss = 0.4088838994503021
Validation loss = 0.41627559065818787
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.133   |
| Iteration     | 1        |
| MaximumReturn | -0.0851  |
| MinimumReturn | -0.174   |
| TotalSamples  | 4998     |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5492249727249146
Validation loss = 0.439421683549881
Validation loss = 0.41428321599960327
Validation loss = 0.40577825903892517
Validation loss = 0.4011925458908081
Validation loss = 0.3987583518028259
Validation loss = 0.3885049521923065
Validation loss = 0.39594927430152893
Validation loss = 0.4016166925430298
Validation loss = 0.38253253698349
Validation loss = 0.38497504591941833
Validation loss = 0.3883236050605774
Validation loss = 0.38867485523223877
Validation loss = 0.38891762495040894
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5556280612945557
Validation loss = 0.4345944821834564
Validation loss = 0.4127498269081116
Validation loss = 0.4064663052558899
Validation loss = 0.4017888903617859
Validation loss = 0.40549665689468384
Validation loss = 0.4024089574813843
Validation loss = 0.4048435688018799
Validation loss = 0.39144372940063477
Validation loss = 0.3958536386489868
Validation loss = 0.3844050168991089
Validation loss = 0.39688220620155334
Validation loss = 0.39272093772888184
Validation loss = 0.3826329708099365
Validation loss = 0.3885582685470581
Validation loss = 0.3907833695411682
Validation loss = 0.3824673891067505
Validation loss = 0.39568158984184265
Validation loss = 0.38513243198394775
Validation loss = 0.3929131031036377
Validation loss = 0.3907206058502197
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5359435081481934
Validation loss = 0.430195689201355
Validation loss = 0.4069453477859497
Validation loss = 0.4002379775047302
Validation loss = 0.39482033252716064
Validation loss = 0.3888016939163208
Validation loss = 0.39055371284484863
Validation loss = 0.38532811403274536
Validation loss = 0.39525455236434937
Validation loss = 0.3792888820171356
Validation loss = 0.3774007558822632
Validation loss = 0.37420621514320374
Validation loss = 0.380154013633728
Validation loss = 0.3733838200569153
Validation loss = 0.38389062881469727
Validation loss = 0.37710070610046387
Validation loss = 0.3813233971595764
Validation loss = 0.3748975396156311
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5223551392555237
Validation loss = 0.43607860803604126
Validation loss = 0.415218323469162
Validation loss = 0.39960527420043945
Validation loss = 0.4083784818649292
Validation loss = 0.39672940969467163
Validation loss = 0.4028194844722748
Validation loss = 0.3924736976623535
Validation loss = 0.3877566456794739
Validation loss = 0.38658607006073
Validation loss = 0.3909643292427063
Validation loss = 0.3882331848144531
Validation loss = 0.3916047215461731
Validation loss = 0.38041961193084717
Validation loss = 0.38537171483039856
Validation loss = 0.38757601380348206
Validation loss = 0.3888639807701111
Validation loss = 0.39876535534858704
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5510169267654419
Validation loss = 0.42899951338768005
Validation loss = 0.41251468658447266
Validation loss = 0.4125911593437195
Validation loss = 0.4036237299442291
Validation loss = 0.402723491191864
Validation loss = 0.39923757314682007
Validation loss = 0.39656007289886475
Validation loss = 0.3871622085571289
Validation loss = 0.38918715715408325
Validation loss = 0.38462644815444946
Validation loss = 0.3777640759944916
Validation loss = 0.39081859588623047
Validation loss = 0.3843679428100586
Validation loss = 0.3877595067024231
Validation loss = 0.3935220241546631
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0725  |
| Iteration     | 2        |
| MaximumReturn | -0.0263  |
| MinimumReturn | -0.143   |
| TotalSamples  | 6664     |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4077642858028412
Validation loss = 0.3740549087524414
Validation loss = 0.3790881931781769
Validation loss = 0.37767061591148376
Validation loss = 0.3700218200683594
Validation loss = 0.37253960967063904
Validation loss = 0.37278684973716736
Validation loss = 0.3756192922592163
Validation loss = 0.37724408507347107
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42323634028434753
Validation loss = 0.3752799928188324
Validation loss = 0.3771023750305176
Validation loss = 0.37506282329559326
Validation loss = 0.37036386132240295
Validation loss = 0.37499499320983887
Validation loss = 0.3810993731021881
Validation loss = 0.37642064690589905
Validation loss = 0.3777511417865753
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.41842079162597656
Validation loss = 0.3711252510547638
Validation loss = 0.37238916754722595
Validation loss = 0.36315295100212097
Validation loss = 0.3746981620788574
Validation loss = 0.36864790320396423
Validation loss = 0.3694431483745575
Validation loss = 0.3762165307998657
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41639140248298645
Validation loss = 0.37686339020729065
Validation loss = 0.3724822998046875
Validation loss = 0.36682578921318054
Validation loss = 0.3680360019207001
Validation loss = 0.3649607002735138
Validation loss = 0.37861478328704834
Validation loss = 0.37255391478538513
Validation loss = 0.37536177039146423
Validation loss = 0.3782453238964081
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.423220157623291
Validation loss = 0.377449631690979
Validation loss = 0.37122106552124023
Validation loss = 0.369146466255188
Validation loss = 0.37995728850364685
Validation loss = 0.36338499188423157
Validation loss = 0.3772125542163849
Validation loss = 0.3701399862766266
Validation loss = 0.3845408260822296
Validation loss = 0.3753852844238281
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0613  |
| Iteration     | 3        |
| MaximumReturn | -0.0259  |
| MinimumReturn | -0.157   |
| TotalSamples  | 8330     |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4116147458553314
Validation loss = 0.3759283721446991
Validation loss = 0.36979418992996216
Validation loss = 0.3745751976966858
Validation loss = 0.3712281882762909
Validation loss = 0.37388044595718384
Validation loss = 0.3732462525367737
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41357046365737915
Validation loss = 0.3689572811126709
Validation loss = 0.37010377645492554
Validation loss = 0.37811750173568726
Validation loss = 0.3754611909389496
Validation loss = 0.37423351407051086
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.38753265142440796
Validation loss = 0.3644111454486847
Validation loss = 0.36497849225997925
Validation loss = 0.3680620491504669
Validation loss = 0.3696317672729492
Validation loss = 0.38197949528694153
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3939790725708008
Validation loss = 0.37003740668296814
Validation loss = 0.3694157898426056
Validation loss = 0.370540052652359
Validation loss = 0.37366726994514465
Validation loss = 0.3698042333126068
Validation loss = 0.3693388104438782
Validation loss = 0.38856950402259827
Validation loss = 0.37910062074661255
Validation loss = 0.3768198788166046
Validation loss = 0.37904390692710876
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4068107008934021
Validation loss = 0.3704959750175476
Validation loss = 0.38586023449897766
Validation loss = 0.3784932494163513
Validation loss = 0.3801044523715973
Validation loss = 0.37793242931365967
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -81.5    |
| Iteration     | 4        |
| MaximumReturn | -51.7    |
| MinimumReturn | -98.3    |
| TotalSamples  | 9996     |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39634042978286743
Validation loss = 0.3694043457508087
Validation loss = 0.3701452612876892
Validation loss = 0.3660561740398407
Validation loss = 0.37229856848716736
Validation loss = 0.3724599778652191
Validation loss = 0.3689156174659729
Validation loss = 0.37029534578323364
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3878466486930847
Validation loss = 0.3690735101699829
Validation loss = 0.3690651059150696
Validation loss = 0.37235507369041443
Validation loss = 0.36868974566459656
Validation loss = 0.3685501217842102
Validation loss = 0.37639468908309937
Validation loss = 0.37052324414253235
Validation loss = 0.3724725544452667
Validation loss = 0.3684559166431427
Validation loss = 0.36934158205986023
Validation loss = 0.364905446767807
Validation loss = 0.3861691355705261
Validation loss = 0.3678879141807556
Validation loss = 0.3757173717021942
Validation loss = 0.37622708082199097
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.39871513843536377
Validation loss = 0.36842772364616394
Validation loss = 0.3615056872367859
Validation loss = 0.363496869802475
Validation loss = 0.36617812514305115
Validation loss = 0.35957100987434387
Validation loss = 0.36382168531417847
Validation loss = 0.37363752722740173
Validation loss = 0.36650654673576355
Validation loss = 0.373791366815567
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4030345380306244
Validation loss = 0.36440861225128174
Validation loss = 0.3658204674720764
Validation loss = 0.36576053500175476
Validation loss = 0.38348206877708435
Validation loss = 0.36381831765174866
Validation loss = 0.3632335364818573
Validation loss = 0.3720158040523529
Validation loss = 0.3671191334724426
Validation loss = 0.37496453523635864
Validation loss = 0.3761897683143616
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3920486867427826
Validation loss = 0.3658735752105713
Validation loss = 0.37630313634872437
Validation loss = 0.3633803725242615
Validation loss = 0.3651650846004486
Validation loss = 0.3727771043777466
Validation loss = 0.37172597646713257
Validation loss = 0.37549450993537903
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.18    |
| Iteration     | 5        |
| MaximumReturn | -0.0673  |
| MinimumReturn | -3.51    |
| TotalSamples  | 11662    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39852750301361084
Validation loss = 0.3795657157897949
Validation loss = 0.378339946269989
Validation loss = 0.39542874693870544
Validation loss = 0.38789182901382446
Validation loss = 0.38810795545578003
Validation loss = 0.3885464668273926
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4065167307853699
Validation loss = 0.38591235876083374
Validation loss = 0.38832420110702515
Validation loss = 0.38101905584335327
Validation loss = 0.383553683757782
Validation loss = 0.3866717517375946
Validation loss = 0.38716569542884827
Validation loss = 0.39298397302627563
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4072578549385071
Validation loss = 0.3870251178741455
Validation loss = 0.3815214931964874
Validation loss = 0.3808799088001251
Validation loss = 0.3789394497871399
Validation loss = 0.3830661177635193
Validation loss = 0.3878891170024872
Validation loss = 0.38559669256210327
Validation loss = 0.38163378834724426
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4054677486419678
Validation loss = 0.39602553844451904
Validation loss = 0.3852304518222809
Validation loss = 0.38580435514450073
Validation loss = 0.3843722939491272
Validation loss = 0.38319945335388184
Validation loss = 0.3878023028373718
Validation loss = 0.3959799110889435
Validation loss = 0.38317906856536865
Validation loss = 0.3909715712070465
Validation loss = 0.3826117515563965
Validation loss = 0.3912200629711151
Validation loss = 0.38897785544395447
Validation loss = 0.38705161213874817
Validation loss = 0.39795368909835815
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.40197110176086426
Validation loss = 0.3855932354927063
Validation loss = 0.3853789269924164
Validation loss = 0.390636146068573
Validation loss = 0.3794742524623871
Validation loss = 0.3906490206718445
Validation loss = 0.38776737451553345
Validation loss = 0.3950638473033905
Validation loss = 0.3866696357727051
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0992  |
| Iteration     | 6        |
| MaximumReturn | -0.033   |
| MinimumReturn | -0.309   |
| TotalSamples  | 13328    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.39999672770500183
Validation loss = 0.3906829357147217
Validation loss = 0.3918958604335785
Validation loss = 0.39711907505989075
Validation loss = 0.3959314823150635
Validation loss = 0.40258970856666565
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3890272378921509
Validation loss = 0.3994860351085663
Validation loss = 0.3918668329715729
Validation loss = 0.3985194265842438
Validation loss = 0.3941051959991455
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.401793509721756
Validation loss = 0.38326048851013184
Validation loss = 0.3921600580215454
Validation loss = 0.394273042678833
Validation loss = 0.3890227973461151
Validation loss = 0.39382994174957275
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4089520275592804
Validation loss = 0.39443469047546387
Validation loss = 0.3953722417354584
Validation loss = 0.4000534117221832
Validation loss = 0.40736857056617737
Validation loss = 0.4034431278705597
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.388624906539917
Validation loss = 0.3872215449810028
Validation loss = 0.39345598220825195
Validation loss = 0.39212527871131897
Validation loss = 0.385943740606308
Validation loss = 0.39490342140197754
Validation loss = 0.3940539062023163
Validation loss = 0.38974395394325256
Validation loss = 0.39855802059173584
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0551  |
| Iteration     | 7        |
| MaximumReturn | -0.0244  |
| MinimumReturn | -0.0938  |
| TotalSamples  | 14994    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40205883979797363
Validation loss = 0.38653579354286194
Validation loss = 0.3904344439506531
Validation loss = 0.3864046037197113
Validation loss = 0.39236465096473694
Validation loss = 0.3960949778556824
Validation loss = 0.38703107833862305
Validation loss = 0.3926410973072052
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.394172340631485
Validation loss = 0.39157432317733765
Validation loss = 0.3967417776584625
Validation loss = 0.3915039002895355
Validation loss = 0.39413827657699585
Validation loss = 0.3985285758972168
Validation loss = 0.4000125527381897
Validation loss = 0.3941633701324463
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.39464572072029114
Validation loss = 0.3894502520561218
Validation loss = 0.390102356672287
Validation loss = 0.3874070346355438
Validation loss = 0.4027877449989319
Validation loss = 0.39466550946235657
Validation loss = 0.39902251958847046
Validation loss = 0.3884819447994232
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4054970443248749
Validation loss = 0.4006843566894531
Validation loss = 0.40981245040893555
Validation loss = 0.39728015661239624
Validation loss = 0.40563949942588806
Validation loss = 0.3991706371307373
Validation loss = 0.4029383063316345
Validation loss = 0.4026542007923126
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.39783188700675964
Validation loss = 0.39576026797294617
Validation loss = 0.3913175165653229
Validation loss = 0.39513149857521057
Validation loss = 0.39827805757522583
Validation loss = 0.3904660642147064
Validation loss = 0.4044916331768036
Validation loss = 0.39416366815567017
Validation loss = 0.40004682540893555
Validation loss = 0.4023128151893616
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0657  |
| Iteration     | 8        |
| MaximumReturn | -0.0276  |
| MinimumReturn | -0.153   |
| TotalSamples  | 16660    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3999190330505371
Validation loss = 0.41140255331993103
Validation loss = 0.39558058977127075
Validation loss = 0.4070374071598053
Validation loss = 0.39696335792541504
Validation loss = 0.4087379276752472
Validation loss = 0.40392178297042847
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3980971872806549
Validation loss = 0.39597439765930176
Validation loss = 0.40610471367836
Validation loss = 0.39976704120635986
Validation loss = 0.40100330114364624
Validation loss = 0.41142189502716064
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.39590930938720703
Validation loss = 0.39322543144226074
Validation loss = 0.3951806426048279
Validation loss = 0.40082305669784546
Validation loss = 0.4102238118648529
Validation loss = 0.39713209867477417
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4059218764305115
Validation loss = 0.4073994755744934
Validation loss = 0.4036811292171478
Validation loss = 0.407164990901947
Validation loss = 0.4078654646873474
Validation loss = 0.41195914149284363
Validation loss = 0.4152297377586365
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.40340059995651245
Validation loss = 0.394875168800354
Validation loss = 0.41371893882751465
Validation loss = 0.39790770411491394
Validation loss = 0.4077610671520233
Validation loss = 0.4100807309150696
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.055   |
| Iteration     | 9        |
| MaximumReturn | -0.0191  |
| MinimumReturn | -0.144   |
| TotalSamples  | 18326    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41468948125839233
Validation loss = 0.4029920697212219
Validation loss = 0.39941248297691345
Validation loss = 0.4051371216773987
Validation loss = 0.4042767286300659
Validation loss = 0.40488192439079285
Validation loss = 0.4122515916824341
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.40202784538269043
Validation loss = 0.41336676478385925
Validation loss = 0.39833974838256836
Validation loss = 0.4093116223812103
Validation loss = 0.4051162302494049
Validation loss = 0.40415340662002563
Validation loss = 0.4035731554031372
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4015435576438904
Validation loss = 0.39429810643196106
Validation loss = 0.4050903618335724
Validation loss = 0.39740729331970215
Validation loss = 0.4003504514694214
Validation loss = 0.403097927570343
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.421247661113739
Validation loss = 0.409769743680954
Validation loss = 0.4112357795238495
Validation loss = 0.4116007685661316
Validation loss = 0.4118613004684448
Validation loss = 0.41565534472465515
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4101189970970154
Validation loss = 0.4039022922515869
Validation loss = 0.41135919094085693
Validation loss = 0.41073906421661377
Validation loss = 0.40202397108078003
Validation loss = 0.42289406061172485
Validation loss = 0.41311386227607727
Validation loss = 0.4161982834339142
Validation loss = 0.42212429642677307
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0441  |
| Iteration     | 10       |
| MaximumReturn | -0.0121  |
| MinimumReturn | -0.0939  |
| TotalSamples  | 19992    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.40504375100135803
Validation loss = 0.4082035422325134
Validation loss = 0.4089740216732025
Validation loss = 0.42001086473464966
Validation loss = 0.4102942943572998
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41192132234573364
Validation loss = 0.4009253978729248
Validation loss = 0.4045336842536926
Validation loss = 0.4043550491333008
Validation loss = 0.41262832283973694
Validation loss = 0.4072248041629791
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4049895703792572
Validation loss = 0.401202917098999
Validation loss = 0.409370481967926
Validation loss = 0.40986332297325134
Validation loss = 0.40930861234664917
Validation loss = 0.4039091467857361
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.41694918274879456
Validation loss = 0.4041409492492676
Validation loss = 0.41218656301498413
Validation loss = 0.4095744490623474
Validation loss = 0.41925865411758423
Validation loss = 0.42339426279067993
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.42540884017944336
Validation loss = 0.4080135226249695
Validation loss = 0.41106754541397095
Validation loss = 0.4098671078681946
Validation loss = 0.4123212397098541
Validation loss = 0.42217183113098145
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
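
[Annotation] The "Path k | total_timesteps t." lines track the real-environment evaluation rollouts: 25 paths of horizon 100, i.e. 2500 new timesteps per outer iteration. A minimal sketch of such a collector, assuming a classic gym-style reset()/step() environment and a policy(obs) callable; both interfaces are assumptions for illustration.

    def collect_real_rollouts(env, policy, num_paths=25, horizon=100):
        """Roll the current policy in the real environment and record paths."""
        paths, total = [], 0
        for k in range(num_paths):
            print(f"Path {k} | total_timesteps {total}.")
            obs = env.reset()
            traj = {"obs": [], "act": [], "rew": []}
            for _ in range(horizon):
                act = policy(obs)
                next_obs, rew, done, _ = env.step(act)
                traj["obs"].append(obs)
                traj["act"].append(act)
                traj["rew"].append(rew)
                obs = next_obs
                total += 1
                if done:
                    break
            paths.append(traj)
        return paths
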
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0704  |
| Iteration     | 11       |
| MaximumReturn | -0.022   |
| MinimumReturn | -0.273   |
| TotalSamples  | 21658    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4184994101524353
Validation loss = 0.4111486077308655
Validation loss = 0.42219066619873047
Validation loss = 0.4178764224052429
Validation loss = 0.42415651679039
Validation loss = 0.4300495982170105
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.413919061422348
Validation loss = 0.41007357835769653
Validation loss = 0.41408032178878784
Validation loss = 0.4132663309574127
Validation loss = 0.41360044479370117
Validation loss = 0.40927252173423767
Validation loss = 0.41440024971961975
Validation loss = 0.4176572263240814
Validation loss = 0.4133718013763428
Validation loss = 0.41538938879966736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4111197590827942
Validation loss = 0.4094427525997162
Validation loss = 0.40683168172836304
Validation loss = 0.40915417671203613
Validation loss = 0.4066830575466156
Validation loss = 0.4113848805427551
Validation loss = 0.4118013381958008
Validation loss = 0.41657519340515137
Validation loss = 0.4191938042640686
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4296993315219879
Validation loss = 0.4169059693813324
Validation loss = 0.41987180709838867
Validation loss = 0.42150312662124634
Validation loss = 0.42145195603370667
Validation loss = 0.42560237646102905
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4143422544002533
Validation loss = 0.414217472076416
Validation loss = 0.4160442352294922
Validation loss = 0.4218687117099762
Validation loss = 0.42401066422462463
Validation loss = 0.41596731543540955
Done fitting dynamics.
Updating randomness.
Done updating randomness.
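
[Annotation] The "Updating randomness" step is not explained by the log itself. Given the particle-ensemble configuration (5 particles over 5 dynamics models), one plausible reading is that it resamples which ensemble member each imagined-rollout particle follows in the next inner loop. The snippet below is purely an illustration of that guess, not the project's actual behaviour; it may instead re-seed noise used elsewhere.

    import numpy as np

    def resample_particle_assignments(num_particles=5, num_models=5, seed=None):
        """Randomly reassign each rollout particle to one dynamics-model member."""
        rng = np.random.default_rng(seed)
        return rng.integers(low=0, high=num_models, size=num_particles)

    # resample_particle_assignments(seed=2431)  -> e.g. array([3, 0, 4, 1, 1])
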
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.203   |
| Iteration     | 12       |
| MaximumReturn | -0.0283  |
| MinimumReturn | -2.55    |
| TotalSamples  | 23324    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.41060304641723633
Validation loss = 0.41685017943382263
Validation loss = 0.41562244296073914
Validation loss = 0.4196591079235077
Validation loss = 0.41969412565231323
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.41271474957466125
Validation loss = 0.41919127106666565
Validation loss = 0.41261836886405945
Validation loss = 0.41999396681785583
Validation loss = 0.4164777100086212
Validation loss = 0.42998963594436646
Validation loss = 0.42444342374801636
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4169180691242218
Validation loss = 0.40858134627342224
Validation loss = 0.41932395100593567
Validation loss = 0.4246358871459961
Validation loss = 0.4190848469734192
Validation loss = 0.4245145618915558
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4176185429096222
Validation loss = 0.42009779810905457
Validation loss = 0.4214133322238922
Validation loss = 0.42735087871551514
Validation loss = 0.4275974929332733
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4201854467391968
Validation loss = 0.41975975036621094
Validation loss = 0.42276880145072937
Validation loss = 0.42048466205596924
Validation loss = 0.427385151386261
Validation loss = 0.439208060503006
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0364  |
| Iteration     | 13       |
| MaximumReturn | -0.0133  |
| MinimumReturn | -0.0706  |
| TotalSamples  | 24990    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4259641170501709
Validation loss = 0.41762062907218933
Validation loss = 0.421281099319458
Validation loss = 0.4243110716342926
Validation loss = 0.4364447295665741
Validation loss = 0.4207099974155426
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.42488786578178406
Validation loss = 0.42852723598480225
Validation loss = 0.42414966225624084
Validation loss = 0.4240664541721344
Validation loss = 0.42862775921821594
Validation loss = 0.4318794906139374
Validation loss = 0.4395289719104767
Validation loss = 0.432992547750473
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.43033525347709656
Validation loss = 0.41594257950782776
Validation loss = 0.425480455160141
Validation loss = 0.42214229702949524
Validation loss = 0.42654359340667725
Validation loss = 0.427266925573349
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42630162835121155
Validation loss = 0.4243882894515991
Validation loss = 0.4250807762145996
Validation loss = 0.4342541992664337
Validation loss = 0.441664457321167
Validation loss = 0.4318004846572876
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.43204769492149353
Validation loss = 0.42348363995552063
Validation loss = 0.4267253875732422
Validation loss = 0.4379695653915405
Validation loss = 0.4325810372829437
Validation loss = 0.4319041073322296
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0493  |
| Iteration     | 14       |
| MaximumReturn | -0.0128  |
| MinimumReturn | -0.141   |
| TotalSamples  | 26656    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42353391647338867
Validation loss = 0.4212050437927246
Validation loss = 0.4235948920249939
Validation loss = 0.4200280010700226
Validation loss = 0.4279281795024872
Validation loss = 0.42424848675727844
Validation loss = 0.43684619665145874
Validation loss = 0.42908427119255066
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4347043037414551
Validation loss = 0.4255576431751251
Validation loss = 0.4223215878009796
Validation loss = 0.43212276697158813
Validation loss = 0.42724549770355225
Validation loss = 0.4361101984977722
Validation loss = 0.4343935251235962
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.42496564984321594
Validation loss = 0.42811644077301025
Validation loss = 0.42118698358535767
Validation loss = 0.41894835233688354
Validation loss = 0.4213644564151764
Validation loss = 0.4286760687828064
Validation loss = 0.42707589268684387
Validation loss = 0.429303914308548
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42495205998420715
Validation loss = 0.4245830774307251
Validation loss = 0.4289402961730957
Validation loss = 0.4261125326156616
Validation loss = 0.42943239212036133
Validation loss = 0.4368724226951599
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4280250370502472
Validation loss = 0.42883938550949097
Validation loss = 0.431123286485672
Validation loss = 0.4307926893234253
Validation loss = 0.4305077791213989
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -6.45    |
| Iteration     | 15       |
| MaximumReturn | -0.0353  |
| MinimumReturn | -52.5    |
| TotalSamples  | 28322    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4321292042732239
Validation loss = 0.44463130831718445
Validation loss = 0.4302826523780823
Validation loss = 0.4382917881011963
Validation loss = 0.4331877529621124
Validation loss = 0.4412647783756256
Validation loss = 0.4424827992916107
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4471360146999359
Validation loss = 0.43819522857666016
Validation loss = 0.43603450059890747
Validation loss = 0.43401357531547546
Validation loss = 0.4397963881492615
Validation loss = 0.4453010857105255
Validation loss = 0.44233939051628113
Validation loss = 0.44309279322624207
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4371398091316223
Validation loss = 0.4281825125217438
Validation loss = 0.4307878017425537
Validation loss = 0.43218994140625
Validation loss = 0.4358833432197571
Validation loss = 0.44311970472335815
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.43380841612815857
Validation loss = 0.4374983310699463
Validation loss = 0.4339168071746826
Validation loss = 0.4380432665348053
Validation loss = 0.443167507648468
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.44181230664253235
Validation loss = 0.433905690908432
Validation loss = 0.4347761273384094
Validation loss = 0.4374995529651642
Validation loss = 0.4369531571865082
Validation loss = 0.44359731674194336
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.05    |
| Iteration     | 16       |
| MaximumReturn | -0.0273  |
| MinimumReturn | -24.7    |
| TotalSamples  | 29988    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4329766035079956
Validation loss = 0.4448021352291107
Validation loss = 0.44276168942451477
Validation loss = 0.4370441436767578
Validation loss = 0.44636788964271545
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4384266436100006
Validation loss = 0.44533678889274597
Validation loss = 0.4495837986469269
Validation loss = 0.45200201869010925
Validation loss = 0.44830521941185
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4349580705165863
Validation loss = 0.4369274079799652
Validation loss = 0.43865156173706055
Validation loss = 0.4368014633655548
Validation loss = 0.43844738602638245
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.43654966354370117
Validation loss = 0.4347148835659027
Validation loss = 0.4356629550457001
Validation loss = 0.4408889412879944
Validation loss = 0.4374924302101135
Validation loss = 0.4413236081600189
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4359612762928009
Validation loss = 0.43877026438713074
Validation loss = 0.43947240710258484
Validation loss = 0.4466840326786041
Validation loss = 0.45518559217453003
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.8    |
| Iteration     | 17       |
| MaximumReturn | -0.0277  |
| MinimumReturn | -52.7    |
| TotalSamples  | 31654    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4425869882106781
Validation loss = 0.4424833655357361
Validation loss = 0.45193928480148315
Validation loss = 0.44306737184524536
Validation loss = 0.4477843642234802
Validation loss = 0.4491482079029083
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.44449329376220703
Validation loss = 0.4468039870262146
Validation loss = 0.450126975774765
Validation loss = 0.4509096145629883
Validation loss = 0.4591550827026367
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.44353288412094116
Validation loss = 0.43253645300865173
Validation loss = 0.43986883759498596
Validation loss = 0.4405056834220886
Validation loss = 0.4482311010360718
Validation loss = 0.4472993314266205
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4369240403175354
Validation loss = 0.4466840624809265
Validation loss = 0.4425690472126007
Validation loss = 0.44213423132896423
Validation loss = 0.44475919008255005
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.43700721859931946
Validation loss = 0.4381975531578064
Validation loss = 0.4438548982143402
Validation loss = 0.446501225233078
Validation loss = 0.443765252828598
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.0795  |
| Iteration     | 18       |
| MaximumReturn | -0.0373  |
| MinimumReturn | -0.143   |
| TotalSamples  | 33320    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.44847145676612854
Validation loss = 0.45138394832611084
Validation loss = 0.45279747247695923
Validation loss = 0.4490819573402405
Validation loss = 0.4566359221935272
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.44840699434280396
Validation loss = 0.4545660614967346
Validation loss = 0.4526926875114441
Validation loss = 0.4561804234981537
Validation loss = 0.45485782623291016
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.444458544254303
Validation loss = 0.4451885521411896
Validation loss = 0.44645053148269653
Validation loss = 0.4459701180458069
Validation loss = 0.45792943239212036
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.45376843214035034
Validation loss = 0.4421811103820801
Validation loss = 0.44296789169311523
Validation loss = 0.44572770595550537
Validation loss = 0.45882463455200195
Validation loss = 0.4509161114692688
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4491844177246094
Validation loss = 0.4477214813232422
Validation loss = 0.44283002614974976
Validation loss = 0.45199617743492126
Validation loss = 0.4559492766857147
Validation loss = 0.4555853307247162
Validation loss = 0.46402186155319214
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -7.51    |
| Iteration     | 19       |
| MaximumReturn | -0.0241  |
| MinimumReturn | -44.6    |
| TotalSamples  | 34986    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4555668532848358
Validation loss = 0.4528256058692932
Validation loss = 0.4588526785373688
Validation loss = 0.4524066746234894
Validation loss = 0.45927244424819946
Validation loss = 0.4642946422100067
Validation loss = 0.4627467691898346
Validation loss = 0.474833607673645
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4551382064819336
Validation loss = 0.45820358395576477
Validation loss = 0.4566095471382141
Validation loss = 0.45997655391693115
Validation loss = 0.4587888717651367
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4442122280597687
Validation loss = 0.4436846077442169
Validation loss = 0.44776052236557007
Validation loss = 0.45325809717178345
Validation loss = 0.4519432485103607
Validation loss = 0.4547322988510132
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.45364150404930115
Validation loss = 0.44852712750434875
Validation loss = 0.4482753276824951
Validation loss = 0.4549311697483063
Validation loss = 0.4607637822628021
Validation loss = 0.4655759334564209
Validation loss = 0.45695507526397705
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4499189853668213
Validation loss = 0.45199915766716003
Validation loss = 0.4550784230232239
Validation loss = 0.4613501727581024
Validation loss = 0.4564054310321808
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -65.3    |
| Iteration     | 20       |
| MaximumReturn | -23      |
| MinimumReturn | -105     |
| TotalSamples  | 36652    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.45667901635169983
Validation loss = 0.46780693531036377
Validation loss = 0.47075048089027405
Validation loss = 0.46398448944091797
Validation loss = 0.4697662591934204
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4424971342086792
Validation loss = 0.45814386010169983
Validation loss = 0.4636520743370056
Validation loss = 0.4676855504512787
Validation loss = 0.46191009879112244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4460185468196869
Validation loss = 0.44854456186294556
Validation loss = 0.45324671268463135
Validation loss = 0.45698150992393494
Validation loss = 0.4600604474544525
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4428598880767822
Validation loss = 0.45147669315338135
Validation loss = 0.45953479409217834
Validation loss = 0.4590199887752533
Validation loss = 0.45997539162635803
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.45006290078163147
Validation loss = 0.4615863561630249
Validation loss = 0.4595984220504761
Validation loss = 0.46086835861206055
Validation loss = 0.4636918306350708
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.6    |
| Iteration     | 21       |
| MaximumReturn | -0.0294  |
| MinimumReturn | -113     |
| TotalSamples  | 38318    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.47239619493484497
Validation loss = 0.46216049790382385
Validation loss = 0.4657509922981262
Validation loss = 0.4766950011253357
Validation loss = 0.47136372327804565
Validation loss = 0.47857341170310974
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.45186468958854675
Validation loss = 0.4548851251602173
Validation loss = 0.4626579284667969
Validation loss = 0.45936620235443115
Validation loss = 0.4748315215110779
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4443643391132355
Validation loss = 0.4571765959262848
Validation loss = 0.4506756663322449
Validation loss = 0.4677208364009857
Validation loss = 0.4617127776145935
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4636255204677582
Validation loss = 0.46197810769081116
Validation loss = 0.45846423506736755
Validation loss = 0.4571053981781006
Validation loss = 0.45902326703071594
Validation loss = 0.4659537374973297
Validation loss = 0.46782565116882324
Validation loss = 0.4733273386955261
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4568507671356201
Validation loss = 0.45962217450141907
Validation loss = 0.456134557723999
Validation loss = 0.46241870522499084
Validation loss = 0.4645273685455322
Validation loss = 0.46603670716285706
Validation loss = 0.46977901458740234
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -40      |
| Iteration     | 22       |
| MaximumReturn | -0.0493  |
| MinimumReturn | -104     |
| TotalSamples  | 39984    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.46595683693885803
Validation loss = 0.47436922788619995
Validation loss = 0.47339096665382385
Validation loss = 0.4711977541446686
Validation loss = 0.47624748945236206
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4567328095436096
Validation loss = 0.4527309536933899
Validation loss = 0.46343570947647095
Validation loss = 0.46157726645469666
Validation loss = 0.46020370721817017
Validation loss = 0.47183212637901306
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4599359929561615
Validation loss = 0.4512191414833069
Validation loss = 0.4589354991912842
Validation loss = 0.4622820317745209
Validation loss = 0.46136313676834106
Validation loss = 0.4617766737937927
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46465834975242615
Validation loss = 0.47172898054122925
Validation loss = 0.4652880132198334
Validation loss = 0.4694003462791443
Validation loss = 0.4710913598537445
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.46094614267349243
Validation loss = 0.4704872667789459
Validation loss = 0.46045055985450745
Validation loss = 0.46543702483177185
Validation loss = 0.4728890061378479
Validation loss = 0.47056084871292114
Validation loss = 0.47000232338905334
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.38    |
| Iteration     | 23       |
| MaximumReturn | -0.0185  |
| MinimumReturn | -25.2    |
| TotalSamples  | 41650    |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.47597748041152954
Validation loss = 0.47728675603866577
Validation loss = 0.48902249336242676
Validation loss = 0.48481082916259766
Validation loss = 0.4830528795719147
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46956077218055725
Validation loss = 0.46798402070999146
Validation loss = 0.47343024611473083
Validation loss = 0.47739219665527344
Validation loss = 0.4716361165046692
Validation loss = 0.47868332266807556
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4654722213745117
Validation loss = 0.46562933921813965
Validation loss = 0.46742233633995056
Validation loss = 0.4661657214164734
Validation loss = 0.47227057814598083
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.46948638558387756
Validation loss = 0.4707246422767639
Validation loss = 0.47608980536460876
Validation loss = 0.47166481614112854
Validation loss = 0.47794684767723083
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4670049548149109
Validation loss = 0.4732579290866852
Validation loss = 0.47758740186691284
Validation loss = 0.48052582144737244
Validation loss = 0.47957196831703186
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -11.5    |
| Iteration     | 24       |
| MaximumReturn | -0.0304  |
| MinimumReturn | -58.3    |
| TotalSamples  | 43316    |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4682994484901428
Validation loss = 0.47032061219215393
Validation loss = 0.47665953636169434
Validation loss = 0.4761229455471039
Validation loss = 0.4834960401058197
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4659615457057953
Validation loss = 0.46893537044525146
Validation loss = 0.4718516767024994
Validation loss = 0.47326990962028503
Validation loss = 0.4694252908229828
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4612730145454407
Validation loss = 0.4689985513687134
Validation loss = 0.47028592228889465
Validation loss = 0.4674583971500397
Validation loss = 0.4845994710922241
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.47188350558280945
Validation loss = 0.4670603573322296
Validation loss = 0.4777821898460388
Validation loss = 0.47123581171035767
Validation loss = 0.47346261143684387
Validation loss = 0.4789947271347046
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4749438464641571
Validation loss = 0.4676850736141205
Validation loss = 0.4685072600841522
Validation loss = 0.47653573751449585
Validation loss = 0.481088787317276
Validation loss = 0.48344144225120544
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.25    |
| Iteration     | 25       |
| MaximumReturn | -0.0312  |
| MinimumReturn | -47.8    |
| TotalSamples  | 44982    |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.47889187932014465
Validation loss = 0.48474571108818054
Validation loss = 0.49668776988983154
Validation loss = 0.4880061447620392
Validation loss = 0.48460522294044495
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.46924278140068054
Validation loss = 0.4656582176685333
Validation loss = 0.4756315052509308
Validation loss = 0.4739285409450531
Validation loss = 0.4854639172554016
Validation loss = 0.4876112937927246
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46601104736328125
Validation loss = 0.4649653136730194
Validation loss = 0.4670100212097168
Validation loss = 0.47232216596603394
Validation loss = 0.4782074987888336
Validation loss = 0.47947320342063904
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4736633002758026
Validation loss = 0.47887247800827026
Validation loss = 0.4785737097263336
Validation loss = 0.48457834124565125
Validation loss = 0.4927101135253906
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4766843616962433
Validation loss = 0.47071152925491333
Validation loss = 0.4809519648551941
Validation loss = 0.48122450709342957
Validation loss = 0.4896713197231293
Validation loss = 0.4801943004131317
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.226   |
| Iteration     | 26       |
| MaximumReturn | -0.106   |
| MinimumReturn | -0.494   |
| TotalSamples  | 46648    |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4789397120475769
Validation loss = 0.48377376794815063
Validation loss = 0.4836742579936981
Validation loss = 0.5018706917762756
Validation loss = 0.49527648091316223
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4765171706676483
Validation loss = 0.473588228225708
Validation loss = 0.4809664487838745
Validation loss = 0.48749372363090515
Validation loss = 0.485039621591568
Validation loss = 0.4812227785587311
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.46518751978874207
Validation loss = 0.4733333885669708
Validation loss = 0.4729359745979309
Validation loss = 0.4819349944591522
Validation loss = 0.49078652262687683
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4793556332588196
Validation loss = 0.4780471920967102
Validation loss = 0.47757989168167114
Validation loss = 0.4779636561870575
Validation loss = 0.48172736167907715
Validation loss = 0.48939642310142517
Validation loss = 0.4881778061389923
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4812508821487427
Validation loss = 0.47816798090934753
Validation loss = 0.4806439280509949
Validation loss = 0.48903635144233704
Validation loss = 0.4933343231678009
Validation loss = 0.4879423677921295
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.211   |
| Iteration     | 27       |
| MaximumReturn | -0.0811  |
| MinimumReturn | -0.421   |
| TotalSamples  | 48314    |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49200204014778137
Validation loss = 0.49253353476524353
Validation loss = 0.49040961265563965
Validation loss = 0.5082400441169739
Validation loss = 0.5089669227600098
Validation loss = 0.49751922488212585
Validation loss = 0.5052412152290344
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4812045097351074
Validation loss = 0.48931002616882324
Validation loss = 0.4878096878528595
Validation loss = 0.49667754769325256
Validation loss = 0.5005479454994202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.47221311926841736
Validation loss = 0.47786617279052734
Validation loss = 0.48164209723472595
Validation loss = 0.48005637526512146
Validation loss = 0.4850940704345703
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49161458015441895
Validation loss = 0.48993119597435
Validation loss = 0.49419164657592773
Validation loss = 0.4927093982696533
Validation loss = 0.4992993175983429
Validation loss = 0.4969543218612671
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49324166774749756
Validation loss = 0.49223241209983826
Validation loss = 0.48922523856163025
Validation loss = 0.4948174059391022
Validation loss = 0.48979616165161133
Validation loss = 0.4948979616165161
Validation loss = 0.4992412030696869
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.21    |
| Iteration     | 28       |
| MaximumReturn | -0.0924  |
| MinimumReturn | -47.5    |
| TotalSamples  | 49980    |
----------------------------
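
Right after each batch of new rollouts the log prints "Updating normalization." A reasonable interpretation, again an assumption rather than something the log states, is that running mean/std statistics for the dynamics-model inputs are refreshed with the freshly collected data. The RunningNormalizer class below is a hypothetical stand-in (Welford's online update), not the project's implementation:

import numpy as np

class RunningNormalizer:
    # Hypothetical running mean/std tracker for dynamics-model inputs.
    def __init__(self, dim):
        self.count = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)                 # sum of squared deviations (Welford)

    def update(self, batch):
        for x in batch:                         # one observation (or obs-action pair) at a time
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return np.sqrt(self.m2 / max(self.count - 1, 1)) + 1e-8

    def normalize(self, x):
        return (x - self.mean) / self.std

norm = RunningNormalizer(dim=4)
norm.update(np.random.randn(2500, 4))           # e.g. 25 paths x 100 steps of 4-d observations
print(norm.normalize(np.random.randn(4)))
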
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4956878423690796
Validation loss = 0.5008763074874878
Validation loss = 0.5016113519668579
Validation loss = 0.5014446377754211
Validation loss = 0.5032358765602112
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4887920320034027
Validation loss = 0.4825173318386078
Validation loss = 0.487082839012146
Validation loss = 0.49222832918167114
Validation loss = 0.4950959086418152
Validation loss = 0.505885899066925
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4794694185256958
Validation loss = 0.48082926869392395
Validation loss = 0.4863297939300537
Validation loss = 0.4888361990451813
Validation loss = 0.49300992488861084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4885840117931366
Validation loss = 0.4969896972179413
Validation loss = 0.49449992179870605
Validation loss = 0.49669378995895386
Validation loss = 0.5008624792098999
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.48990336060523987
Validation loss = 0.49606963992118835
Validation loss = 0.4955340325832367
Validation loss = 0.4992525577545166
Validation loss = 0.5088295936584473
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
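
Before every inner TRPO run the log prints "Re-initialize init_std.", which suggests the policy's exploration log-std is reset to its starting value at each outer iteration instead of being carried over from the previous one. The GaussianPolicy class and its reinit_std method below are hypothetical names used purely to illustrate that reset:

import numpy as np

class GaussianPolicy:
    # Hypothetical diagonal-Gaussian policy head; only the log-std handling is sketched.
    def __init__(self, act_dim, init_logstd=0.0):
        self.init_logstd = init_logstd
        self.log_std = np.full(act_dim, init_logstd)

    def reinit_std(self):
        # Reset exploration noise to its initial level before a new TRPO phase.
        self.log_std[:] = self.init_logstd

    def sample(self, mean):
        return mean + np.exp(self.log_std) * np.random.randn(*mean.shape)

policy = GaussianPolicy(act_dim=1)
policy.log_std[:] = -1.5          # pretend the previous outer iteration shrank the noise
policy.reinit_std()               # what "Re-initialize init_std." plausibly corresponds to
print(policy.sample(np.zeros(1)))
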
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.123   |
| Iteration     | 29       |
| MaximumReturn | -0.0672  |
| MinimumReturn | -0.213   |
| TotalSamples  | 51646    |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5077775716781616
Validation loss = 0.5173789858818054
Validation loss = 0.5084341168403625
Validation loss = 0.509065568447113
Validation loss = 0.5112283229827881
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49522632360458374
Validation loss = 0.49150899052619934
Validation loss = 0.5014172792434692
Validation loss = 0.5014499425888062
Validation loss = 0.5055136680603027
Validation loss = 0.5016478896141052
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4962611794471741
Validation loss = 0.48937004804611206
Validation loss = 0.4909641742706299
Validation loss = 0.5093863010406494
Validation loss = 0.5016235113143921
Validation loss = 0.49836966395378113
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49967876076698303
Validation loss = 0.49391019344329834
Validation loss = 0.4950656592845917
Validation loss = 0.4980699419975281
Validation loss = 0.5071184039115906
Validation loss = 0.505211353302002
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4955305755138397
Validation loss = 0.507393479347229
Validation loss = 0.49937379360198975
Validation loss = 0.5011554956436157
Validation loss = 0.5049698352813721
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.298   |
| Iteration     | 30       |
| MaximumReturn | -0.182   |
| MinimumReturn | -0.572   |
| TotalSamples  | 53312    |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5090265870094299
Validation loss = 0.5061441659927368
Validation loss = 0.5067642331123352
Validation loss = 0.5112819671630859
Validation loss = 0.5133790969848633
Validation loss = 0.521115243434906
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5000733733177185
Validation loss = 0.49766474962234497
Validation loss = 0.5067833662033081
Validation loss = 0.5068306922912598
Validation loss = 0.5079691410064697
Validation loss = 0.5084248781204224
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4886583089828491
Validation loss = 0.507710337638855
Validation loss = 0.4982750117778778
Validation loss = 0.4949495494365692
Validation loss = 0.4991506040096283
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4967428743839264
Validation loss = 0.5086544752120972
Validation loss = 0.5030249357223511
Validation loss = 0.5109504461288452
Validation loss = 0.514923095703125
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5032944083213806
Validation loss = 0.5029119253158569
Validation loss = 0.4991453289985657
Validation loss = 0.5023103952407837
Validation loss = 0.5114331841468811
Validation loss = 0.5137183666229248
Validation loss = 0.5208393931388855
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18.5    |
| Iteration     | 31       |
| MaximumReturn | -0.0408  |
| MinimumReturn | -77.1    |
| TotalSamples  | 54978    |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.508887529373169
Validation loss = 0.5111661553382874
Validation loss = 0.5127240419387817
Validation loss = 0.515709638595581
Validation loss = 0.5234619379043579
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4960439205169678
Validation loss = 0.49846556782722473
Validation loss = 0.5116738677024841
Validation loss = 0.5110196471214294
Validation loss = 0.5084649324417114
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4932626187801361
Validation loss = 0.5025194883346558
Validation loss = 0.4984114170074463
Validation loss = 0.5084531307220459
Validation loss = 0.5101048946380615
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4990275800228119
Validation loss = 0.5042431354522705
Validation loss = 0.5109632611274719
Validation loss = 0.505359947681427
Validation loss = 0.5130749940872192
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5059444308280945
Validation loss = 0.5062066316604614
Validation loss = 0.5073544383049011
Validation loss = 0.5122962594032288
Validation loss = 0.5171128511428833
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -83.6    |
| Iteration     | 32       |
| MaximumReturn | -0.1     |
| MinimumReturn | -138     |
| TotalSamples  | 56644    |
----------------------------
itr #33 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5039424896240234
Validation loss = 0.50871342420578
Validation loss = 0.5125325918197632
Validation loss = 0.5167123079299927
Validation loss = 0.5160015225410461
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5027146339416504
Validation loss = 0.494319885969162
Validation loss = 0.5100656747817993
Validation loss = 0.5126628279685974
Validation loss = 0.5154944062232971
Validation loss = 0.5119827389717102
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4975285828113556
Validation loss = 0.4931902587413788
Validation loss = 0.5064789652824402
Validation loss = 0.5062803626060486
Validation loss = 0.5164317488670349
Validation loss = 0.5107840895652771
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49655070900917053
Validation loss = 0.5027323961257935
Validation loss = 0.508674681186676
Validation loss = 0.5064885020256042
Validation loss = 0.5064825415611267
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5017693638801575
Validation loss = 0.5064662098884583
Validation loss = 0.5138818621635437
Validation loss = 0.515342652797699
Validation loss = 0.5210426449775696
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -83.1    |
| Iteration     | 33       |
| MaximumReturn | -0.237   |
| MinimumReturn | -128     |
| TotalSamples  | 58310    |
----------------------------
itr #34 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5009946227073669
Validation loss = 0.5074137449264526
Validation loss = 0.5120455622673035
Validation loss = 0.5149970054626465
Validation loss = 0.5200755596160889
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49134361743927
Validation loss = 0.5034652948379517
Validation loss = 0.5069564580917358
Validation loss = 0.506899356842041
Validation loss = 0.5185409188270569
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4903474450111389
Validation loss = 0.4999989867210388
Validation loss = 0.5044155120849609
Validation loss = 0.5045728087425232
Validation loss = 0.5072747468948364
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4916350543498993
Validation loss = 0.499069482088089
Validation loss = 0.5176401734352112
Validation loss = 0.5070337057113647
Validation loss = 0.510905385017395
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4990144670009613
Validation loss = 0.5011301040649414
Validation loss = 0.5075904130935669
Validation loss = 0.5134497284889221
Validation loss = 0.5173802375793457
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -93.5    |
| Iteration     | 34       |
| MaximumReturn | -1.17    |
| MinimumReturn | -145     |
| TotalSamples  | 59976    |
----------------------------
itr #35 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4976319968700409
Validation loss = 0.5100722312927246
Validation loss = 0.5148572325706482
Validation loss = 0.5122514367103577
Validation loss = 0.5174585580825806
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4899376928806305
Validation loss = 0.49612492322921753
Validation loss = 0.5026065707206726
Validation loss = 0.5083045959472656
Validation loss = 0.5111879706382751
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4842391610145569
Validation loss = 0.4966195225715637
Validation loss = 0.49971550703048706
Validation loss = 0.5046118497848511
Validation loss = 0.5046695470809937
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4834173619747162
Validation loss = 0.4967975914478302
Validation loss = 0.5106796622276306
Validation loss = 0.5090880990028381
Validation loss = 0.5112542510032654
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49178799986839294
Validation loss = 0.5041530132293701
Validation loss = 0.5051853656768799
Validation loss = 0.5074738264083862
Validation loss = 0.5181877613067627
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -86.9    |
| Iteration     | 35       |
| MaximumReturn | -1.38    |
| MinimumReturn | -158     |
| TotalSamples  | 61642    |
----------------------------
itr #36 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4954194128513336
Validation loss = 0.5122261643409729
Validation loss = 0.5114604234695435
Validation loss = 0.5171210765838623
Validation loss = 0.5157989859580994
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4861983060836792
Validation loss = 0.4984394907951355
Validation loss = 0.509361982345581
Validation loss = 0.5108247995376587
Validation loss = 0.5053622722625732
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48809316754341125
Validation loss = 0.4972692131996155
Validation loss = 0.4996221661567688
Validation loss = 0.5022585391998291
Validation loss = 0.5012766718864441
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.487252801656723
Validation loss = 0.4985498785972595
Validation loss = 0.507987380027771
Validation loss = 0.5114346742630005
Validation loss = 0.5093850493431091
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4886907637119293
Validation loss = 0.5066030621528625
Validation loss = 0.5079684853553772
Validation loss = 0.5100929141044617
Validation loss = 0.5112642049789429
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -95.8    |
| Iteration     | 36       |
| MaximumReturn | -1.56    |
| MinimumReturn | -171     |
| TotalSamples  | 63308    |
----------------------------
itr #37 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4968659281730652
Validation loss = 0.5082458257675171
Validation loss = 0.5120000243186951
Validation loss = 0.5111353397369385
Validation loss = 0.5132347941398621
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49098366498947144
Validation loss = 0.501933217048645
Validation loss = 0.502273678779602
Validation loss = 0.5051380395889282
Validation loss = 0.5077440738677979
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4900992810726166
Validation loss = 0.4974481761455536
Validation loss = 0.5064073204994202
Validation loss = 0.5060771703720093
Validation loss = 0.5202001333236694
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49464043974876404
Validation loss = 0.49902063608169556
Validation loss = 0.5048363208770752
Validation loss = 0.5038782358169556
Validation loss = 0.5043628811836243
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4946596324443817
Validation loss = 0.5034353733062744
Validation loss = 0.5005584359169006
Validation loss = 0.5082650184631348
Validation loss = 0.5123559236526489
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -88      |
| Iteration     | 37       |
| MaximumReturn | -47.5    |
| MinimumReturn | -119     |
| TotalSamples  | 64974    |
----------------------------
itr #38 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49938729405403137
Validation loss = 0.5059781670570374
Validation loss = 0.5101604461669922
Validation loss = 0.5119940042495728
Validation loss = 0.5196628570556641
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4918190836906433
Validation loss = 0.5040472149848938
Validation loss = 0.5093398094177246
Validation loss = 0.5028257369995117
Validation loss = 0.5051455497741699
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49032747745513916
Validation loss = 0.49889129400253296
Validation loss = 0.5005893707275391
Validation loss = 0.5120882987976074
Validation loss = 0.5044654607772827
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.48915547132492065
Validation loss = 0.5057777166366577
Validation loss = 0.5065438747406006
Validation loss = 0.5054238438606262
Validation loss = 0.5144953727722168
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49693259596824646
Validation loss = 0.5034443736076355
Validation loss = 0.506098747253418
Validation loss = 0.5032556056976318
Validation loss = 0.5102803707122803
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.7    |
| Iteration     | 38       |
| MaximumReturn | -1.3     |
| MinimumReturn | -79.1    |
| TotalSamples  | 66640    |
----------------------------
itr #39 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5068953037261963
Validation loss = 0.5042445063591003
Validation loss = 0.5136587023735046
Validation loss = 0.510672926902771
Validation loss = 0.5112515091896057
Validation loss = 0.5160055160522461
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4972476363182068
Validation loss = 0.49583348631858826
Validation loss = 0.5022857785224915
Validation loss = 0.5094881653785706
Validation loss = 0.5070621967315674
Validation loss = 0.5080095529556274
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49580511450767517
Validation loss = 0.5016006827354431
Validation loss = 0.5013132095336914
Validation loss = 0.5041932463645935
Validation loss = 0.5039220452308655
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4941936731338501
Validation loss = 0.5043624639511108
Validation loss = 0.5011976361274719
Validation loss = 0.5025734305381775
Validation loss = 0.505186915397644
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49931469559669495
Validation loss = 0.502807080745697
Validation loss = 0.5083575248718262
Validation loss = 0.509100615978241
Validation loss = 0.5087013244628906
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.03    |
| Iteration     | 39       |
| MaximumReturn | -4.62    |
| MinimumReturn | -20.2    |
| TotalSamples  | 68306    |
----------------------------
itr #40 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5073150992393494
Validation loss = 0.512114405632019
Validation loss = 0.510417640209198
Validation loss = 0.5139449238777161
Validation loss = 0.5202338099479675
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5012194514274597
Validation loss = 0.5063326954841614
Validation loss = 0.5260673761367798
Validation loss = 0.5070109963417053
Validation loss = 0.5098372101783752
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4988180696964264
Validation loss = 0.5000365376472473
Validation loss = 0.5050454139709473
Validation loss = 0.5081157088279724
Validation loss = 0.5084230303764343
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49747034907341003
Validation loss = 0.5038132667541504
Validation loss = 0.5044728517532349
Validation loss = 0.504694402217865
Validation loss = 0.5062540769577026
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5015782117843628
Validation loss = 0.5146951079368591
Validation loss = 0.5076403021812439
Validation loss = 0.5077430605888367
Validation loss = 0.5175163745880127
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.04    |
| Iteration     | 40       |
| MaximumReturn | -0.338   |
| MinimumReturn | -5.84    |
| TotalSamples  | 69972    |
----------------------------
itr #41 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5232282876968384
Validation loss = 0.5159481167793274
Validation loss = 0.516421377658844
Validation loss = 0.520443856716156
Validation loss = 0.5263881087303162
Validation loss = 0.5240907669067383
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5063111782073975
Validation loss = 0.5096675753593445
Validation loss = 0.5131415128707886
Validation loss = 0.5183275938034058
Validation loss = 0.5135162472724915
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5097025036811829
Validation loss = 0.5045775175094604
Validation loss = 0.5052340626716614
Validation loss = 0.5079483389854431
Validation loss = 0.5137155055999756
Validation loss = 0.5146551728248596
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5046803951263428
Validation loss = 0.5068801045417786
Validation loss = 0.5091658234596252
Validation loss = 0.5090720653533936
Validation loss = 0.5118629932403564
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5155444741249084
Validation loss = 0.5117637515068054
Validation loss = 0.5137945413589478
Validation loss = 0.5145583748817444
Validation loss = 0.5175987482070923
Validation loss = 0.5226768851280212
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -72.6    |
| Iteration     | 41       |
| MaximumReturn | -2.34    |
| MinimumReturn | -141     |
| TotalSamples  | 71638    |
----------------------------
itr #42 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.48557543754577637
Validation loss = 0.4950047731399536
Validation loss = 0.5047033429145813
Validation loss = 0.5085172653198242
Validation loss = 0.5103746056556702
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4748486578464508
Validation loss = 0.4906081557273865
Validation loss = 0.49902763962745667
Validation loss = 0.5001276731491089
Validation loss = 0.5120665431022644
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4780581295490265
Validation loss = 0.49524611234664917
Validation loss = 0.4968990981578827
Validation loss = 0.500445544719696
Validation loss = 0.5021486282348633
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.48525166511535645
Validation loss = 0.4908999800682068
Validation loss = 0.49538901448249817
Validation loss = 0.495139479637146
Validation loss = 0.5020799040794373
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.48279768228530884
Validation loss = 0.4911085367202759
Validation loss = 0.49472716450691223
Validation loss = 0.4993912875652313
Validation loss = 0.5026919841766357
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -29.1    |
| Iteration     | 42       |
| MaximumReturn | -1.37    |
| MinimumReturn | -96      |
| TotalSamples  | 73304    |
----------------------------
itr #43 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49414750933647156
Validation loss = 0.4988466799259186
Validation loss = 0.4988056421279907
Validation loss = 0.49944421648979187
Validation loss = 0.5006166100502014
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4877484142780304
Validation loss = 0.48667341470718384
Validation loss = 0.49173504114151
Validation loss = 0.4985925853252411
Validation loss = 0.4930976331233978
Validation loss = 0.4991910755634308
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48999494314193726
Validation loss = 0.4894566535949707
Validation loss = 0.4948032796382904
Validation loss = 0.4936998188495636
Validation loss = 0.4953788220882416
Validation loss = 0.49793094396591187
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4908907115459442
Validation loss = 0.4875815808773041
Validation loss = 0.48939248919487
Validation loss = 0.48810869455337524
Validation loss = 0.4877191185951233
Validation loss = 0.49221161007881165
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49210765957832336
Validation loss = 0.4925853908061981
Validation loss = 0.4940115511417389
Validation loss = 0.4968564808368683
Validation loss = 0.49925756454467773
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26      |
| Iteration     | 43       |
| MaximumReturn | -0.47    |
| MinimumReturn | -78.9    |
| TotalSamples  | 74970    |
----------------------------
itr #44 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49628958106040955
Validation loss = 0.49704402685165405
Validation loss = 0.4985049068927765
Validation loss = 0.5050008893013
Validation loss = 0.5082616806030273
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4899080693721771
Validation loss = 0.48748165369033813
Validation loss = 0.4878496825695038
Validation loss = 0.49214285612106323
Validation loss = 0.4891211986541748
Validation loss = 0.49168798327445984
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.48960185050964355
Validation loss = 0.49514856934547424
Validation loss = 0.49656444787979126
Validation loss = 0.48705461621284485
Validation loss = 0.49191001057624817
Validation loss = 0.49892106652259827
Validation loss = 0.5018537044525146
Validation loss = 0.4989491105079651
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4900191128253937
Validation loss = 0.48285627365112305
Validation loss = 0.4879414439201355
Validation loss = 0.49420166015625
Validation loss = 0.4969397485256195
Validation loss = 0.49336978793144226
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4974423348903656
Validation loss = 0.49525704979896545
Validation loss = 0.4909207224845886
Validation loss = 0.49321237206459045
Validation loss = 0.4974839389324188
Validation loss = 0.4959633946418762
Validation loss = 0.49485665559768677
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -22.2    |
| Iteration     | 44       |
| MaximumReturn | -0.951   |
| MinimumReturn | -80.1    |
| TotalSamples  | 76636    |
----------------------------
itr #45 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4976018965244293
Validation loss = 0.5019241571426392
Validation loss = 0.5017936825752258
Validation loss = 0.5057231187820435
Validation loss = 0.5033087134361267
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49648937582969666
Validation loss = 0.4988713562488556
Validation loss = 0.4931713342666626
Validation loss = 0.5031828880310059
Validation loss = 0.5037237405776978
Validation loss = 0.4995734691619873
Validation loss = 0.4998871684074402
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4957757890224457
Validation loss = 0.4948195815086365
Validation loss = 0.5027963519096375
Validation loss = 0.4962611496448517
Validation loss = 0.5024640560150146
Validation loss = 0.5097270607948303
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4875676929950714
Validation loss = 0.4918591380119324
Validation loss = 0.49372488260269165
Validation loss = 0.49772271513938904
Validation loss = 0.4930534362792969
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4982132613658905
Validation loss = 0.4959409534931183
Validation loss = 0.5002840757369995
Validation loss = 0.5028459429740906
Validation loss = 0.5048525333404541
Validation loss = 0.5009918212890625
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -68.3    |
| Iteration     | 45       |
| MaximumReturn | -2.92    |
| MinimumReturn | -138     |
| TotalSamples  | 78302    |
----------------------------
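
The summary table printed at the end of each iteration is just order statistics over the per-path returns from the rollouts above, plus a running sample counter. A small sketch of how those entries could be computed (illustrative only; these are not the logger's actual function names):

import numpy as np

def summarize(path_returns, iteration, total_samples):
    # path_returns: one scalar return per rollout path from this iteration.
    return {
        "AverageReturn": float(np.mean(path_returns)),
        "Iteration": iteration,
        "MaximumReturn": float(np.max(path_returns)),
        "MinimumReturn": float(np.min(path_returns)),
        "TotalSamples": total_samples,
    }

# e.g. summarize(returns_from_25_paths, iteration=45, total_samples=78302)
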
itr #46 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49948906898498535
Validation loss = 0.5046520233154297
Validation loss = 0.5011675357818604
Validation loss = 0.5083761811256409
Validation loss = 0.5121767520904541
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4977790117263794
Validation loss = 0.4969950020313263
Validation loss = 0.49760901927948
Validation loss = 0.5022996068000793
Validation loss = 0.5067092776298523
Validation loss = 0.505570113658905
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5062184929847717
Validation loss = 0.49874788522720337
Validation loss = 0.5038870573043823
Validation loss = 0.5041826367378235
Validation loss = 0.5077549815177917
Validation loss = 0.5027931332588196
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4926317632198334
Validation loss = 0.4901774525642395
Validation loss = 0.5002458095550537
Validation loss = 0.4941287040710449
Validation loss = 0.4985333979129791
Validation loss = 0.4994671642780304
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4990852177143097
Validation loss = 0.5058270692825317
Validation loss = 0.5031929016113281
Validation loss = 0.5032665729522705
Validation loss = 0.5053905844688416
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -49      |
| Iteration     | 46       |
| MaximumReturn | -0.228   |
| MinimumReturn | -118     |
| TotalSamples  | 79968    |
----------------------------
itr #47 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49361491203308105
Validation loss = 0.4963518977165222
Validation loss = 0.491769403219223
Validation loss = 0.5013018846511841
Validation loss = 0.5041427612304688
Validation loss = 0.5096826553344727
Validation loss = 0.5029693841934204
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49217191338539124
Validation loss = 0.49798840284347534
Validation loss = 0.4979611337184906
Validation loss = 0.4952431619167328
Validation loss = 0.4992418885231018
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4932614266872406
Validation loss = 0.49652203917503357
Validation loss = 0.4969121515750885
Validation loss = 0.49913302063941956
Validation loss = 0.4988635182380676
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.48869791626930237
Validation loss = 0.4889202117919922
Validation loss = 0.49043044447898865
Validation loss = 0.4925224184989929
Validation loss = 0.4930216670036316
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.493154376745224
Validation loss = 0.4935407042503357
Validation loss = 0.49516376852989197
Validation loss = 0.4986291527748108
Validation loss = 0.5040104985237122
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -69.2    |
| Iteration     | 47       |
| MaximumReturn | -0.372   |
| MinimumReturn | -128     |
| TotalSamples  | 81634    |
----------------------------
itr #48 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.4914730191230774
Validation loss = 0.4985567629337311
Validation loss = 0.49700433015823364
Validation loss = 0.49475544691085815
Validation loss = 0.49557313323020935
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49315500259399414
Validation loss = 0.4866982102394104
Validation loss = 0.49371805787086487
Validation loss = 0.49456578493118286
Validation loss = 0.49566763639450073
Validation loss = 0.49849072098731995
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49332499504089355
Validation loss = 0.4987289011478424
Validation loss = 0.4945890009403229
Validation loss = 0.5047149658203125
Validation loss = 0.49868911504745483
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4865291118621826
Validation loss = 0.4885794520378113
Validation loss = 0.4895695745944977
Validation loss = 0.49091583490371704
Validation loss = 0.4928879737854004
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49100929498672485
Validation loss = 0.48874911665916443
Validation loss = 0.4965616762638092
Validation loss = 0.4942936897277832
Validation loss = 0.49786368012428284
Validation loss = 0.49564608931541443
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -94.5    |
| Iteration     | 48       |
| MaximumReturn | -0.125   |
| MinimumReturn | -171     |
| TotalSamples  | 83300    |
----------------------------
itr #49 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49720755219459534
Validation loss = 0.4943774938583374
Validation loss = 0.49662426114082336
Validation loss = 0.5034404397010803
Validation loss = 0.5071261525154114
Validation loss = 0.505895733833313
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.49398547410964966
Validation loss = 0.4936220943927765
Validation loss = 0.5028713345527649
Validation loss = 0.500940203666687
Validation loss = 0.5015338063240051
Validation loss = 0.5019400715827942
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49293461441993713
Validation loss = 0.5006017684936523
Validation loss = 0.5002124309539795
Validation loss = 0.5040509104728699
Validation loss = 0.5044562220573425
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49310827255249023
Validation loss = 0.48698174953460693
Validation loss = 0.4923636317253113
Validation loss = 0.49531182646751404
Validation loss = 0.49128779768943787
Validation loss = 0.49491575360298157
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.49976587295532227
Validation loss = 0.4944629669189453
Validation loss = 0.49455130100250244
Validation loss = 0.49769365787506104
Validation loss = 0.4979327619075775
Validation loss = 0.5000388622283936
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -28.9    |
| Iteration     | 49       |
| MaximumReturn | -0.187   |
| MinimumReturn | -117     |
| TotalSamples  | 84966    |
----------------------------
itr #50 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.49864786863327026
Validation loss = 0.4999385476112366
Validation loss = 0.49812644720077515
Validation loss = 0.5022591948509216
Validation loss = 0.5042648315429688
Validation loss = 0.5079807639122009
Validation loss = 0.5085012316703796
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4972462058067322
Validation loss = 0.49639832973480225
Validation loss = 0.5090692639350891
Validation loss = 0.49767544865608215
Validation loss = 0.5017725825309753
Validation loss = 0.5012803077697754
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.49638333916664124
Validation loss = 0.5014805793762207
Validation loss = 0.5004380345344543
Validation loss = 0.5050600171089172
Validation loss = 0.5045285224914551
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4934467077255249
Validation loss = 0.49911198019981384
Validation loss = 0.49743539094924927
Validation loss = 0.497695654630661
Validation loss = 0.4954162836074829
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5005005598068237
Validation loss = 0.4980645477771759
Validation loss = 0.4929177463054657
Validation loss = 0.49852967262268066
Validation loss = 0.5014126300811768
Validation loss = 0.5035127997398376
Validation loss = 0.5033103227615356
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -27      |
| Iteration     | 50       |
| MaximumReturn | -0.29    |
| MinimumReturn | -119     |
| TotalSamples  | 86632    |
----------------------------
itr #51 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5074098110198975
Validation loss = 0.5111294388771057
Validation loss = 0.5080372095108032
Validation loss = 0.509139358997345
Validation loss = 0.5067670941352844
Validation loss = 0.5226800441741943
Validation loss = 0.5105945467948914
Validation loss = 0.5151849985122681
Validation loss = 0.514578640460968
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5037094354629517
Validation loss = 0.49892836809158325
Validation loss = 0.5014735460281372
Validation loss = 0.503913402557373
Validation loss = 0.5056024193763733
Validation loss = 0.5054444670677185
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.505208432674408
Validation loss = 0.5082568526268005
Validation loss = 0.503721296787262
Validation loss = 0.5050896406173706
Validation loss = 0.5092229247093201
Validation loss = 0.5057486891746521
Validation loss = 0.5083852410316467
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.49556440114974976
Validation loss = 0.4989096522331238
Validation loss = 0.495082288980484
Validation loss = 0.49996647238731384
Validation loss = 0.49880120158195496
Validation loss = 0.5047500729560852
Validation loss = 0.5060041546821594
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5063862204551697
Validation loss = 0.5013797879219055
Validation loss = 0.5077824592590332
Validation loss = 0.5056704878807068
Validation loss = 0.5061789751052856
Validation loss = 0.5079189538955688
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -30      |
| Iteration     | 51       |
| MaximumReturn | -0.308   |
| MinimumReturn | -127     |
| TotalSamples  | 88298    |
----------------------------
itr #52 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5154757499694824
Validation loss = 0.5114073753356934
Validation loss = 0.5102964043617249
Validation loss = 0.5184457302093506
Validation loss = 0.519154965877533
Validation loss = 0.5182153582572937
Validation loss = 0.515519380569458
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5062956809997559
Validation loss = 0.5095366835594177
Validation loss = 0.5101257562637329
Validation loss = 0.5086902379989624
Validation loss = 0.5113534927368164
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.510297417640686
Validation loss = 0.514892041683197
Validation loss = 0.5113756656646729
Validation loss = 0.5075071454048157
Validation loss = 0.512313723564148
Validation loss = 0.5168753266334534
Validation loss = 0.514414370059967
Validation loss = 0.5174582004547119
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5097979307174683
Validation loss = 0.500734806060791
Validation loss = 0.49985358119010925
Validation loss = 0.5031154155731201
Validation loss = 0.5046547651290894
Validation loss = 0.5055086612701416
Validation loss = 0.5038713812828064
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5065425038337708
Validation loss = 0.5037632584571838
Validation loss = 0.5052957534790039
Validation loss = 0.5081939697265625
Validation loss = 0.5094842314720154
Validation loss = 0.5117163062095642
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -18      |
| Iteration     | 52       |
| MaximumReturn | -0.172   |
| MinimumReturn | -96.9    |
| TotalSamples  | 89964    |
----------------------------
itr #53 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5246546268463135
Validation loss = 0.5159757137298584
Validation loss = 0.519651472568512
Validation loss = 0.5267536640167236
Validation loss = 0.5231655240058899
Validation loss = 0.5190775394439697
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5053360462188721
Validation loss = 0.5085183382034302
Validation loss = 0.512045681476593
Validation loss = 0.5140478014945984
Validation loss = 0.514050304889679
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5173920392990112
Validation loss = 0.5182040929794312
Validation loss = 0.5240410566329956
Validation loss = 0.5215184688568115
Validation loss = 0.5187687277793884
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5104753971099854
Validation loss = 0.5032696723937988
Validation loss = 0.5141363739967346
Validation loss = 0.5085124373435974
Validation loss = 0.5103160738945007
Validation loss = 0.5136327743530273
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5170047879219055
Validation loss = 0.5036869049072266
Validation loss = 0.5125007033348083
Validation loss = 0.5102605819702148
Validation loss = 0.5153710842132568
Validation loss = 0.5170791745185852
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.84    |
| Iteration     | 53       |
| MaximumReturn | -0.143   |
| MinimumReturn | -60.7    |
| TotalSamples  | 91630    |
----------------------------
itr #54 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5219935178756714
Validation loss = 0.5178449749946594
Validation loss = 0.5168982744216919
Validation loss = 0.5211687088012695
Validation loss = 0.5226126313209534
Validation loss = 0.5227966904640198
Validation loss = 0.5235644578933716
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5103349089622498
Validation loss = 0.5170093178749084
Validation loss = 0.5098676681518555
Validation loss = 0.5147110223770142
Validation loss = 0.5148581862449646
Validation loss = 0.5076014995574951
Validation loss = 0.5176519751548767
Validation loss = 0.5234083533287048
Validation loss = 0.5172834396362305
Validation loss = 0.5208742618560791
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5240169167518616
Validation loss = 0.5167344212532043
Validation loss = 0.5211150050163269
Validation loss = 0.5224733352661133
Validation loss = 0.5281814336776733
Validation loss = 0.5218072533607483
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5099557042121887
Validation loss = 0.504362940788269
Validation loss = 0.507703423500061
Validation loss = 0.5115684270858765
Validation loss = 0.5181816816329956
Validation loss = 0.5199065208435059
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.512803316116333
Validation loss = 0.5161830186843872
Validation loss = 0.5214340686798096
Validation loss = 0.5233476161956787
Validation loss = 0.5170182585716248
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19      |
| Iteration     | 54       |
| MaximumReturn | -0.159   |
| MinimumReturn | -90.7    |
| TotalSamples  | 93296    |
----------------------------
itr #55 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5249564051628113
Validation loss = 0.5221357941627502
Validation loss = 0.5252036452293396
Validation loss = 0.5252032279968262
Validation loss = 0.5274221301078796
Validation loss = 0.5280398726463318
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5215514898300171
Validation loss = 0.5192640423774719
Validation loss = 0.5188376307487488
Validation loss = 0.5179637670516968
Validation loss = 0.5221123099327087
Validation loss = 0.5212085247039795
Validation loss = 0.5243505239486694
Validation loss = 0.5267069935798645
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5193295478820801
Validation loss = 0.5218088030815125
Validation loss = 0.5238884091377258
Validation loss = 0.5261362791061401
Validation loss = 0.5258093476295471
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5111342072486877
Validation loss = 0.5170384645462036
Validation loss = 0.5126223564147949
Validation loss = 0.5137984156608582
Validation loss = 0.5210388898849487
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5159454941749573
Validation loss = 0.5147141814231873
Validation loss = 0.5156177878379822
Validation loss = 0.5158612132072449
Validation loss = 0.5211309194564819
Validation loss = 0.5172479748725891
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -8.96    |
| Iteration     | 55       |
| MaximumReturn | -0.182   |
| MinimumReturn | -41.6    |
| TotalSamples  | 94962    |
----------------------------
itr #56 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5237886309623718
Validation loss = 0.5251253247261047
Validation loss = 0.5242864489555359
Validation loss = 0.5261719226837158
Validation loss = 0.5257689952850342
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5268014073371887
Validation loss = 0.5273857116699219
Validation loss = 0.5239070057868958
Validation loss = 0.5265167355537415
Validation loss = 0.5271416306495667
Validation loss = 0.5275586843490601
Validation loss = 0.5276788473129272
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5231013894081116
Validation loss = 0.5253452658653259
Validation loss = 0.5249496102333069
Validation loss = 0.5249388217926025
Validation loss = 0.5265108346939087
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5141514539718628
Validation loss = 0.5088881850242615
Validation loss = 0.5165436863899231
Validation loss = 0.5132290124893188
Validation loss = 0.5177302956581116
Validation loss = 0.5197088122367859
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5183771848678589
Validation loss = 0.5259880423545837
Validation loss = 0.518296480178833
Validation loss = 0.5236848592758179
Validation loss = 0.5263076424598694
Validation loss = 0.5309851765632629
Validation loss = 0.5248484015464783
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -4.34    |
| Iteration     | 56       |
| MaximumReturn | -0.196   |
| MinimumReturn | -46.2    |
| TotalSamples  | 96628    |
----------------------------
itr #57 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5232946872711182
Validation loss = 0.5265728831291199
Validation loss = 0.5270578861236572
Validation loss = 0.529015302658081
Validation loss = 0.5339609384536743
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5320425033569336
Validation loss = 0.5249719619750977
Validation loss = 0.5237080454826355
Validation loss = 0.520820140838623
Validation loss = 0.5271803736686707
Validation loss = 0.5261707901954651
Validation loss = 0.5304458737373352
Validation loss = 0.5327402949333191
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5235341191291809
Validation loss = 0.5263556838035583
Validation loss = 0.5226617455482483
Validation loss = 0.5238718390464783
Validation loss = 0.5278193950653076
Validation loss = 0.52754145860672
Validation loss = 0.5345342755317688
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5162699222564697
Validation loss = 0.5193023681640625
Validation loss = 0.5129851698875427
Validation loss = 0.5292391180992126
Validation loss = 0.517557680606842
Validation loss = 0.5250690579414368
Validation loss = 0.518484890460968
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5288386940956116
Validation loss = 0.521771252155304
Validation loss = 0.5206591486930847
Validation loss = 0.5255200862884521
Validation loss = 0.5267040729522705
Validation loss = 0.530148983001709
Validation loss = 0.53094482421875
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -34      |
| Iteration     | 57       |
| MaximumReturn | -0.123   |
| MinimumReturn | -93.1    |
| TotalSamples  | 98294    |
----------------------------
itr #58 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5275025367736816
Validation loss = 0.5243967175483704
Validation loss = 0.5252028703689575
Validation loss = 0.5305463671684265
Validation loss = 0.5317109823226929
Validation loss = 0.5317695736885071
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5271133184432983
Validation loss = 0.5325620174407959
Validation loss = 0.5247478485107422
Validation loss = 0.5347626209259033
Validation loss = 0.5303745865821838
Validation loss = 0.5328010320663452
Validation loss = 0.5321989059448242
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5291247963905334
Validation loss = 0.5294460654258728
Validation loss = 0.531522274017334
Validation loss = 0.5292294025421143
Validation loss = 0.5345161557197571
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5205893516540527
Validation loss = 0.5264283418655396
Validation loss = 0.5198668837547302
Validation loss = 0.5277430415153503
Validation loss = 0.5247453451156616
Validation loss = 0.526498019695282
Validation loss = 0.531427800655365
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.526984453201294
Validation loss = 0.525082528591156
Validation loss = 0.5305377840995789
Validation loss = 0.5304965376853943
Validation loss = 0.5300534963607788
Validation loss = 0.5337557196617126
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.449   |
| Iteration     | 58       |
| MaximumReturn | -0.176   |
| MinimumReturn | -1.06    |
| TotalSamples  | 99960    |
----------------------------
itr #59 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5286456346511841
Validation loss = 0.5272426009178162
Validation loss = 0.5321003198623657
Validation loss = 0.5332732200622559
Validation loss = 0.5322912335395813
Validation loss = 0.53569096326828
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5371168255805969
Validation loss = 0.5259069800376892
Validation loss = 0.5326880216598511
Validation loss = 0.5289022326469421
Validation loss = 0.5300716161727905
Validation loss = 0.5310767889022827
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5343046188354492
Validation loss = 0.5308240652084351
Validation loss = 0.5369729995727539
Validation loss = 0.535707950592041
Validation loss = 0.5310764908790588
Validation loss = 0.5346262454986572
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5258715152740479
Validation loss = 0.5251166820526123
Validation loss = 0.5259623527526855
Validation loss = 0.5291949510574341
Validation loss = 0.5283946394920349
Validation loss = 0.5339550971984863
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5319510102272034
Validation loss = 0.5320804119110107
Validation loss = 0.5331496596336365
Validation loss = 0.5304849743843079
Validation loss = 0.5348336696624756
Validation loss = 0.5327706336975098
Validation loss = 0.535641610622406
Validation loss = 0.5374518036842346
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.68    |
| Iteration     | 59       |
| MaximumReturn | -0.224   |
| MinimumReturn | -1.86    |
| TotalSamples  | 101626   |
----------------------------
itr #60 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5342963933944702
Validation loss = 0.5303688049316406
Validation loss = 0.5350644588470459
Validation loss = 0.5355781316757202
Validation loss = 0.5357776880264282
Validation loss = 0.5357251167297363
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5313130021095276
Validation loss = 0.5275437831878662
Validation loss = 0.5308801531791687
Validation loss = 0.5294380187988281
Validation loss = 0.53312087059021
Validation loss = 0.5329443216323853
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5367675423622131
Validation loss = 0.5347382426261902
Validation loss = 0.534131646156311
Validation loss = 0.541962742805481
Validation loss = 0.5335207581520081
Validation loss = 0.5387784838676453
Validation loss = 0.5359712839126587
Validation loss = 0.5376013517379761
Validation loss = 0.5385141968727112
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5274412035942078
Validation loss = 0.5264508128166199
Validation loss = 0.5303585529327393
Validation loss = 0.5298944711685181
Validation loss = 0.5295230150222778
Validation loss = 0.531378448009491
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5327742099761963
Validation loss = 0.5311958193778992
Validation loss = 0.5329428315162659
Validation loss = 0.533819854259491
Validation loss = 0.5335130095481873
Validation loss = 0.5367128252983093
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.256   |
| Iteration     | 60       |
| MaximumReturn | -0.108   |
| MinimumReturn | -1.15    |
| TotalSamples  | 103292   |
----------------------------
itr #61 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5359076261520386
Validation loss = 0.5330948233604431
Validation loss = 0.5327448844909668
Validation loss = 0.5361343622207642
Validation loss = 0.5391097664833069
Validation loss = 0.5329921841621399
Validation loss = 0.5352010130882263
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5321663022041321
Validation loss = 0.5392673015594482
Validation loss = 0.5325682163238525
Validation loss = 0.5383643507957458
Validation loss = 0.5375475287437439
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5394558906555176
Validation loss = 0.5382949113845825
Validation loss = 0.5405335426330566
Validation loss = 0.5456045269966125
Validation loss = 0.542996346950531
Validation loss = 0.5459341406822205
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5312891006469727
Validation loss = 0.525692343711853
Validation loss = 0.528331995010376
Validation loss = 0.5354043841362
Validation loss = 0.5278401970863342
Validation loss = 0.5305598974227905
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5368502140045166
Validation loss = 0.5374595522880554
Validation loss = 0.5337148308753967
Validation loss = 0.5348970890045166
Validation loss = 0.5413659811019897
Validation loss = 0.5366126298904419
Validation loss = 0.5357575416564941
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.294   |
| Iteration     | 61       |
| MaximumReturn | -0.116   |
| MinimumReturn | -0.71    |
| TotalSamples  | 104958   |
----------------------------
itr #62 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5368536710739136
Validation loss = 0.5398778319358826
Validation loss = 0.5364256501197815
Validation loss = 0.5401180982589722
Validation loss = 0.5346634984016418
Validation loss = 0.5384501814842224
Validation loss = 0.5400097370147705
Validation loss = 0.5415137410163879
Validation loss = 0.539592981338501
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5352620482444763
Validation loss = 0.5276766419410706
Validation loss = 0.5302251577377319
Validation loss = 0.5377528071403503
Validation loss = 0.5328788757324219
Validation loss = 0.5365060567855835
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5434021353721619
Validation loss = 0.5367392301559448
Validation loss = 0.5401297807693481
Validation loss = 0.5439423322677612
Validation loss = 0.5409289598464966
Validation loss = 0.5515826940536499
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5297321677207947
Validation loss = 0.5325137376785278
Validation loss = 0.5277426242828369
Validation loss = 0.5328129529953003
Validation loss = 0.5350275039672852
Validation loss = 0.5346592664718628
Validation loss = 0.535946249961853
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.539763867855072
Validation loss = 0.5394315123558044
Validation loss = 0.5376182794570923
Validation loss = 0.5342930555343628
Validation loss = 0.5414801239967346
Validation loss = 0.5437665581703186
Validation loss = 0.5430406928062439
Validation loss = 0.5397449135780334
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -1.13    |
| Iteration     | 62       |
| MaximumReturn | -0.0583  |
| MinimumReturn | -20      |
| TotalSamples  | 106624   |
----------------------------
itr #63 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.541143536567688
Validation loss = 0.5360828042030334
Validation loss = 0.5433443784713745
Validation loss = 0.5420435667037964
Validation loss = 0.5448158383369446
Validation loss = 0.5412699580192566
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5367636680603027
Validation loss = 0.536081075668335
Validation loss = 0.5370997190475464
Validation loss = 0.5383030772209167
Validation loss = 0.5348859429359436
Validation loss = 0.5403670072555542
Validation loss = 0.5369067192077637
Validation loss = 0.542201578617096
Validation loss = 0.53972989320755
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.540012776851654
Validation loss = 0.5441035628318787
Validation loss = 0.5429173111915588
Validation loss = 0.5449949502944946
Validation loss = 0.5400926470756531
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5324628949165344
Validation loss = 0.5325292348861694
Validation loss = 0.5371546745300293
Validation loss = 0.528853714466095
Validation loss = 0.5388535261154175
Validation loss = 0.5351361036300659
Validation loss = 0.5346497893333435
Validation loss = 0.535131573677063
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5461484789848328
Validation loss = 0.5365741848945618
Validation loss = 0.5363951921463013
Validation loss = 0.5392976999282837
Validation loss = 0.5454130172729492
Validation loss = 0.5432181358337402
Validation loss = 0.5477607250213623
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -20.1    |
| Iteration     | 63       |
| MaximumReturn | -0.0866  |
| MinimumReturn | -58.2    |
| TotalSamples  | 108290   |
----------------------------
itr #64 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5407784581184387
Validation loss = 0.5370432734489441
Validation loss = 0.5353737473487854
Validation loss = 0.5420193672180176
Validation loss = 0.5427145957946777
Validation loss = 0.5430791974067688
Validation loss = 0.5425230860710144
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5358431935310364
Validation loss = 0.5375678539276123
Validation loss = 0.5389956831932068
Validation loss = 0.5376893877983093
Validation loss = 0.5398104190826416
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5445917844772339
Validation loss = 0.5406006574630737
Validation loss = 0.5422712564468384
Validation loss = 0.5413398146629333
Validation loss = 0.5439581274986267
Validation loss = 0.5424378514289856
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5327362418174744
Validation loss = 0.5317899584770203
Validation loss = 0.5495734214782715
Validation loss = 0.5361171960830688
Validation loss = 0.5381327867507935
Validation loss = 0.5387909412384033
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5462146401405334
Validation loss = 0.5421463251113892
Validation loss = 0.5380581617355347
Validation loss = 0.5425686836242676
Validation loss = 0.543645441532135
Validation loss = 0.5418373346328735
Validation loss = 0.5429192781448364
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -24.9    |
| Iteration     | 64       |
| MaximumReturn | -0.137   |
| MinimumReturn | -74.4    |
| TotalSamples  | 109956   |
----------------------------
itr #65 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5417930483818054
Validation loss = 0.5446471571922302
Validation loss = 0.5389148592948914
Validation loss = 0.5440266728401184
Validation loss = 0.542724609375
Validation loss = 0.5457287430763245
Validation loss = 0.5485944747924805
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5379605293273926
Validation loss = 0.5328319668769836
Validation loss = 0.5395718216896057
Validation loss = 0.5415879487991333
Validation loss = 0.5391040444374084
Validation loss = 0.5387874245643616
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5430260300636292
Validation loss = 0.5356921553611755
Validation loss = 0.5400751233100891
Validation loss = 0.5393093228340149
Validation loss = 0.5467691421508789
Validation loss = 0.5455739498138428
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.534493088722229
Validation loss = 0.5336126089096069
Validation loss = 0.5361636281013489
Validation loss = 0.5347974896430969
Validation loss = 0.5369691252708435
Validation loss = 0.5382540225982666
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5405964255332947
Validation loss = 0.5470827221870422
Validation loss = 0.5358409285545349
Validation loss = 0.5427471995353699
Validation loss = 0.543861448764801
Validation loss = 0.5410796403884888
Validation loss = 0.5458975434303284
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.01    |
| Iteration     | 65       |
| MaximumReturn | -0.151   |
| MinimumReturn | -31.3    |
| TotalSamples  | 111622   |
----------------------------
itr #66 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5406672358512878
Validation loss = 0.5428065061569214
Validation loss = 0.5453971028327942
Validation loss = 0.5435647368431091
Validation loss = 0.5409186482429504
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5383270382881165
Validation loss = 0.5415874123573303
Validation loss = 0.5371246933937073
Validation loss = 0.5380000472068787
Validation loss = 0.5395318269729614
Validation loss = 0.5432360768318176
Validation loss = 0.5405554175376892
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5411052703857422
Validation loss = 0.5396273732185364
Validation loss = 0.5408881902694702
Validation loss = 0.5413922667503357
Validation loss = 0.5490675568580627
Validation loss = 0.5479649901390076
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5337330102920532
Validation loss = 0.532153308391571
Validation loss = 0.5331976413726807
Validation loss = 0.538991391658783
Validation loss = 0.5372009873390198
Validation loss = 0.5351609587669373
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5471164584159851
Validation loss = 0.5424659252166748
Validation loss = 0.5374757051467896
Validation loss = 0.5442445278167725
Validation loss = 0.5442699193954468
Validation loss = 0.5417471528053284
Validation loss = 0.5465570092201233
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.23    |
| Iteration     | 66       |
| MaximumReturn | -0.223   |
| MinimumReturn | -61.1    |
| TotalSamples  | 113288   |
----------------------------
itr #67 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5457583069801331
Validation loss = 0.5442262291908264
Validation loss = 0.5395693778991699
Validation loss = 0.5423581600189209
Validation loss = 0.5441697239875793
Validation loss = 0.5398563742637634
Validation loss = 0.5492742657661438
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5426393747329712
Validation loss = 0.5377579927444458
Validation loss = 0.5391131043434143
Validation loss = 0.5428956747055054
Validation loss = 0.5409311056137085
Validation loss = 0.542136013507843
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5421627163887024
Validation loss = 0.539003312587738
Validation loss = 0.5460158586502075
Validation loss = 0.5446907877922058
Validation loss = 0.5449591875076294
Validation loss = 0.5485889315605164
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5374506711959839
Validation loss = 0.5330964922904968
Validation loss = 0.5328949689865112
Validation loss = 0.5379613041877747
Validation loss = 0.5386984944343567
Validation loss = 0.5371006727218628
Validation loss = 0.5352633595466614
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5441561937332153
Validation loss = 0.5404874682426453
Validation loss = 0.5416836142539978
Validation loss = 0.5382367968559265
Validation loss = 0.5428981781005859
Validation loss = 0.551053524017334
Validation loss = 0.5501974821090698
Validation loss = 0.5438104867935181
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.351   |
| Iteration     | 67       |
| MaximumReturn | -0.178   |
| MinimumReturn | -0.571   |
| TotalSamples  | 114954   |
----------------------------
itr #68 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5431376695632935
Validation loss = 0.5460466146469116
Validation loss = 0.5411239862442017
Validation loss = 0.5452432632446289
Validation loss = 0.5469619631767273
Validation loss = 0.5446096062660217
Validation loss = 0.5491554141044617
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5386984944343567
Validation loss = 0.5336049199104309
Validation loss = 0.5429896116256714
Validation loss = 0.5369046330451965
Validation loss = 0.5381309390068054
Validation loss = 0.5360422730445862
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5410201549530029
Validation loss = 0.5452077984809875
Validation loss = 0.5431627035140991
Validation loss = 0.5427576303482056
Validation loss = 0.5413151979446411
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5353163480758667
Validation loss = 0.5431974530220032
Validation loss = 0.5346928834915161
Validation loss = 0.5441117286682129
Validation loss = 0.5376873016357422
Validation loss = 0.5417956113815308
Validation loss = 0.5352944731712341
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5436407923698425
Validation loss = 0.543235719203949
Validation loss = 0.5450370907783508
Validation loss = 0.5545652508735657
Validation loss = 0.5425996780395508
Validation loss = 0.5450760126113892
Validation loss = 0.5484049320220947
Validation loss = 0.5535039901733398
Validation loss = 0.5461716055870056
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -20.9    |
| Iteration     | 68       |
| MaximumReturn | -0.0846  |
| MinimumReturn | -76.2    |
| TotalSamples  | 116620   |
----------------------------
itr #69 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5414201021194458
Validation loss = 0.5390766263008118
Validation loss = 0.5443006157875061
Validation loss = 0.5432700514793396
Validation loss = 0.542725682258606
Validation loss = 0.542384922504425
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5358974933624268
Validation loss = 0.5363256931304932
Validation loss = 0.5350462794303894
Validation loss = 0.537172257900238
Validation loss = 0.539339542388916
Validation loss = 0.5378683805465698
Validation loss = 0.5413450598716736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.538547694683075
Validation loss = 0.5355914831161499
Validation loss = 0.5408117771148682
Validation loss = 0.5389512181282043
Validation loss = 0.5441712141036987
Validation loss = 0.5407853126525879
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5358983874320984
Validation loss = 0.5343540906906128
Validation loss = 0.539417028427124
Validation loss = 0.538491427898407
Validation loss = 0.539319634437561
Validation loss = 0.540488600730896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5465699434280396
Validation loss = 0.5452651977539062
Validation loss = 0.5482977628707886
Validation loss = 0.5471664071083069
Validation loss = 0.5476845502853394
Validation loss = 0.5443192720413208
Validation loss = 0.5495296716690063
Validation loss = 0.5482065081596375
Validation loss = 0.5475930571556091
Validation loss = 0.5452083349227905
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -50      |
| Iteration     | 69       |
| MaximumReturn | -0.209   |
| MinimumReturn | -149     |
| TotalSamples  | 118286   |
----------------------------
itr #70 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5399945974349976
Validation loss = 0.538871705532074
Validation loss = 0.5404346585273743
Validation loss = 0.5443083047866821
Validation loss = 0.544654905796051
Validation loss = 0.5411062836647034
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5395028591156006
Validation loss = 0.5386691689491272
Validation loss = 0.5370338559150696
Validation loss = 0.5412713885307312
Validation loss = 0.5373650193214417
Validation loss = 0.5419535040855408
Validation loss = 0.5418772101402283
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5471101403236389
Validation loss = 0.5365543365478516
Validation loss = 0.5447967648506165
Validation loss = 0.5422711372375488
Validation loss = 0.5451545119285583
Validation loss = 0.5444253087043762
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5347777009010315
Validation loss = 0.5343790054321289
Validation loss = 0.5341374278068542
Validation loss = 0.5428460240364075
Validation loss = 0.5409144759178162
Validation loss = 0.539784848690033
Validation loss = 0.5383989810943604
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5438397526741028
Validation loss = 0.5430986881256104
Validation loss = 0.5503450036048889
Validation loss = 0.5463984608650208
Validation loss = 0.5481647849082947
Validation loss = 0.5501129031181335
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -56.2    |
| Iteration     | 70       |
| MaximumReturn | -0.122   |
| MinimumReturn | -127     |
| TotalSamples  | 119952   |
----------------------------
itr #71 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5410035848617554
Validation loss = 0.5384666919708252
Validation loss = 0.5419653058052063
Validation loss = 0.5427882671356201
Validation loss = 0.5423916578292847
Validation loss = 0.5425513982772827
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5389716625213623
Validation loss = 0.5363165736198425
Validation loss = 0.540553867816925
Validation loss = 0.5368047952651978
Validation loss = 0.5391669869422913
Validation loss = 0.5376898050308228
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5371348857879639
Validation loss = 0.5386951565742493
Validation loss = 0.5385512113571167
Validation loss = 0.5382391810417175
Validation loss = 0.5387106537818909
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5315123796463013
Validation loss = 0.5422784090042114
Validation loss = 0.5360531210899353
Validation loss = 0.5368815064430237
Validation loss = 0.5388756394386292
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5420514345169067
Validation loss = 0.5439300537109375
Validation loss = 0.5444186925888062
Validation loss = 0.5449309945106506
Validation loss = 0.5453110933303833
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.429   |
| Iteration     | 71       |
| MaximumReturn | -0.249   |
| MinimumReturn | -0.756   |
| TotalSamples  | 121618   |
----------------------------
itr #72 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5454930663108826
Validation loss = 0.5434991717338562
Validation loss = 0.5382798314094543
Validation loss = 0.5427457094192505
Validation loss = 0.5404031872749329
Validation loss = 0.5438177585601807
Validation loss = 0.5419400930404663
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5389530658721924
Validation loss = 0.5398375391960144
Validation loss = 0.536064863204956
Validation loss = 0.5440018177032471
Validation loss = 0.5423198342323303
Validation loss = 0.5444516539573669
Validation loss = 0.542499303817749
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5379082560539246
Validation loss = 0.5373380780220032
Validation loss = 0.5401886701583862
Validation loss = 0.540842592716217
Validation loss = 0.5411062836647034
Validation loss = 0.5424630641937256
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5394248366355896
Validation loss = 0.5308618545532227
Validation loss = 0.5335683226585388
Validation loss = 0.5383169054985046
Validation loss = 0.5373207926750183
Validation loss = 0.5354757905006409
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5424556732177734
Validation loss = 0.5459506511688232
Validation loss = 0.5448351502418518
Validation loss = 0.5440987944602966
Validation loss = 0.5455228686332703
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.511   |
| Iteration     | 72       |
| MaximumReturn | -0.212   |
| MinimumReturn | -0.899   |
| TotalSamples  | 123284   |
----------------------------
itr #73 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5443006157875061
Validation loss = 0.5407260656356812
Validation loss = 0.5403157472610474
Validation loss = 0.5420364737510681
Validation loss = 0.5418121218681335
Validation loss = 0.5459920763969421
Validation loss = 0.5470238924026489
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5421339273452759
Validation loss = 0.5373839735984802
Validation loss = 0.5412979125976562
Validation loss = 0.5387926697731018
Validation loss = 0.5406317710876465
Validation loss = 0.5409740805625916
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5392763018608093
Validation loss = 0.5475056767463684
Validation loss = 0.5406548380851746
Validation loss = 0.5417306423187256
Validation loss = 0.5444307327270508
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5407126545906067
Validation loss = 0.5366814136505127
Validation loss = 0.5346803665161133
Validation loss = 0.5342581272125244
Validation loss = 0.5366476774215698
Validation loss = 0.5377978682518005
Validation loss = 0.5420845150947571
Validation loss = 0.5407795310020447
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5435946583747864
Validation loss = 0.5470381379127502
Validation loss = 0.5471649169921875
Validation loss = 0.5426495671272278
Validation loss = 0.542865514755249
Validation loss = 0.5543580651283264
Validation loss = 0.5446721315383911
Validation loss = 0.5472435355186462
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.13    |
| Iteration     | 73       |
| MaximumReturn | -0.213   |
| MinimumReturn | -67.1    |
| TotalSamples  | 124950   |
----------------------------
itr #74 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5440016388893127
Validation loss = 0.5409208536148071
Validation loss = 0.5396942496299744
Validation loss = 0.5431710481643677
Validation loss = 0.5418705344200134
Validation loss = 0.5507708191871643
Validation loss = 0.5457651615142822
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5390335917472839
Validation loss = 0.5379433631896973
Validation loss = 0.5395888090133667
Validation loss = 0.5411714911460876
Validation loss = 0.5395207405090332
Validation loss = 0.5433309078216553
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5426185131072998
Validation loss = 0.5380135774612427
Validation loss = 0.541409969329834
Validation loss = 0.5458720922470093
Validation loss = 0.5396969318389893
Validation loss = 0.5425202250480652
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5358026027679443
Validation loss = 0.537773072719574
Validation loss = 0.5393720269203186
Validation loss = 0.5415013432502747
Validation loss = 0.5388144254684448
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5431907773017883
Validation loss = 0.5443695783615112
Validation loss = 0.5446092486381531
Validation loss = 0.5467870235443115
Validation loss = 0.5509473085403442
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -0.533   |
| Iteration     | 74       |
| MaximumReturn | -0.199   |
| MinimumReturn | -0.976   |
| TotalSamples  | 126616   |
----------------------------
itr #75 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5468661189079285
Validation loss = 0.5413124561309814
Validation loss = 0.5420125126838684
Validation loss = 0.5544981360435486
Validation loss = 0.5477584600448608
Validation loss = 0.5464690327644348
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5426589250564575
Validation loss = 0.5372520089149475
Validation loss = 0.5404209494590759
Validation loss = 0.550068199634552
Validation loss = 0.5398810505867004
Validation loss = 0.5428395867347717
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.541966438293457
Validation loss = 0.5427958369255066
Validation loss = 0.5384606122970581
Validation loss = 0.5411059856414795
Validation loss = 0.5407372117042542
Validation loss = 0.5455089211463928
Validation loss = 0.5447282195091248
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5426144003868103
Validation loss = 0.5379939675331116
Validation loss = 0.5373873114585876
Validation loss = 0.5371397137641907
Validation loss = 0.5393282175064087
Validation loss = 0.5395050644874573
Validation loss = 0.5418102145195007
Validation loss = 0.5431594252586365
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5459353923797607
Validation loss = 0.5464068055152893
Validation loss = 0.5441126823425293
Validation loss = 0.5505781769752502
Validation loss = 0.548120379447937
Validation loss = 0.5536496043205261
Validation loss = 0.5476424098014832
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -37      |
| Iteration     | 75       |
| MaximumReturn | -0.206   |
| MinimumReturn | -106     |
| TotalSamples  | 128282   |
----------------------------
itr #76 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5378904342651367
Validation loss = 0.5408240556716919
Validation loss = 0.5446151494979858
Validation loss = 0.544272780418396
Validation loss = 0.5447852611541748
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5358806848526001
Validation loss = 0.5442119836807251
Validation loss = 0.5406040549278259
Validation loss = 0.539732038974762
Validation loss = 0.5433622598648071
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5447587370872498
Validation loss = 0.539905846118927
Validation loss = 0.5450196266174316
Validation loss = 0.5431063771247864
Validation loss = 0.5459959506988525
Validation loss = 0.5521871447563171
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5348063707351685
Validation loss = 0.5352208614349365
Validation loss = 0.536771297454834
Validation loss = 0.540543794631958
Validation loss = 0.5439401865005493
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.544927716255188
Validation loss = 0.5423136949539185
Validation loss = 0.5562704801559448
Validation loss = 0.5430680513381958
Validation loss = 0.5490090250968933
Validation loss = 0.5496595501899719
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -5.55    |
| Iteration     | 76       |
| MaximumReturn | -0.155   |
| MinimumReturn | -71      |
| TotalSamples  | 129948   |
----------------------------
itr #77 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5420095920562744
Validation loss = 0.5400335788726807
Validation loss = 0.5424469709396362
Validation loss = 0.541182816028595
Validation loss = 0.5407205820083618
Validation loss = 0.5456297993659973
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5386245250701904
Validation loss = 0.5355567932128906
Validation loss = 0.5367587208747864
Validation loss = 0.5368792414665222
Validation loss = 0.5456840395927429
Validation loss = 0.541436493396759
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5511536002159119
Validation loss = 0.5403361916542053
Validation loss = 0.5467162728309631
Validation loss = 0.5445224642753601
Validation loss = 0.5454017519950867
Validation loss = 0.5465430617332458
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5363129377365112
Validation loss = 0.5372663140296936
Validation loss = 0.5341281890869141
Validation loss = 0.539203941822052
Validation loss = 0.5353449583053589
Validation loss = 0.5421392917633057
Validation loss = 0.5392432808876038
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5474210381507874
Validation loss = 0.5436564683914185
Validation loss = 0.5438593029975891
Validation loss = 0.5490672588348389
Validation loss = 0.5491083264350891
Validation loss = 0.5456373691558838
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -57.7    |
| Iteration     | 77       |
| MaximumReturn | -0.234   |
| MinimumReturn | -123     |
| TotalSamples  | 131614   |
----------------------------
itr #78 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.539323627948761
Validation loss = 0.5396717190742493
Validation loss = 0.539959728717804
Validation loss = 0.5420411825180054
Validation loss = 0.5434060096740723
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5391778349876404
Validation loss = 0.5380837321281433
Validation loss = 0.5381538271903992
Validation loss = 0.5371112823486328
Validation loss = 0.5418214797973633
Validation loss = 0.5443962216377258
Validation loss = 0.5444266200065613
Validation loss = 0.5446189641952515
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5432542562484741
Validation loss = 0.5420488715171814
Validation loss = 0.5434577465057373
Validation loss = 0.5464600324630737
Validation loss = 0.5428774952888489
Validation loss = 0.5489435195922852
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5341554284095764
Validation loss = 0.5399751663208008
Validation loss = 0.5368294715881348
Validation loss = 0.5374069213867188
Validation loss = 0.5401288270950317
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5428964495658875
Validation loss = 0.5429654717445374
Validation loss = 0.545589804649353
Validation loss = 0.5431516766548157
Validation loss = 0.5468721985816956
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -45.7    |
| Iteration     | 78       |
| MaximumReturn | -0.151   |
| MinimumReturn | -98.5    |
| TotalSamples  | 133280   |
----------------------------
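
(Editor's sketch.) The summary tables in this log report simple statistics over the 25 evaluation paths: average, maximum, and minimum undiscounted return, plus the running sample count. A self-contained sketch of how such a table can be produced from per-path reward sequences is below; the input format is an assumption, and the real logger may be fed differently.

    import numpy as np

    def summarize_paths(paths_rewards, iteration, total_samples):
        # `paths_rewards`: list of per-path reward sequences (assumed format).
        returns = np.array([np.sum(r) for r in paths_rewards])   # undiscounted return per path
        rows = [
            ("AverageReturn", float(returns.mean())),
            ("Iteration", iteration),
            ("MaximumReturn", float(returns.max())),
            ("MinimumReturn", float(returns.min())),
            ("TotalSamples", total_samples),
        ]
        print("-" * 28)
        for name, value in rows:
            cell = f"{value:.3g}" if isinstance(value, float) else str(value)
            print(f"| {name:<13} | {cell:<9}|")
        print("-" * 28)
        return dict(rows)
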
itr #79 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5408735871315002
Validation loss = 0.5361275672912598
Validation loss = 0.5436761975288391
Validation loss = 0.544524073600769
Validation loss = 0.537264347076416
Validation loss = 0.5420317649841309
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5376273989677429
Validation loss = 0.5364928841590881
Validation loss = 0.5377405881881714
Validation loss = 0.536656379699707
Validation loss = 0.5382941961288452
Validation loss = 0.5409464836120605
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5368092656135559
Validation loss = 0.5407041907310486
Validation loss = 0.5437477231025696
Validation loss = 0.5443220138549805
Validation loss = 0.5436336398124695
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5338463187217712
Validation loss = 0.5332158803939819
Validation loss = 0.5370965600013733
Validation loss = 0.5396777987480164
Validation loss = 0.5320942997932434
Validation loss = 0.5371379852294922
Validation loss = 0.5385218262672424
Validation loss = 0.5374341011047363
Validation loss = 0.5417876243591309
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5372007489204407
Validation loss = 0.5356355905532837
Validation loss = 0.537226676940918
Validation loss = 0.5416752696037292
Validation loss = 0.540297269821167
Validation loss = 0.5418946146965027
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 100.
Path 2 | total_timesteps 200.
Path 3 | total_timesteps 300.
Path 4 | total_timesteps 400.
Path 5 | total_timesteps 500.
Path 6 | total_timesteps 600.
Path 7 | total_timesteps 700.
Path 8 | total_timesteps 800.
Path 9 | total_timesteps 900.
Path 10 | total_timesteps 1000.
Path 11 | total_timesteps 1100.
Path 12 | total_timesteps 1200.
Path 13 | total_timesteps 1300.
Path 14 | total_timesteps 1400.
Path 15 | total_timesteps 1500.
Path 16 | total_timesteps 1600.
Path 17 | total_timesteps 1700.
Path 18 | total_timesteps 1800.
Path 19 | total_timesteps 1900.
Path 20 | total_timesteps 2000.
Path 21 | total_timesteps 2100.
Path 22 | total_timesteps 2200.
Path 23 | total_timesteps 2300.
Path 24 | total_timesteps 2400.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -3.41    |
| Iteration     | 79       |
| MaximumReturn | -0.139   |
| MinimumReturn | -47      |
| TotalSamples  | 134946   |
----------------------------
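
(Editor's sketch.) Since every per-iteration result lands in a table of this fixed shape, the learning curve can be recovered directly from the log text. The short parser below assumes only the layout visible above (pipe-delimited "AverageReturn" and "Iteration" rows) and nothing about the code that produced it.

    import re

    def read_learning_curve(log_path):
        # Extract (iteration, average_return) pairs from a log file in this format.
        iterations, avg_returns = [], []
        with open(log_path) as f:
            for line in f:
                m = re.match(r"\|\s*AverageReturn\s*\|\s*(-?[0-9.eE+-]+)\s*\|", line)
                if m:
                    avg_returns.append(float(m.group(1)))
                m = re.match(r"\|\s*Iteration\s*\|\s*([0-9]+)\s*\|", line)
                if m:
                    iterations.append(int(m.group(1)))
        return list(zip(iterations, avg_returns))

    # Example usage (hypothetical path): pairs = read_learning_curve("path/to/this_log.txt")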
