Logging to experiments/half_cheetah/control-affine/halfcheetah_seed2314
Print configuration .....
{'env_name': 'half_cheetah', 'random_seeds': [4321, 2314, 2341, 3421], 'save_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'intrinsic_reward_only': False, 'external_reward_evaluation_interval': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [32, 32], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 40, 'batch_size': 50000, 'gae': 0.95}, 'trpo_ext_reward': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5780965089797974
Validation loss = 0.12775377929210663
Validation loss = 0.08829058706760406
Validation loss = 0.07672878354787827
Validation loss = 0.07126685976982117
Validation loss = 0.06929771602153778
Validation loss = 0.0655287653207779
Validation loss = 0.06537770479917526
Validation loss = 0.06332466751337051
Validation loss = 0.06264585256576538
Validation loss = 0.062315501272678375
Validation loss = 0.06103657931089401
Validation loss = 0.061686329543590546
Validation loss = 0.06152835860848427
Validation loss = 0.058496832847595215
Validation loss = 0.05721311271190643
Validation loss = 0.05876379460096359
Validation loss = 0.05759523808956146
Validation loss = 0.05680142343044281
Validation loss = 0.05647225305438042
Validation loss = 0.05563607066869736
Validation loss = 0.05531454086303711
Validation loss = 0.05663147568702698
Validation loss = 0.0555185005068779
Validation loss = 0.060446709394454956
Validation loss = 0.053842321038246155
Validation loss = 0.05611315369606018
Validation loss = 0.0541161447763443
Validation loss = 0.05568324774503708
Validation loss = 0.05294695496559143
Validation loss = 0.0598384365439415
Validation loss = 0.05319904536008835
Validation loss = 0.05808620899915695
Validation loss = 0.05313994735479355
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5409799814224243
Validation loss = 0.12216316163539886
Validation loss = 0.08793865144252777
Validation loss = 0.07477022707462311
Validation loss = 0.07230597734451294
Validation loss = 0.06977677345275879
Validation loss = 0.06693431735038757
Validation loss = 0.06792884320020676
Validation loss = 0.06373606622219086
Validation loss = 0.06328494846820831
Validation loss = 0.06352700293064117
Validation loss = 0.06150270998477936
Validation loss = 0.060577236115932465
Validation loss = 0.06155622377991676
Validation loss = 0.06114021688699722
Validation loss = 0.06083814054727554
Validation loss = 0.058387503027915955
Validation loss = 0.06158608943223953
Validation loss = 0.05718318000435829
Validation loss = 0.05912517011165619
Validation loss = 0.05990568920969963
Validation loss = 0.057524241507053375
Validation loss = 0.05903492495417595
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.5384994745254517
Validation loss = 0.122222900390625
Validation loss = 0.08769301325082779
Validation loss = 0.07698703557252884
Validation loss = 0.07134239375591278
Validation loss = 0.0715596005320549
Validation loss = 0.06643438339233398
Validation loss = 0.06701305508613586
Validation loss = 0.06351765990257263
Validation loss = 0.06497432291507721
Validation loss = 0.06647871434688568
Validation loss = 0.06164116412401199
Validation loss = 0.06224537640810013
Validation loss = 0.059253133833408356
Validation loss = 0.05872844532132149
Validation loss = 0.06118883565068245
Validation loss = 0.0594407320022583
Validation loss = 0.05725645273923874
Validation loss = 0.056489747017621994
Validation loss = 0.059337034821510315
Validation loss = 0.05597831308841705
Validation loss = 0.05833936855196953
Validation loss = 0.05534497648477554
Validation loss = 0.05721493810415268
Validation loss = 0.05494692549109459
Validation loss = 0.05636138468980789
Validation loss = 0.05573283135890961
Validation loss = 0.05393606424331665
Validation loss = 0.0555419996380806
Validation loss = 0.05343408137559891
Validation loss = 0.0541350394487381
Validation loss = 0.05911935865879059
Validation loss = 0.05632128193974495
Validation loss = 0.05280375853180885
Validation loss = 0.05517840385437012
Validation loss = 0.05475611984729767
Validation loss = 0.05424177646636963
Validation loss = 0.055631041526794434
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.5775966644287109
Validation loss = 0.12824049592018127
Validation loss = 0.08966448903083801
Validation loss = 0.07584821432828903
Validation loss = 0.0728425681591034
Validation loss = 0.06984274089336395
Validation loss = 0.06609340012073517
Validation loss = 0.06411497294902802
Validation loss = 0.06259956955909729
Validation loss = 0.06604930758476257
Validation loss = 0.060222920030355453
Validation loss = 0.0619315505027771
Validation loss = 0.05863596498966217
Validation loss = 0.06026071310043335
Validation loss = 0.057967595756053925
Validation loss = 0.0614863820374012
Validation loss = 0.05949156731367111
Validation loss = 0.05785873904824257
Validation loss = 0.05932221561670303
Validation loss = 0.05475998669862747
Validation loss = 0.06003222614526749
Validation loss = 0.055087506771087646
Validation loss = 0.053905561566352844
Validation loss = 0.05575522035360336
Validation loss = 0.053174640983343124
Validation loss = 0.055811554193496704
Validation loss = 0.05690816044807434
Validation loss = 0.05323095619678497
Validation loss = 0.06989562511444092
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5749078989028931
Validation loss = 0.12583374977111816
Validation loss = 0.0900130495429039
Validation loss = 0.07848760485649109
Validation loss = 0.07379468530416489
Validation loss = 0.0712052509188652
Validation loss = 0.06793045997619629
Validation loss = 0.07284074276685715
Validation loss = 0.06472352147102356
Validation loss = 0.06288614124059677
Validation loss = 0.0653039962053299
Validation loss = 0.06326412409543991
Validation loss = 0.06381058692932129
Validation loss = 0.06148167699575424
Validation loss = 0.06129639595746994
Validation loss = 0.06609247624874115
Validation loss = 0.06092599779367447
Validation loss = 0.059084922075271606
Validation loss = 0.058441080152988434
Validation loss = 0.05878891050815582
Validation loss = 0.06080331653356552
Validation loss = 0.0581633523106575
Validation loss = 0.06523029506206512
Validation loss = 0.05895703285932541
Validation loss = 0.05653310939669609
Validation loss = 0.057466860860586166
Validation loss = 0.05631675943732262
Validation loss = 0.058100588619709015
Validation loss = 0.0581413209438324
Validation loss = 0.056964561343193054
Validation loss = 0.05878346413373947
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -357     |
| Iteration     | 0        |
| MaximumReturn | -313     |
| MinimumReturn | -436     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 1.300116777420044
Validation loss = 1.9640015363693237
Validation loss = 2.1111721992492676
Validation loss = 2.1497087478637695
Validation loss = 2.1205008029937744
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 1.1861357688903809
Validation loss = 1.8351320028305054
Validation loss = 2.010499954223633
Validation loss = 2.0168893337249756
Validation loss = 2.077662706375122
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 1.046658992767334
Validation loss = 1.6581107378005981
Validation loss = 1.7560856342315674
Validation loss = 1.8397520780563354
Validation loss = 1.6922307014465332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 1.116306185722351
Validation loss = 1.7262753248214722
Validation loss = 1.8446755409240723
Validation loss = 1.9502736330032349
Validation loss = 1.9080874919891357
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.0537242889404297
Validation loss = 1.6130324602127075
Validation loss = 1.7246065139770508
Validation loss = 1.8293706178665161
Validation loss = 1.770418643951416
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -351     |
| Iteration     | 1        |
| MaximumReturn | -297     |
| MinimumReturn | -398     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 1.1789156198501587
Validation loss = 1.4085640907287598
Validation loss = 1.3687820434570312
Validation loss = 1.4043241739273071
Validation loss = 1.4128532409667969
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 1.17156982421875
Validation loss = 1.3041400909423828
Validation loss = 1.248557686805725
Validation loss = 1.3372631072998047
Validation loss = 1.3441799879074097
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 1.0718662738800049
Validation loss = 1.1352146863937378
Validation loss = 1.144485354423523
Validation loss = 1.1589528322219849
Validation loss = 1.2694364786148071
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 1.0137286186218262
Validation loss = 1.2637872695922852
Validation loss = 1.2933217287063599
Validation loss = 1.3409171104431152
Validation loss = 1.2934318780899048
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.0280565023422241
Validation loss = 1.1709877252578735
Validation loss = 1.1399223804473877
Validation loss = 1.2095156908035278
Validation loss = 1.2099714279174805
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -346     |
| Iteration     | 2        |
| MaximumReturn | -279     |
| MinimumReturn | -457     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 1.828751564025879
Validation loss = 2.07401442527771
Validation loss = 1.9826252460479736
Validation loss = 2.0302188396453857
Validation loss = 2.0954506397247314
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 1.7086141109466553
Validation loss = 1.8397393226623535
Validation loss = 1.981912612915039
Validation loss = 2.03891921043396
Validation loss = 1.9738311767578125
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 1.5221811532974243
Validation loss = 1.6955205202102661
Validation loss = 1.7593958377838135
Validation loss = 1.8218721151351929
Validation loss = 1.7521417140960693
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 1.5980160236358643
Validation loss = 1.8475298881530762
Validation loss = 1.8909149169921875
Validation loss = 1.8792238235473633
Validation loss = 1.932153344154358
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.5628741979599
Validation loss = 1.6211727857589722
Validation loss = 1.7460007667541504
Validation loss = 1.7697138786315918
Validation loss = 1.7473444938659668
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -289     |
| Iteration     | 3        |
| MaximumReturn | -255     |
| MinimumReturn | -321     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 1.5709025859832764
Validation loss = 1.9250189065933228
Validation loss = 1.989729642868042
Validation loss = 1.998664140701294
Validation loss = 1.9684747457504272
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 1.4375150203704834
Validation loss = 1.85699462890625
Validation loss = 1.8592157363891602
Validation loss = 1.8723371028900146
Validation loss = 1.8789188861846924
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 1.3546900749206543
Validation loss = 1.565879464149475
Validation loss = 1.6556930541992188
Validation loss = 1.6368334293365479
Validation loss = 1.6627804040908813
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 1.6040306091308594
Validation loss = 1.8336023092269897
Validation loss = 1.7445207834243774
Validation loss = 1.7938950061798096
Validation loss = 1.698256492614746
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.5272011756896973
Validation loss = 1.610107660293579
Validation loss = 1.6549819707870483
Validation loss = 1.6415300369262695
Validation loss = 1.631511926651001
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -433     |
| Iteration     | 4        |
| MaximumReturn | -265     |
| MinimumReturn | -546     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 1.4726743698120117
Validation loss = 1.5937374830245972
Validation loss = 1.6153173446655273
Validation loss = 1.5630484819412231
Validation loss = 1.5244978666305542
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 1.398279070854187
Validation loss = 1.515145182609558
Validation loss = 1.5401335954666138
Validation loss = 1.5154322385787964
Validation loss = 1.5191874504089355
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 1.2414896488189697
Validation loss = 1.2935951948165894
Validation loss = 1.3429259061813354
Validation loss = 1.3208694458007812
Validation loss = 1.388454794883728
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 1.3685431480407715
Validation loss = 1.4479848146438599
Validation loss = 1.407662272453308
Validation loss = 1.4364417791366577
Validation loss = 1.387022614479065
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.2765077352523804
Validation loss = 1.3749090433120728
Validation loss = 1.3759442567825317
Validation loss = 1.3653737306594849
Validation loss = 1.4644360542297363
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -414     |
| Iteration     | 5        |
| MaximumReturn | -204     |
| MinimumReturn | -591     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.19992175698280334
Validation loss = 0.17732976377010345
Validation loss = 0.17524047195911407
Validation loss = 0.17321184277534485
Validation loss = 0.17623324692249298
Validation loss = 0.1773158460855484
Validation loss = 0.17402265965938568
Validation loss = 0.1873028427362442
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.20261681079864502
Validation loss = 0.18842871487140656
Validation loss = 0.19515137374401093
Validation loss = 0.18764780461788177
Validation loss = 0.19464369118213654
Validation loss = 0.19072310626506805
Validation loss = 0.19258379936218262
Validation loss = 0.1966257095336914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1965346336364746
Validation loss = 0.18638405203819275
Validation loss = 0.18276801705360413
Validation loss = 0.17772458493709564
Validation loss = 0.17742633819580078
Validation loss = 0.18586620688438416
Validation loss = 0.18226973712444305
Validation loss = 0.17854368686676025
Validation loss = 0.17648598551750183
Validation loss = 0.18050190806388855
Validation loss = 0.1753454953432083
Validation loss = 0.17815670371055603
Validation loss = 0.17959193885326385
Validation loss = 0.18301032483577728
Validation loss = 0.17886137962341309
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.19782279431819916
Validation loss = 0.18273425102233887
Validation loss = 0.18303360044956207
Validation loss = 0.17826935648918152
Validation loss = 0.1801479160785675
Validation loss = 0.17862460017204285
Validation loss = 0.17761635780334473
Validation loss = 0.18194694817066193
Validation loss = 0.18268822133541107
Validation loss = 0.17679989337921143
Validation loss = 0.18524615466594696
Validation loss = 0.18011341989040375
Validation loss = 0.17660625278949738
Validation loss = 0.18147695064544678
Validation loss = 0.17851369082927704
Validation loss = 0.17565514147281647
Validation loss = 0.17356924712657928
Validation loss = 0.17298296093940735
Validation loss = 0.17842303216457367
Validation loss = 0.172014981508255
Validation loss = 0.17449159920215607
Validation loss = 0.17178913950920105
Validation loss = 0.1773250550031662
Validation loss = 0.1718849241733551
Validation loss = 0.17742113769054413
Validation loss = 0.16940537095069885
Validation loss = 0.16985826194286346
Validation loss = 0.17494921386241913
Validation loss = 0.16875794529914856
Validation loss = 0.1663772612810135
Validation loss = 0.1689477413892746
Validation loss = 0.17296983301639557
Validation loss = 0.17010509967803955
Validation loss = 0.16541920602321625
Validation loss = 0.17185400426387787
Validation loss = 0.16844137012958527
Validation loss = 0.163290873169899
Validation loss = 0.1616826355457306
Validation loss = 0.1612742692232132
Validation loss = 0.1708221584558487
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.19191399216651917
Validation loss = 0.18025162816047668
Validation loss = 0.174507275223732
Validation loss = 0.176115483045578
Validation loss = 0.17983338236808777
Validation loss = 0.18437395989894867
Validation loss = 0.17800560593605042
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -367     |
| Iteration     | 6        |
| MaximumReturn | 57.5     |
| MinimumReturn | -601     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1390816569328308
Validation loss = 0.13322581350803375
Validation loss = 0.13213033974170685
Validation loss = 0.12854832410812378
Validation loss = 0.12935662269592285
Validation loss = 0.13006830215454102
Validation loss = 0.1310662031173706
Validation loss = 0.1293005645275116
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1569412648677826
Validation loss = 0.14321735501289368
Validation loss = 0.14155206084251404
Validation loss = 0.14368480443954468
Validation loss = 0.13525351881980896
Validation loss = 0.13874483108520508
Validation loss = 0.1380835473537445
Validation loss = 0.13878053426742554
Validation loss = 0.13887205719947815
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1445591151714325
Validation loss = 0.12847408652305603
Validation loss = 0.1308376044034958
Validation loss = 0.1319713294506073
Validation loss = 0.1327088475227356
Validation loss = 0.13699252903461456
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1417476236820221
Validation loss = 0.12628787755966187
Validation loss = 0.124489925801754
Validation loss = 0.1254846453666687
Validation loss = 0.12575435638427734
Validation loss = 0.12349351495504379
Validation loss = 0.1245938241481781
Validation loss = 0.12384618818759918
Validation loss = 0.126563161611557
Validation loss = 0.1260656714439392
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1489226222038269
Validation loss = 0.13320118188858032
Validation loss = 0.13984203338623047
Validation loss = 0.13387170433998108
Validation loss = 0.13259457051753998
Validation loss = 0.13526210188865662
Validation loss = 0.1361636519432068
Validation loss = 0.13272503018379211
Validation loss = 0.12952904403209686
Validation loss = 0.1340842843055725
Validation loss = 0.13411962985992432
Validation loss = 0.13110914826393127
Validation loss = 0.13147974014282227
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -359     |
| Iteration     | 7        |
| MaximumReturn | 109      |
| MinimumReturn | -723     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1271573007106781
Validation loss = 0.12002266943454742
Validation loss = 0.12199768424034119
Validation loss = 0.11903851479291916
Validation loss = 0.11913670599460602
Validation loss = 0.11818668246269226
Validation loss = 0.11991119384765625
Validation loss = 0.11907967180013657
Validation loss = 0.11968108266592026
Validation loss = 0.11707781255245209
Validation loss = 0.1187627986073494
Validation loss = 0.1180046796798706
Validation loss = 0.11778590083122253
Validation loss = 0.11841151118278503
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13358692824840546
Validation loss = 0.12445228546857834
Validation loss = 0.12240881472826004
Validation loss = 0.12159155309200287
Validation loss = 0.12260525673627853
Validation loss = 0.12160152941942215
Validation loss = 0.12354564666748047
Validation loss = 0.1214219480752945
Validation loss = 0.12058191746473312
Validation loss = 0.12020933628082275
Validation loss = 0.12210778146982193
Validation loss = 0.12073110044002533
Validation loss = 0.12236116826534271
Validation loss = 0.11770299822092056
Validation loss = 0.12023094296455383
Validation loss = 0.11802650988101959
Validation loss = 0.12060197442770004
Validation loss = 0.11785467714071274
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1282852292060852
Validation loss = 0.1208919808268547
Validation loss = 0.12032525986433029
Validation loss = 0.12063455581665039
Validation loss = 0.11805073171854019
Validation loss = 0.12240592390298843
Validation loss = 0.118743896484375
Validation loss = 0.11730030924081802
Validation loss = 0.1172746941447258
Validation loss = 0.11798077076673508
Validation loss = 0.11724644154310226
Validation loss = 0.12020458281040192
Validation loss = 0.11697237938642502
Validation loss = 0.11442328989505768
Validation loss = 0.115012526512146
Validation loss = 0.11737334728240967
Validation loss = 0.11434904485940933
Validation loss = 0.1148289144039154
Validation loss = 0.11669056862592697
Validation loss = 0.11701250076293945
Validation loss = 0.11642777919769287
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12521375715732574
Validation loss = 0.1162855327129364
Validation loss = 0.11494260281324387
Validation loss = 0.11500933766365051
Validation loss = 0.11574538797140121
Validation loss = 0.11852239072322845
Validation loss = 0.11468943953514099
Validation loss = 0.11582352221012115
Validation loss = 0.11455556005239487
Validation loss = 0.1157279908657074
Validation loss = 0.11649195104837418
Validation loss = 0.11537958681583405
Validation loss = 0.11383303254842758
Validation loss = 0.1129179522395134
Validation loss = 0.1149890273809433
Validation loss = 0.11288588494062424
Validation loss = 0.11292055994272232
Validation loss = 0.1134137213230133
Validation loss = 0.11336644738912582
Validation loss = 0.11358605325222015
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13202625513076782
Validation loss = 0.12143468856811523
Validation loss = 0.12038523703813553
Validation loss = 0.11949583888053894
Validation loss = 0.11967898905277252
Validation loss = 0.11904481798410416
Validation loss = 0.12042106688022614
Validation loss = 0.11939749121665955
Validation loss = 0.12178944051265717
Validation loss = 0.11926960945129395
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -280     |
| Iteration     | 8        |
| MaximumReturn | 55.1     |
| MinimumReturn | -609     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12099453061819077
Validation loss = 0.11420537531375885
Validation loss = 0.1147972121834755
Validation loss = 0.11464045941829681
Validation loss = 0.11296017467975616
Validation loss = 0.11363768577575684
Validation loss = 0.1140241026878357
Validation loss = 0.11304988712072372
Validation loss = 0.11366100609302521
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12176527082920074
Validation loss = 0.11546967923641205
Validation loss = 0.11344727128744125
Validation loss = 0.11364389955997467
Validation loss = 0.1133689433336258
Validation loss = 0.11485937982797623
Validation loss = 0.11468897759914398
Validation loss = 0.11453358829021454
Validation loss = 0.11434062570333481
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12171497195959091
Validation loss = 0.11274894326925278
Validation loss = 0.11302715539932251
Validation loss = 0.11270995438098907
Validation loss = 0.11313780397176743
Validation loss = 0.11383514106273651
Validation loss = 0.11265777051448822
Validation loss = 0.11363708972930908
Validation loss = 0.11434604227542877
Validation loss = 0.1152503490447998
Validation loss = 0.11123188585042953
Validation loss = 0.11132857948541641
Validation loss = 0.11370304971933365
Validation loss = 0.11128289997577667
Validation loss = 0.11574665457010269
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11756475269794464
Validation loss = 0.11249339580535889
Validation loss = 0.11152440309524536
Validation loss = 0.11244605481624603
Validation loss = 0.10933903604745865
Validation loss = 0.11141999810934067
Validation loss = 0.11296351999044418
Validation loss = 0.11314520984888077
Validation loss = 0.11181966960430145
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12275953590869904
Validation loss = 0.11336059868335724
Validation loss = 0.11484651267528534
Validation loss = 0.11470208317041397
Validation loss = 0.11559130996465683
Validation loss = 0.11599556356668472
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -410     |
| Iteration     | 9        |
| MaximumReturn | -144     |
| MinimumReturn | -539     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11259085685014725
Validation loss = 0.10795670002698898
Validation loss = 0.10518049448728561
Validation loss = 0.10812198370695114
Validation loss = 0.10810408741235733
Validation loss = 0.10736045986413956
Validation loss = 0.10808088630437851
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11055471748113632
Validation loss = 0.10694923251867294
Validation loss = 0.10879525542259216
Validation loss = 0.1070946455001831
Validation loss = 0.10631103068590164
Validation loss = 0.10903943330049515
Validation loss = 0.10714273154735565
Validation loss = 0.10672067850828171
Validation loss = 0.10846471041440964
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11265513300895691
Validation loss = 0.10557501018047333
Validation loss = 0.10766612738370895
Validation loss = 0.10738610476255417
Validation loss = 0.10684408992528915
Validation loss = 0.10676433891057968
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11314702033996582
Validation loss = 0.10562476515769958
Validation loss = 0.10446760803461075
Validation loss = 0.10520503669977188
Validation loss = 0.10562930256128311
Validation loss = 0.10591878741979599
Validation loss = 0.10539562255144119
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11801762133836746
Validation loss = 0.10951472073793411
Validation loss = 0.10812509059906006
Validation loss = 0.10796423256397247
Validation loss = 0.10743936896324158
Validation loss = 0.10762779414653778
Validation loss = 0.10868553817272186
Validation loss = 0.10948171466588974
Validation loss = 0.1076468676328659
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -152     |
| Iteration     | 10       |
| MaximumReturn | 271      |
| MinimumReturn | -777     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10817747563123703
Validation loss = 0.10531008243560791
Validation loss = 0.10474523156881332
Validation loss = 0.10626894980669022
Validation loss = 0.10537055879831314
Validation loss = 0.10629558563232422
Validation loss = 0.10404035449028015
Validation loss = 0.10590264946222305
Validation loss = 0.10539361089468002
Validation loss = 0.10661349445581436
Validation loss = 0.10593218356370926
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10888686031103134
Validation loss = 0.10539665073156357
Validation loss = 0.10724490880966187
Validation loss = 0.10486923903226852
Validation loss = 0.10594546794891357
Validation loss = 0.10673731565475464
Validation loss = 0.10561662167310715
Validation loss = 0.10592298954725266
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1101137027144432
Validation loss = 0.10645385831594467
Validation loss = 0.10795334726572037
Validation loss = 0.10517260432243347
Validation loss = 0.10869810730218887
Validation loss = 0.10704445093870163
Validation loss = 0.10685068368911743
Validation loss = 0.10576201230287552
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10906758159399033
Validation loss = 0.10431278496980667
Validation loss = 0.1049450933933258
Validation loss = 0.10457541793584824
Validation loss = 0.1042793020606041
Validation loss = 0.10476290434598923
Validation loss = 0.10564611107110977
Validation loss = 0.10497484356164932
Validation loss = 0.10411033779382706
Validation loss = 0.10419583320617676
Validation loss = 0.10570190101861954
Validation loss = 0.103617824614048
Validation loss = 0.1048034057021141
Validation loss = 0.10406380891799927
Validation loss = 0.10391318798065186
Validation loss = 0.10511454194784164
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10949426889419556
Validation loss = 0.10570713877677917
Validation loss = 0.10619831085205078
Validation loss = 0.10666701942682266
Validation loss = 0.1066267117857933
Validation loss = 0.1056913360953331
Validation loss = 0.10769017785787582
Validation loss = 0.10551991313695908
Validation loss = 0.10611621290445328
Validation loss = 0.10833344608545303
Validation loss = 0.10734499245882034
Validation loss = 0.10777062177658081
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -177     |
| Iteration     | 11       |
| MaximumReturn | 700      |
| MinimumReturn | -621     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10938800126314163
Validation loss = 0.10498269647359848
Validation loss = 0.10398268699645996
Validation loss = 0.10530316829681396
Validation loss = 0.10495531558990479
Validation loss = 0.1057482659816742
Validation loss = 0.10542209446430206
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1117769107222557
Validation loss = 0.10454630851745605
Validation loss = 0.10449797660112381
Validation loss = 0.10554849356412888
Validation loss = 0.10528747737407684
Validation loss = 0.10513428598642349
Validation loss = 0.10411560535430908
Validation loss = 0.10451904684305191
Validation loss = 0.10473838448524475
Validation loss = 0.10437507182359695
Validation loss = 0.10327960550785065
Validation loss = 0.10426737368106842
Validation loss = 0.10509014129638672
Validation loss = 0.10550690442323685
Validation loss = 0.10362441837787628
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1130150705575943
Validation loss = 0.10448132455348969
Validation loss = 0.10413201153278351
Validation loss = 0.10575804114341736
Validation loss = 0.10518769919872284
Validation loss = 0.10564009845256805
Validation loss = 0.10532942414283752
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1098797619342804
Validation loss = 0.10220224410295486
Validation loss = 0.10432785004377365
Validation loss = 0.10566060990095139
Validation loss = 0.1040182113647461
Validation loss = 0.10416226089000702
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11086385697126389
Validation loss = 0.10465565323829651
Validation loss = 0.10344146192073822
Validation loss = 0.10346947610378265
Validation loss = 0.10363249480724335
Validation loss = 0.10614310204982758
Validation loss = 0.10583920031785965
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -109     |
| Iteration     | 12       |
| MaximumReturn | 544      |
| MinimumReturn | -561     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10880788415670395
Validation loss = 0.10437185317277908
Validation loss = 0.10605388134717941
Validation loss = 0.10442868620157242
Validation loss = 0.1073065847158432
Validation loss = 0.1057502031326294
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11428237706422806
Validation loss = 0.105972521007061
Validation loss = 0.10453806072473526
Validation loss = 0.10408075898885727
Validation loss = 0.10502403229475021
Validation loss = 0.10453469306230545
Validation loss = 0.10535115748643875
Validation loss = 0.10469815880060196
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11376936733722687
Validation loss = 0.10507970303297043
Validation loss = 0.10541994124650955
Validation loss = 0.10741019994020462
Validation loss = 0.10610916465520859
Validation loss = 0.10799132287502289
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11309947818517685
Validation loss = 0.10619660466909409
Validation loss = 0.10457633435726166
Validation loss = 0.10666834563016891
Validation loss = 0.10456179827451706
Validation loss = 0.10588955134153366
Validation loss = 0.10672573000192642
Validation loss = 0.10471894592046738
Validation loss = 0.10476290434598923
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11133730411529541
Validation loss = 0.10678607225418091
Validation loss = 0.1074870228767395
Validation loss = 0.1072533130645752
Validation loss = 0.1050594374537468
Validation loss = 0.10662227123975754
Validation loss = 0.1068165972828865
Validation loss = 0.10617629438638687
Validation loss = 0.10660471022129059
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -68      |
| Iteration     | 13       |
| MaximumReturn | 711      |
| MinimumReturn | -365     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10829002410173416
Validation loss = 0.10398940742015839
Validation loss = 0.10636835545301437
Validation loss = 0.10615171492099762
Validation loss = 0.10435785353183746
Validation loss = 0.10531111061573029
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10775449872016907
Validation loss = 0.10397838056087494
Validation loss = 0.10330014675855637
Validation loss = 0.10451396554708481
Validation loss = 0.10621480643749237
Validation loss = 0.10395359247922897
Validation loss = 0.10395296663045883
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10807646811008453
Validation loss = 0.10504277795553207
Validation loss = 0.10472039133310318
Validation loss = 0.10610918700695038
Validation loss = 0.1060030460357666
Validation loss = 0.10498145967721939
Validation loss = 0.10608085244894028
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10941363871097565
Validation loss = 0.10372896492481232
Validation loss = 0.10410819947719574
Validation loss = 0.10528942197561264
Validation loss = 0.10487910360097885
Validation loss = 0.10593673586845398
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10919024795293808
Validation loss = 0.10373149067163467
Validation loss = 0.10543732345104218
Validation loss = 0.10664588958024979
Validation loss = 0.10545022040605545
Validation loss = 0.10482404381036758
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -370     |
| Iteration     | 14       |
| MaximumReturn | 137      |
| MinimumReturn | -608     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11161977052688599
Validation loss = 0.10323002934455872
Validation loss = 0.10497398674488068
Validation loss = 0.10598190128803253
Validation loss = 0.10618719458580017
Validation loss = 0.1056538000702858
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11547504365444183
Validation loss = 0.10427650809288025
Validation loss = 0.10468350350856781
Validation loss = 0.10513144731521606
Validation loss = 0.10586729645729065
Validation loss = 0.10785779356956482
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11340652406215668
Validation loss = 0.10570785403251648
Validation loss = 0.1049174889922142
Validation loss = 0.10565526783466339
Validation loss = 0.10600633174180984
Validation loss = 0.10711263120174408
Validation loss = 0.10477505624294281
Validation loss = 0.1067834347486496
Validation loss = 0.10748447477817535
Validation loss = 0.10599806159734726
Validation loss = 0.10502012073993683
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11245765537023544
Validation loss = 0.10355208814144135
Validation loss = 0.10548602789640427
Validation loss = 0.10557876527309418
Validation loss = 0.10595139861106873
Validation loss = 0.1054307073354721
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11297924816608429
Validation loss = 0.10590147227048874
Validation loss = 0.10600630939006805
Validation loss = 0.10615190863609314
Validation loss = 0.1049451231956482
Validation loss = 0.10556266456842422
Validation loss = 0.10710065066814423
Validation loss = 0.10693161189556122
Validation loss = 0.10597538203001022
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 257      |
| Iteration     | 15       |
| MaximumReturn | 686      |
| MinimumReturn | -516     |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11011625081300735
Validation loss = 0.1034969687461853
Validation loss = 0.1041979193687439
Validation loss = 0.10448406636714935
Validation loss = 0.10379862785339355
Validation loss = 0.10401337593793869
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11045116931200027
Validation loss = 0.1046520322561264
Validation loss = 0.10364007204771042
Validation loss = 0.10362791270017624
Validation loss = 0.10386818647384644
Validation loss = 0.10394518077373505
Validation loss = 0.10460302233695984
Validation loss = 0.10440417379140854
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10908670723438263
Validation loss = 0.10419206321239471
Validation loss = 0.10417148470878601
Validation loss = 0.10454550385475159
Validation loss = 0.10470491647720337
Validation loss = 0.10473886877298355
Validation loss = 0.10523684322834015
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11029917001724243
Validation loss = 0.10250680893659592
Validation loss = 0.10356221348047256
Validation loss = 0.1052694320678711
Validation loss = 0.10462731868028641
Validation loss = 0.10447870194911957
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1108107939362526
Validation loss = 0.10313070565462112
Validation loss = 0.1028342917561531
Validation loss = 0.10315359383821487
Validation loss = 0.1032957211136818
Validation loss = 0.10361739248037338
Validation loss = 0.10683393478393555
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -9.07    |
| Iteration     | 16       |
| MaximumReturn | 962      |
| MinimumReturn | -449     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10960830748081207
Validation loss = 0.10682349652051926
Validation loss = 0.1066729947924614
Validation loss = 0.10552732646465302
Validation loss = 0.10417398810386658
Validation loss = 0.10686293989419937
Validation loss = 0.10494332015514374
Validation loss = 0.1069628894329071
Validation loss = 0.10673284530639648
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11015299707651138
Validation loss = 0.10487940907478333
Validation loss = 0.10626218467950821
Validation loss = 0.10567954182624817
Validation loss = 0.10536136478185654
Validation loss = 0.1061084121465683
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10852880775928497
Validation loss = 0.10501015186309814
Validation loss = 0.10558664798736572
Validation loss = 0.10452471673488617
Validation loss = 0.10588258504867554
Validation loss = 0.10607177019119263
Validation loss = 0.10491721332073212
Validation loss = 0.10624829679727554
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10979331284761429
Validation loss = 0.10434321314096451
Validation loss = 0.10682731121778488
Validation loss = 0.10589496791362762
Validation loss = 0.10764069110155106
Validation loss = 0.10641465336084366
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10967410355806351
Validation loss = 0.10493841767311096
Validation loss = 0.10609253495931625
Validation loss = 0.1068761795759201
Validation loss = 0.10483137518167496
Validation loss = 0.1070101410150528
Validation loss = 0.10639955848455429
Validation loss = 0.1054622232913971
Validation loss = 0.10450340062379837
Validation loss = 0.10735862702131271
Validation loss = 0.1048208475112915
Validation loss = 0.10681256651878357
Validation loss = 0.10496786236763
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -36.5    |
| Iteration     | 17       |
| MaximumReturn | 453      |
| MinimumReturn | -338     |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1095980629324913
Validation loss = 0.1045113056898117
Validation loss = 0.10482686758041382
Validation loss = 0.10639289766550064
Validation loss = 0.10488751530647278
Validation loss = 0.1057133823633194
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11133634299039841
Validation loss = 0.10341262072324753
Validation loss = 0.10404176265001297
Validation loss = 0.10480358451604843
Validation loss = 0.10366687923669815
Validation loss = 0.10425088554620743
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11123815178871155
Validation loss = 0.10395845025777817
Validation loss = 0.1048673540353775
Validation loss = 0.10546610504388809
Validation loss = 0.10410942882299423
Validation loss = 0.10490963608026505
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11012629419565201
Validation loss = 0.10486266016960144
Validation loss = 0.10558685660362244
Validation loss = 0.10575125366449356
Validation loss = 0.10705694556236267
Validation loss = 0.10600489377975464
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11096940189599991
Validation loss = 0.10448737442493439
Validation loss = 0.10583224892616272
Validation loss = 0.10487748682498932
Validation loss = 0.10495351254940033
Validation loss = 0.10406530648469925
Validation loss = 0.10582073032855988
Validation loss = 0.10348919034004211
Validation loss = 0.10485590249300003
Validation loss = 0.1060662642121315
Validation loss = 0.10543692857027054
Validation loss = 0.10498248040676117
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 429      |
| Iteration     | 18       |
| MaximumReturn | 941      |
| MinimumReturn | -303     |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10717997699975967
Validation loss = 0.1032135933637619
Validation loss = 0.10311820358037949
Validation loss = 0.10391362756490707
Validation loss = 0.10451078414916992
Validation loss = 0.10336650907993317
Validation loss = 0.10317011177539825
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10582003742456436
Validation loss = 0.10217414051294327
Validation loss = 0.10220463573932648
Validation loss = 0.10162894427776337
Validation loss = 0.10195431858301163
Validation loss = 0.10315249115228653
Validation loss = 0.10273005813360214
Validation loss = 0.10131146758794785
Validation loss = 0.10312950611114502
Validation loss = 0.10192958265542984
Validation loss = 0.10221196711063385
Validation loss = 0.10186181962490082
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10685000568628311
Validation loss = 0.10249876976013184
Validation loss = 0.10200176388025284
Validation loss = 0.10263945907354355
Validation loss = 0.10241882503032684
Validation loss = 0.102320596575737
Validation loss = 0.10243304073810577
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10905454307794571
Validation loss = 0.10507204383611679
Validation loss = 0.10336490720510483
Validation loss = 0.10393504798412323
Validation loss = 0.10409102588891983
Validation loss = 0.10469523817300797
Validation loss = 0.10428296029567719
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11002278327941895
Validation loss = 0.10167143493890762
Validation loss = 0.10413660854101181
Validation loss = 0.10315519571304321
Validation loss = 0.10400986671447754
Validation loss = 0.10205963999032974
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -82.8    |
| Iteration     | 19       |
| MaximumReturn | 352      |
| MinimumReturn | -519     |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1048763170838356
Validation loss = 0.0998942032456398
Validation loss = 0.10245438665151596
Validation loss = 0.10298598557710648
Validation loss = 0.10168543457984924
Validation loss = 0.10267919301986694
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10859706252813339
Validation loss = 0.10157249867916107
Validation loss = 0.10066808015108109
Validation loss = 0.10190378129482269
Validation loss = 0.10349857062101364
Validation loss = 0.10230201482772827
Validation loss = 0.10175949335098267
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10586448013782501
Validation loss = 0.10151947289705276
Validation loss = 0.10161365568637848
Validation loss = 0.1023622453212738
Validation loss = 0.10342071205377579
Validation loss = 0.10407672822475433
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10779690742492676
Validation loss = 0.10172445327043533
Validation loss = 0.1044967994093895
Validation loss = 0.10512355715036392
Validation loss = 0.1038358211517334
Validation loss = 0.10343650728464127
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10591543465852737
Validation loss = 0.10092578828334808
Validation loss = 0.10227352380752563
Validation loss = 0.10271815210580826
Validation loss = 0.10152571648359299
Validation loss = 0.1031186431646347
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 178      |
| Iteration     | 20       |
| MaximumReturn | 1.37e+03 |
| MinimumReturn | -490     |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10635235160589218
Validation loss = 0.10044842213392258
Validation loss = 0.10245674103498459
Validation loss = 0.10177857428789139
Validation loss = 0.10183660686016083
Validation loss = 0.10097121447324753
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1054660975933075
Validation loss = 0.09989365935325623
Validation loss = 0.10252738744020462
Validation loss = 0.10236291587352753
Validation loss = 0.10132651776075363
Validation loss = 0.09989205002784729
Validation loss = 0.10082197189331055
Validation loss = 0.10096605122089386
Validation loss = 0.10150137543678284
Validation loss = 0.10003101080656052
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10442154854536057
Validation loss = 0.10086575150489807
Validation loss = 0.10132665187120438
Validation loss = 0.10150135308504105
Validation loss = 0.1040157601237297
Validation loss = 0.1026516929268837
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10753551125526428
Validation loss = 0.10387475043535233
Validation loss = 0.10372808575630188
Validation loss = 0.10310674458742142
Validation loss = 0.10302569717168808
Validation loss = 0.10304185748100281
Validation loss = 0.10519568622112274
Validation loss = 0.10267270356416702
Validation loss = 0.10483012348413467
Validation loss = 0.10274367034435272
Validation loss = 0.10293705016374588
Validation loss = 0.10456671565771103
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10482002049684525
Validation loss = 0.10135245323181152
Validation loss = 0.10215885937213898
Validation loss = 0.10290230065584183
Validation loss = 0.10156838595867157
Validation loss = 0.10347235202789307
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 388      |
| Iteration     | 21       |
| MaximumReturn | 1.1e+03  |
| MinimumReturn | -449     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10712113231420517
Validation loss = 0.09988901764154434
Validation loss = 0.10004860162734985
Validation loss = 0.10164234042167664
Validation loss = 0.10080989450216293
Validation loss = 0.10143397003412247
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10510792583227158
Validation loss = 0.10011148452758789
Validation loss = 0.10109531134366989
Validation loss = 0.1002998799085617
Validation loss = 0.10095826536417007
Validation loss = 0.10131020843982697
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10578060150146484
Validation loss = 0.10053340345621109
Validation loss = 0.1008838340640068
Validation loss = 0.10029107332229614
Validation loss = 0.10241097956895828
Validation loss = 0.10086426883935928
Validation loss = 0.10212066769599915
Validation loss = 0.10090476274490356
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10693123191595078
Validation loss = 0.10214969515800476
Validation loss = 0.10297802090644836
Validation loss = 0.10304544121026993
Validation loss = 0.10320261120796204
Validation loss = 0.10229633003473282
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1066809669137001
Validation loss = 0.10190001875162125
Validation loss = 0.1020381897687912
Validation loss = 0.1019524559378624
Validation loss = 0.10270092636346817
Validation loss = 0.10189962387084961
Validation loss = 0.10129808634519577
Validation loss = 0.10142725706100464
Validation loss = 0.10170380771160126
Validation loss = 0.10184729844331741
Validation loss = 0.1026671975851059
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 232      |
| Iteration     | 22       |
| MaximumReturn | 472      |
| MinimumReturn | -423     |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10332828760147095
Validation loss = 0.10008152574300766
Validation loss = 0.10027450323104858
Validation loss = 0.10169503837823868
Validation loss = 0.10058077424764633
Validation loss = 0.10190532356500626
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10354868322610855
Validation loss = 0.09999122470617294
Validation loss = 0.10051089525222778
Validation loss = 0.10090852528810501
Validation loss = 0.10062750428915024
Validation loss = 0.10104519128799438
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10437767952680588
Validation loss = 0.09931502491235733
Validation loss = 0.10018450766801834
Validation loss = 0.10001793503761292
Validation loss = 0.10131305456161499
Validation loss = 0.10125607252120972
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10626135021448135
Validation loss = 0.10215119272470474
Validation loss = 0.1013636365532875
Validation loss = 0.10212872177362442
Validation loss = 0.10242471843957901
Validation loss = 0.10264164209365845
Validation loss = 0.10213321447372437
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10541189461946487
Validation loss = 0.1012456938624382
Validation loss = 0.10230199247598648
Validation loss = 0.10447073727846146
Validation loss = 0.10199443250894547
Validation loss = 0.10166791826486588
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 294      |
| Iteration     | 23       |
| MaximumReturn | 2.16e+03 |
| MinimumReturn | -597     |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10651213675737381
Validation loss = 0.09957493841648102
Validation loss = 0.10161127150058746
Validation loss = 0.1006237119436264
Validation loss = 0.10030955076217651
Validation loss = 0.10154760628938675
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10457673668861389
Validation loss = 0.09995236992835999
Validation loss = 0.09999720752239227
Validation loss = 0.09937799721956253
Validation loss = 0.10025205463171005
Validation loss = 0.1011933833360672
Validation loss = 0.0998590961098671
Validation loss = 0.10019120573997498
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10358855128288269
Validation loss = 0.09910134226083755
Validation loss = 0.10148270428180695
Validation loss = 0.0995541512966156
Validation loss = 0.10001949220895767
Validation loss = 0.10110007226467133
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10462252795696259
Validation loss = 0.1025751605629921
Validation loss = 0.10307386517524719
Validation loss = 0.1025974377989769
Validation loss = 0.10157351195812225
Validation loss = 0.10156850516796112
Validation loss = 0.1023908331990242
Validation loss = 0.10046735405921936
Validation loss = 0.10202228277921677
Validation loss = 0.10299775004386902
Validation loss = 0.10228575766086578
Validation loss = 0.10226477682590485
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10292701423168182
Validation loss = 0.10025232285261154
Validation loss = 0.1011703759431839
Validation loss = 0.1021047979593277
Validation loss = 0.1014857068657875
Validation loss = 0.10284469276666641
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 842      |
| Iteration     | 24       |
| MaximumReturn | 1.62e+03 |
| MinimumReturn | -160     |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10635749250650406
Validation loss = 0.1000567376613617
Validation loss = 0.10136734694242477
Validation loss = 0.10069085657596588
Validation loss = 0.1009833812713623
Validation loss = 0.10126692801713943
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10535894334316254
Validation loss = 0.10093624144792557
Validation loss = 0.10094135254621506
Validation loss = 0.10116075724363327
Validation loss = 0.10097470879554749
Validation loss = 0.10249795019626617
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10397124290466309
Validation loss = 0.10099410265684128
Validation loss = 0.10014179348945618
Validation loss = 0.10183330625295639
Validation loss = 0.10116098821163177
Validation loss = 0.10165026783943176
Validation loss = 0.10089243948459625
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10659731924533844
Validation loss = 0.10286712646484375
Validation loss = 0.10377364605665207
Validation loss = 0.1026502326130867
Validation loss = 0.10383197665214539
Validation loss = 0.10364467650651932
Validation loss = 0.10326652228832245
Validation loss = 0.1038130670785904
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1057160347700119
Validation loss = 0.1025942862033844
Validation loss = 0.10393305867910385
Validation loss = 0.10311416536569595
Validation loss = 0.10185416787862778
Validation loss = 0.10266672819852829
Validation loss = 0.10138718783855438
Validation loss = 0.10325173288583755
Validation loss = 0.1019626334309578
Validation loss = 0.10256847739219666
Validation loss = 0.1024869978427887
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 296      |
| Iteration     | 25       |
| MaximumReturn | 1.53e+03 |
| MinimumReturn | -527     |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10471094399690628
Validation loss = 0.10132626444101334
Validation loss = 0.10139154642820358
Validation loss = 0.10399474948644638
Validation loss = 0.10176414996385574
Validation loss = 0.1027899831533432
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10474259406328201
Validation loss = 0.10110625624656677
Validation loss = 0.10220756381750107
Validation loss = 0.1022462472319603
Validation loss = 0.1025935485959053
Validation loss = 0.10130710154771805
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10387562960386276
Validation loss = 0.10068856924772263
Validation loss = 0.10127588361501694
Validation loss = 0.10200253129005432
Validation loss = 0.10180748254060745
Validation loss = 0.10201498866081238
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10631541907787323
Validation loss = 0.10405176877975464
Validation loss = 0.10407576709985733
Validation loss = 0.10402306169271469
Validation loss = 0.10328462719917297
Validation loss = 0.10373956710100174
Validation loss = 0.10448252409696579
Validation loss = 0.10284500569105148
Validation loss = 0.10427962988615036
Validation loss = 0.10557811707258224
Validation loss = 0.10453378409147263
Validation loss = 0.10278770327568054
Validation loss = 0.10311532765626907
Validation loss = 0.10380701720714569
Validation loss = 0.10350697487592697
Validation loss = 0.10444816201925278
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10617642849683762
Validation loss = 0.10204761475324631
Validation loss = 0.10345644503831863
Validation loss = 0.1032617911696434
Validation loss = 0.10294922441244125
Validation loss = 0.10339366644620895
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 618      |
| Iteration     | 26       |
| MaximumReturn | 1.42e+03 |
| MinimumReturn | -338     |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10346801578998566
Validation loss = 0.1006745770573616
Validation loss = 0.10138627141714096
Validation loss = 0.1009334921836853
Validation loss = 0.10154575854539871
Validation loss = 0.10156725347042084
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10377099364995956
Validation loss = 0.10103739798069
Validation loss = 0.10351778566837311
Validation loss = 0.10177140682935715
Validation loss = 0.1018611416220665
Validation loss = 0.10142786055803299
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1034514531493187
Validation loss = 0.10187332332134247
Validation loss = 0.10155542939901352
Validation loss = 0.10283632576465607
Validation loss = 0.10054104030132294
Validation loss = 0.10156454145908356
Validation loss = 0.10246530920267105
Validation loss = 0.10014783591032028
Validation loss = 0.10226350277662277
Validation loss = 0.101338230073452
Validation loss = 0.10055519640445709
Validation loss = 0.1023830771446228
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10809261351823807
Validation loss = 0.10289476811885834
Validation loss = 0.10344105213880539
Validation loss = 0.10487689077854156
Validation loss = 0.1039547249674797
Validation loss = 0.10446751117706299
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10624893754720688
Validation loss = 0.10211703926324844
Validation loss = 0.10321666300296783
Validation loss = 0.10208380967378616
Validation loss = 0.10237635672092438
Validation loss = 0.10301356762647629
Validation loss = 0.103059783577919
Validation loss = 0.10230199247598648
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -98.5    |
| Iteration     | 27       |
| MaximumReturn | 368      |
| MinimumReturn | -521     |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10417608916759491
Validation loss = 0.09981361031532288
Validation loss = 0.10044095665216446
Validation loss = 0.10108399391174316
Validation loss = 0.10063362866640091
Validation loss = 0.1006324514746666
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10470154881477356
Validation loss = 0.10048820078372955
Validation loss = 0.09917610138654709
Validation loss = 0.10101868212223053
Validation loss = 0.10191614180803299
Validation loss = 0.10282565653324127
Validation loss = 0.10084321349859238
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10291169583797455
Validation loss = 0.10000655800104141
Validation loss = 0.09980706870555878
Validation loss = 0.10033116489648819
Validation loss = 0.10028311610221863
Validation loss = 0.09951470047235489
Validation loss = 0.09947094321250916
Validation loss = 0.1004878357052803
Validation loss = 0.10105859488248825
Validation loss = 0.10008621215820312
Validation loss = 0.10050392150878906
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10567042976617813
Validation loss = 0.10197242349386215
Validation loss = 0.10288678109645844
Validation loss = 0.10273005813360214
Validation loss = 0.10390011966228485
Validation loss = 0.10192681849002838
Validation loss = 0.10218668729066849
Validation loss = 0.10203277319669724
Validation loss = 0.10265395045280457
Validation loss = 0.10328224301338196
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10415289551019669
Validation loss = 0.10033624619245529
Validation loss = 0.10145397484302521
Validation loss = 0.10200398415327072
Validation loss = 0.10263578593730927
Validation loss = 0.10107357800006866
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 170      |
| Iteration     | 28       |
| MaximumReturn | 1.04e+03 |
| MinimumReturn | -541     |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1057281494140625
Validation loss = 0.10207566618919373
Validation loss = 0.1007053479552269
Validation loss = 0.1015791967511177
Validation loss = 0.10197771340608597
Validation loss = 0.10144667327404022
Validation loss = 0.10116050392389297
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10471057891845703
Validation loss = 0.10187799483537674
Validation loss = 0.10234250128269196
Validation loss = 0.10287560522556305
Validation loss = 0.10190293192863464
Validation loss = 0.10235796868801117
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10472715646028519
Validation loss = 0.10108046978712082
Validation loss = 0.10175944864749908
Validation loss = 0.10120797157287598
Validation loss = 0.10211348533630371
Validation loss = 0.10116470605134964
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10661882907152176
Validation loss = 0.10288062691688538
Validation loss = 0.1030840128660202
Validation loss = 0.10229089856147766
Validation loss = 0.10342898219823837
Validation loss = 0.10351797193288803
Validation loss = 0.10317406058311462
Validation loss = 0.102659210562706
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10639999806880951
Validation loss = 0.10080822557210922
Validation loss = 0.10240482538938522
Validation loss = 0.10186278820037842
Validation loss = 0.10293502360582352
Validation loss = 0.10269881039857864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 758      |
| Iteration     | 29       |
| MaximumReturn | 1.82e+03 |
| MinimumReturn | -328     |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10394410043954849
Validation loss = 0.10007146745920181
Validation loss = 0.10261377692222595
Validation loss = 0.10075674951076508
Validation loss = 0.10098182410001755
Validation loss = 0.10059195011854172
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10444118082523346
Validation loss = 0.10189036279916763
Validation loss = 0.10073869675397873
Validation loss = 0.10209069401025772
Validation loss = 0.10143457353115082
Validation loss = 0.10184206813573837
Validation loss = 0.10208821296691895
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10436813533306122
Validation loss = 0.10056814551353455
Validation loss = 0.10133375972509384
Validation loss = 0.09947510063648224
Validation loss = 0.10056019574403763
Validation loss = 0.09999717772006989
Validation loss = 0.10039091855287552
Validation loss = 0.10057047009468079
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10562742501497269
Validation loss = 0.10214629769325256
Validation loss = 0.10295621305704117
Validation loss = 0.10319170355796814
Validation loss = 0.10302041471004486
Validation loss = 0.103762686252594
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1034519374370575
Validation loss = 0.10067549347877502
Validation loss = 0.10354789346456528
Validation loss = 0.10232657194137573
Validation loss = 0.10097240656614304
Validation loss = 0.10430577397346497
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 496      |
| Iteration     | 30       |
| MaximumReturn | 2.01e+03 |
| MinimumReturn | -698     |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10129185765981674
Validation loss = 0.09907743334770203
Validation loss = 0.09947004169225693
Validation loss = 0.09936267882585526
Validation loss = 0.10043598711490631
Validation loss = 0.0996723398566246
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10337256640195847
Validation loss = 0.09937401115894318
Validation loss = 0.10049061477184296
Validation loss = 0.10078993439674377
Validation loss = 0.10241572558879852
Validation loss = 0.1018139123916626
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10314235836267471
Validation loss = 0.09921074658632278
Validation loss = 0.10004627704620361
Validation loss = 0.0997912585735321
Validation loss = 0.09902030974626541
Validation loss = 0.09911322593688965
Validation loss = 0.09932418912649155
Validation loss = 0.09988638758659363
Validation loss = 0.09985099732875824
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10435252636671066
Validation loss = 0.10120344161987305
Validation loss = 0.10096447169780731
Validation loss = 0.1008378118276596
Validation loss = 0.10037745535373688
Validation loss = 0.10050395876169205
Validation loss = 0.10052131861448288
Validation loss = 0.10061739385128021
Validation loss = 0.10115649551153183
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10379946976900101
Validation loss = 0.10021945089101791
Validation loss = 0.10145500302314758
Validation loss = 0.10123991966247559
Validation loss = 0.09984953701496124
Validation loss = 0.1002369150519371
Validation loss = 0.09954006969928741
Validation loss = 0.10106775909662247
Validation loss = 0.09861880540847778
Validation loss = 0.10057161748409271
Validation loss = 0.1012859046459198
Validation loss = 0.10038618743419647
Validation loss = 0.10142864286899567
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 49.2     |
| Iteration     | 31       |
| MaximumReturn | 1.24e+03 |
| MinimumReturn | -636     |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1025160700082779
Validation loss = 0.09840439260005951
Validation loss = 0.09946943074464798
Validation loss = 0.09870626032352448
Validation loss = 0.09933053702116013
Validation loss = 0.09914331138134003
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10192333161830902
Validation loss = 0.09951887279748917
Validation loss = 0.10031770914793015
Validation loss = 0.10124311596155167
Validation loss = 0.1014275997877121
Validation loss = 0.1006266251206398
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1004643440246582
Validation loss = 0.09814941138029099
Validation loss = 0.0989905595779419
Validation loss = 0.09877976775169373
Validation loss = 0.09915703535079956
Validation loss = 0.09898680448532104
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10188758373260498
Validation loss = 0.10038910061120987
Validation loss = 0.10034726560115814
Validation loss = 0.10048461705446243
Validation loss = 0.09993713349103928
Validation loss = 0.10037854313850403
Validation loss = 0.1001540869474411
Validation loss = 0.10085064172744751
Validation loss = 0.10026181489229202
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10260774940252304
Validation loss = 0.09912222623825073
Validation loss = 0.1006385013461113
Validation loss = 0.1000814437866211
Validation loss = 0.0991726890206337
Validation loss = 0.09877416491508484
Validation loss = 0.1011638417840004
Validation loss = 0.1005963459610939
Validation loss = 0.09893109649419785
Validation loss = 0.10045163333415985
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -170     |
| Iteration     | 32       |
| MaximumReturn | 676      |
| MinimumReturn | -630     |
| TotalSamples  | 136000   |
----------------------------
