Logging to experiments/hopper/hopperA01/Mon-31-Oct-2022-11-00-29-AM-CDT_hopper_trpo_iteration_20_seed2231
Print configuration .....
{'env_name': 'hopper', 'random_seeds': [1234, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/hopper_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6656663417816162
Validation loss = 0.234880268573761
Validation loss = 0.2225661426782608
Validation loss = 0.2107771933078766
Validation loss = 0.19110837578773499
Validation loss = 0.21246761083602905
Validation loss = 0.20152655243873596
Validation loss = 0.20401631295681
Validation loss = 0.20661351084709167
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.36156949400901794
Validation loss = 0.237848162651062
Validation loss = 0.22252899408340454
Validation loss = 0.21048152446746826
Validation loss = 0.19810351729393005
Validation loss = 0.19271734356880188
Validation loss = 0.19497010111808777
Validation loss = 0.19533339142799377
Validation loss = 0.19469398260116577
Validation loss = 0.20035767555236816
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4758698344230652
Validation loss = 0.24342305958271027
Validation loss = 0.22456324100494385
Validation loss = 0.20423907041549683
Validation loss = 0.2034994661808014
Validation loss = 0.1998697817325592
Validation loss = 0.19347131252288818
Validation loss = 0.20019635558128357
Validation loss = 0.1920388787984848
Validation loss = 0.20154613256454468
Validation loss = 0.21387863159179688
Validation loss = 0.211052805185318
Validation loss = 0.216819167137146
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.42311936616897583
Validation loss = 0.24361586570739746
Validation loss = 0.22694966197013855
Validation loss = 0.21801646053791046
Validation loss = 0.19827482104301453
Validation loss = 0.20567554235458374
Validation loss = 0.1898411214351654
Validation loss = 0.2032862901687622
Validation loss = 0.19588658213615417
Validation loss = 0.20574191212654114
Validation loss = 0.19630783796310425
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.5090077519416809
Validation loss = 0.24730685353279114
Validation loss = 0.2227119505405426
Validation loss = 0.21302786469459534
Validation loss = 0.19693613052368164
Validation loss = 0.18653637170791626
Validation loss = 0.18762324750423431
Validation loss = 0.2014012485742569
Validation loss = 0.1995609700679779
Validation loss = 0.1992802768945694
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.35e+03 |
| Iteration     | 0         |
| MaximumReturn | -832      |
| MinimumReturn | -1.91e+03 |
| TotalSamples  | 8000      |
-----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.30690455436706543
Validation loss = 0.25098657608032227
Validation loss = 0.2527247667312622
Validation loss = 0.24804988503456116
Validation loss = 0.24289044737815857
Validation loss = 0.24103710055351257
Validation loss = 0.2394281029701233
Validation loss = 0.23155361413955688
Validation loss = 0.23421424627304077
Validation loss = 0.23926521837711334
Validation loss = 0.24755966663360596
Validation loss = 0.24549898505210876
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.30240532755851746
Validation loss = 0.2588029205799103
Validation loss = 0.23507733643054962
Validation loss = 0.23931710422039032
Validation loss = 0.24259823560714722
Validation loss = 0.24440816044807434
Validation loss = 0.23941591382026672
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2980930805206299
Validation loss = 0.2432815432548523
Validation loss = 0.2712000012397766
Validation loss = 0.25135672092437744
Validation loss = 0.25057750940322876
Validation loss = 0.2516123056411743
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.29277628660202026
Validation loss = 0.2477767914533615
Validation loss = 0.25035610795021057
Validation loss = 0.237540602684021
Validation loss = 0.23501437902450562
Validation loss = 0.24464264512062073
Validation loss = 0.23271319270133972
Validation loss = 0.24252037703990936
Validation loss = 0.22998082637786865
Validation loss = 0.22867976129055023
Validation loss = 0.22980459034442902
Validation loss = 0.2344435304403305
Validation loss = 0.23549753427505493
Validation loss = 0.2337864637374878
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2878454327583313
Validation loss = 0.25145915150642395
Validation loss = 0.2481667548418045
Validation loss = 0.24292001128196716
Validation loss = 0.24508288502693176
Validation loss = 0.24529634416103363
Validation loss = 0.23866432905197144
Validation loss = 0.23951107263565063
Validation loss = 0.24133628606796265
Validation loss = 0.24216514825820923
Validation loss = 0.23370948433876038
Validation loss = 0.240814208984375
Validation loss = 0.23944181203842163
Validation loss = 0.23764196038246155
Validation loss = 0.23930391669273376
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.16e+03 |
| Iteration     | 1         |
| MaximumReturn | -897      |
| MinimumReturn | -1.57e+03 |
| TotalSamples  | 12000     |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.340280681848526
Validation loss = 0.30087101459503174
Validation loss = 0.3061679005622864
Validation loss = 0.2945314645767212
Validation loss = 0.2994117736816406
Validation loss = 0.2811371088027954
Validation loss = 0.28412535786628723
Validation loss = 0.285060852766037
Validation loss = 0.28618189692497253
Validation loss = 0.29411232471466064
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.34642326831817627
Validation loss = 0.30665919184684753
Validation loss = 0.3102896511554718
Validation loss = 0.2984701693058014
Validation loss = 0.307655930519104
Validation loss = 0.284704327583313
Validation loss = 0.2889202833175659
Validation loss = 0.29095423221588135
Validation loss = 0.2985450029373169
Validation loss = 0.30446627736091614
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3213941752910614
Validation loss = 0.2961953580379486
Validation loss = 0.2924512028694153
Validation loss = 0.30422908067703247
Validation loss = 0.3019809424877167
Validation loss = 0.2897484600543976
Validation loss = 0.28919240832328796
Validation loss = 0.2956134080886841
Validation loss = 0.2817118763923645
Validation loss = 0.2791329324245453
Validation loss = 0.2882162034511566
Validation loss = 0.29171448945999146
Validation loss = 0.28944945335388184
Validation loss = 0.2913741171360016
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3295075595378876
Validation loss = 0.2836058437824249
Validation loss = 0.27858760952949524
Validation loss = 0.2828892171382904
Validation loss = 0.2694159746170044
Validation loss = 0.28786757588386536
Validation loss = 0.27598732709884644
Validation loss = 0.2833566665649414
Validation loss = 0.28440019488334656
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3432137966156006
Validation loss = 0.29703593254089355
Validation loss = 0.2859501838684082
Validation loss = 0.2987356185913086
Validation loss = 0.27962353825569153
Validation loss = 0.2804029881954193
Validation loss = 0.29059019684791565
Validation loss = 0.28500062227249146
Validation loss = 0.28512343764305115
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.81e+03 |
| Iteration     | 2         |
| MaximumReturn | -1.58e+03 |
| MinimumReturn | -2.08e+03 |
| TotalSamples  | 16000     |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.33741772174835205
Validation loss = 0.3144323229789734
Validation loss = 0.307648241519928
Validation loss = 0.32531052827835083
Validation loss = 0.3167891502380371
Validation loss = 0.31112217903137207
Validation loss = 0.3222298324108124
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3510913848876953
Validation loss = 0.34440386295318604
Validation loss = 0.3378969430923462
Validation loss = 0.3365813195705414
Validation loss = 0.3205265402793884
Validation loss = 0.32321059703826904
Validation loss = 0.3143647611141205
Validation loss = 0.32031679153442383
Validation loss = 0.3166479468345642
Validation loss = 0.3160911202430725
Validation loss = 0.3178303837776184
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3755897283554077
Validation loss = 0.325040340423584
Validation loss = 0.3299853205680847
Validation loss = 0.3084653317928314
Validation loss = 0.3187468647956848
Validation loss = 0.31904950737953186
Validation loss = 0.31389009952545166
Validation loss = 0.3175129294395447
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.36100131273269653
Validation loss = 0.3190605640411377
Validation loss = 0.313357949256897
Validation loss = 0.3010779619216919
Validation loss = 0.30539458990097046
Validation loss = 0.3110044300556183
Validation loss = 0.32716768980026245
Validation loss = 0.3122107982635498
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.36193132400512695
Validation loss = 0.30588018894195557
Validation loss = 0.32420510053634644
Validation loss = 0.33408498764038086
Validation loss = 0.32861003279685974
Validation loss = 0.316533625125885
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.07e+03 |
| Iteration     | 3         |
| MaximumReturn | -655      |
| MinimumReturn | -1.54e+03 |
| TotalSamples  | 20000     |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3205171227455139
Validation loss = 0.2959471642971039
Validation loss = 0.2868911027908325
Validation loss = 0.2734052836894989
Validation loss = 0.2744608521461487
Validation loss = 0.2754081189632416
Validation loss = 0.26699602603912354
Validation loss = 0.27230438590049744
Validation loss = 0.2676423192024231
Validation loss = 0.2742154002189636
Validation loss = 0.269521027803421
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.309154212474823
Validation loss = 0.2883278727531433
Validation loss = 0.2773624062538147
Validation loss = 0.2700088620185852
Validation loss = 0.27978962659835815
Validation loss = 0.2676846981048584
Validation loss = 0.2652905583381653
Validation loss = 0.2653191089630127
Validation loss = 0.2624073624610901
Validation loss = 0.26365408301353455
Validation loss = 0.25968843698501587
Validation loss = 0.26181432604789734
Validation loss = 0.26942262053489685
Validation loss = 0.2573075294494629
Validation loss = 0.26970386505126953
Validation loss = 0.2620115876197815
Validation loss = 0.26609277725219727
Validation loss = 0.2745647430419922
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3127804398536682
Validation loss = 0.29145482182502747
Validation loss = 0.28181877732276917
Validation loss = 0.2876613736152649
Validation loss = 0.27145835757255554
Validation loss = 0.266701877117157
Validation loss = 0.271096408367157
Validation loss = 0.2702232003211975
Validation loss = 0.27301663160324097
Validation loss = 0.2729644477367401
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.30212923884391785
Validation loss = 0.2773817181587219
Validation loss = 0.27480942010879517
Validation loss = 0.2756463587284088
Validation loss = 0.2735275626182556
Validation loss = 0.26793065667152405
Validation loss = 0.2658175528049469
Validation loss = 0.2734396457672119
Validation loss = 0.2675551772117615
Validation loss = 0.26540902256965637
Validation loss = 0.2683001756668091
Validation loss = 0.2799014449119568
Validation loss = 0.2678731083869934
Validation loss = 0.2606908082962036
Validation loss = 0.26456648111343384
Validation loss = 0.26900675892829895
Validation loss = 0.25526905059814453
Validation loss = 0.25642216205596924
Validation loss = 0.25965967774391174
Validation loss = 0.2847687304019928
Validation loss = 0.2719137668609619
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3038354516029358
Validation loss = 0.2884852886199951
Validation loss = 0.28036361932754517
Validation loss = 0.2694287896156311
Validation loss = 0.27940747141838074
Validation loss = 0.27307119965553284
Validation loss = 0.2651686370372772
Validation loss = 0.27316421270370483
Validation loss = 0.26842695474624634
Validation loss = 0.27760225534439087
Validation loss = 0.27207669615745544
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.18e+03 |
| Iteration     | 4         |
| MaximumReturn | -748      |
| MinimumReturn | -2.06e+03 |
| TotalSamples  | 24000     |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.28285184502601624
Validation loss = 0.24106289446353912
Validation loss = 0.22259850800037384
Validation loss = 0.22349055111408234
Validation loss = 0.21900759637355804
Validation loss = 0.21750366687774658
Validation loss = 0.21480752527713776
Validation loss = 0.21651673316955566
Validation loss = 0.21362508833408356
Validation loss = 0.22156573832035065
Validation loss = 0.2126852124929428
Validation loss = 0.2091149538755417
Validation loss = 0.21962670981884003
Validation loss = 0.21220386028289795
Validation loss = 0.20997174084186554
Validation loss = 0.21189923584461212
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2594446837902069
Validation loss = 0.24191875755786896
Validation loss = 0.22058458626270294
Validation loss = 0.2223479300737381
Validation loss = 0.22084181010723114
Validation loss = 0.21400631964206696
Validation loss = 0.21219980716705322
Validation loss = 0.21513523161411285
Validation loss = 0.20973187685012817
Validation loss = 0.21144746243953705
Validation loss = 0.21309803426265717
Validation loss = 0.20958739519119263
Validation loss = 0.20939765870571136
Validation loss = 0.2294779270887375
Validation loss = 0.22198450565338135
Validation loss = 0.21114827692508698
Validation loss = 0.20436926186084747
Validation loss = 0.20888179540634155
Validation loss = 0.20870645344257355
Validation loss = 0.2154078483581543
Validation loss = 0.20885877311229706
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.28103378415107727
Validation loss = 0.252603679895401
Validation loss = 0.23087209463119507
Validation loss = 0.22555126249790192
Validation loss = 0.22616060078144073
Validation loss = 0.22115850448608398
Validation loss = 0.22633562982082367
Validation loss = 0.22241586446762085
Validation loss = 0.21931679546833038
Validation loss = 0.21880251169204712
Validation loss = 0.2204112559556961
Validation loss = 0.21351253986358643
Validation loss = 0.21677815914154053
Validation loss = 0.2137930542230606
Validation loss = 0.21774137020111084
Validation loss = 0.22285425662994385
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2472458928823471
Validation loss = 0.23030835390090942
Validation loss = 0.22060155868530273
Validation loss = 0.21006576716899872
Validation loss = 0.2114727944135666
Validation loss = 0.20552225410938263
Validation loss = 0.21167300641536713
Validation loss = 0.20704598724842072
Validation loss = 0.2044955939054489
Validation loss = 0.20535004138946533
Validation loss = 0.21246470510959625
Validation loss = 0.2163156270980835
Validation loss = 0.2143048793077469
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.25957921147346497
Validation loss = 0.2371245175600052
Validation loss = 0.22998185455799103
Validation loss = 0.22369593381881714
Validation loss = 0.22078140079975128
Validation loss = 0.22409822046756744
Validation loss = 0.2230258584022522
Validation loss = 0.22598212957382202
Validation loss = 0.21970580518245697
Validation loss = 0.21686719357967377
Validation loss = 0.21982967853546143
Validation loss = 0.22240705788135529
Validation loss = 0.2159152776002884
Validation loss = 0.2124018669128418
Validation loss = 0.21580375730991364
Validation loss = 0.2154577374458313
Validation loss = 0.21667009592056274
Validation loss = 0.2167549878358841
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -849      |
| Iteration     | 5         |
| MaximumReturn | -587      |
| MinimumReturn | -1.22e+03 |
| TotalSamples  | 28000     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.22449257969856262
Validation loss = 0.2071761041879654
Validation loss = 0.1959546059370041
Validation loss = 0.1872459352016449
Validation loss = 0.18535242974758148
Validation loss = 0.18829569220542908
Validation loss = 0.19311070442199707
Validation loss = 0.18718178570270538
Validation loss = 0.2009207159280777
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.21252810955047607
Validation loss = 0.2015993297100067
Validation loss = 0.19463565945625305
Validation loss = 0.18549013137817383
Validation loss = 0.18585778772830963
Validation loss = 0.1884760558605194
Validation loss = 0.18522553145885468
Validation loss = 0.19741366803646088
Validation loss = 0.18713487684726715
Validation loss = 0.18247242271900177
Validation loss = 0.18054132163524628
Validation loss = 0.18997344374656677
Validation loss = 0.1838565170764923
Validation loss = 0.18889117240905762
Validation loss = 0.18435396254062653
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2341533601284027
Validation loss = 0.20434865355491638
Validation loss = 0.1997995376586914
Validation loss = 0.19432716071605682
Validation loss = 0.19528111815452576
Validation loss = 0.19298601150512695
Validation loss = 0.19594024121761322
Validation loss = 0.19355309009552002
Validation loss = 0.20614464581012726
Validation loss = 0.18978391587734222
Validation loss = 0.19449248909950256
Validation loss = 0.19048412144184113
Validation loss = 0.19224858283996582
Validation loss = 0.19365181028842926
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.237479105591774
Validation loss = 0.20038536190986633
Validation loss = 0.18509450554847717
Validation loss = 0.18563899397850037
Validation loss = 0.18749549984931946
Validation loss = 0.18317262828350067
Validation loss = 0.18807287514209747
Validation loss = 0.18920965492725372
Validation loss = 0.18318486213684082
Validation loss = 0.17945219576358795
Validation loss = 0.1792062371969223
Validation loss = 0.18496976792812347
Validation loss = 0.18688149750232697
Validation loss = 0.1817971020936966
Validation loss = 0.18149986863136292
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.23575319349765778
Validation loss = 0.20569536089897156
Validation loss = 0.2048267275094986
Validation loss = 0.19371020793914795
Validation loss = 0.19852252304553986
Validation loss = 0.1926426738500595
Validation loss = 0.18993590772151947
Validation loss = 0.19355250895023346
Validation loss = 0.19294913113117218
Validation loss = 0.18601782619953156
Validation loss = 0.19545194506645203
Validation loss = 0.20367956161499023
Validation loss = 0.18817341327667236
Validation loss = 0.18219546973705292
Validation loss = 0.18360449373722076
Validation loss = 0.18532419204711914
Validation loss = 0.18870854377746582
Validation loss = 0.19269047677516937
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -759      |
| Iteration     | 6         |
| MaximumReturn | -227      |
| MinimumReturn | -1.27e+03 |
| TotalSamples  | 32000     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1886117160320282
Validation loss = 0.16373056173324585
Validation loss = 0.1550397425889969
Validation loss = 0.15357676148414612
Validation loss = 0.15433339774608612
Validation loss = 0.15555423498153687
Validation loss = 0.15308877825737
Validation loss = 0.15730372071266174
Validation loss = 0.1546243280172348
Validation loss = 0.1513831913471222
Validation loss = 0.147284597158432
Validation loss = 0.1484125256538391
Validation loss = 0.14877024292945862
Validation loss = 0.17683526873588562
Validation loss = 0.15183168649673462
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.20157259702682495
Validation loss = 0.1598818153142929
Validation loss = 0.1530379056930542
Validation loss = 0.1520330160856247
Validation loss = 0.1518828570842743
Validation loss = 0.15503033995628357
Validation loss = 0.1531260907649994
Validation loss = 0.16167718172073364
Validation loss = 0.15419910848140717
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1894512176513672
Validation loss = 0.16154441237449646
Validation loss = 0.15988191962242126
Validation loss = 0.1583159863948822
Validation loss = 0.1571471095085144
Validation loss = 0.15749219059944153
Validation loss = 0.15722668170928955
Validation loss = 0.1604783535003662
Validation loss = 0.15832194685935974
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.19716301560401917
Validation loss = 0.16831167042255402
Validation loss = 0.15500372648239136
Validation loss = 0.14993053674697876
Validation loss = 0.15264368057250977
Validation loss = 0.15428222715854645
Validation loss = 0.15503817796707153
Validation loss = 0.17034855484962463
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.18921604752540588
Validation loss = 0.16788791120052338
Validation loss = 0.1570468693971634
Validation loss = 0.15591707825660706
Validation loss = 0.15087521076202393
Validation loss = 0.15215660631656647
Validation loss = 0.15702149271965027
Validation loss = 0.15056449174880981
Validation loss = 0.1486792266368866
Validation loss = 0.15420466661453247
Validation loss = 0.15587173402309418
Validation loss = 0.169562429189682
Validation loss = 0.14505207538604736
Validation loss = 0.14453336596488953
Validation loss = 0.1454000174999237
Validation loss = 0.1498415470123291
Validation loss = 0.1526254266500473
Validation loss = 0.15036702156066895
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.33e+03 |
| Iteration     | 7         |
| MaximumReturn | -819      |
| MinimumReturn | -1.96e+03 |
| TotalSamples  | 36000     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1573154777288437
Validation loss = 0.14452539384365082
Validation loss = 0.13775783777236938
Validation loss = 0.13405458629131317
Validation loss = 0.13384543359279633
Validation loss = 0.1393980085849762
Validation loss = 0.14478494226932526
Validation loss = 0.133486807346344
Validation loss = 0.13430751860141754
Validation loss = 0.1341952532529831
Validation loss = 0.1374218612909317
Validation loss = 0.13937778770923615
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17970241606235504
Validation loss = 0.14968550205230713
Validation loss = 0.13580137491226196
Validation loss = 0.1398782730102539
Validation loss = 0.14007051289081573
Validation loss = 0.14154274761676788
Validation loss = 0.1385798156261444
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18695761263370514
Validation loss = 0.1511523425579071
Validation loss = 0.1418057382106781
Validation loss = 0.13943755626678467
Validation loss = 0.13737601041793823
Validation loss = 0.1401773989200592
Validation loss = 0.1395007073879242
Validation loss = 0.1488259881734848
Validation loss = 0.14137998223304749
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17659218609333038
Validation loss = 0.14210361242294312
Validation loss = 0.14116396009922028
Validation loss = 0.1395382285118103
Validation loss = 0.14015011489391327
Validation loss = 0.13952505588531494
Validation loss = 0.1449183225631714
Validation loss = 0.14021366834640503
Validation loss = 0.1380772888660431
Validation loss = 0.13891245424747467
Validation loss = 0.13647614419460297
Validation loss = 0.1389579027891159
Validation loss = 0.13626393675804138
Validation loss = 0.14203190803527832
Validation loss = 0.13845416903495789
Validation loss = 0.13205896317958832
Validation loss = 0.1311689019203186
Validation loss = 0.13944901525974274
Validation loss = 0.14539730548858643
Validation loss = 0.1313353329896927
Validation loss = 0.1314111053943634
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15219229459762573
Validation loss = 0.14272426068782806
Validation loss = 0.13720257580280304
Validation loss = 0.13459916412830353
Validation loss = 0.14547504484653473
Validation loss = 0.14733973145484924
Validation loss = 0.13244079053401947
Validation loss = 0.1326538622379303
Validation loss = 0.13743820786476135
Validation loss = 0.13828100264072418
Validation loss = 0.14531564712524414
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.14e+03 |
| Iteration     | 8         |
| MaximumReturn | -648      |
| MinimumReturn | -1.67e+03 |
| TotalSamples  | 40000     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16497786343097687
Validation loss = 0.13073588907718658
Validation loss = 0.12195255607366562
Validation loss = 0.11994719505310059
Validation loss = 0.12232279777526855
Validation loss = 0.12498140335083008
Validation loss = 0.12680354714393616
Validation loss = 0.121490977704525
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15518532693386078
Validation loss = 0.13599586486816406
Validation loss = 0.12503041326999664
Validation loss = 0.12411288917064667
Validation loss = 0.12620551884174347
Validation loss = 0.12800389528274536
Validation loss = 0.12802039086818695
Validation loss = 0.12479710578918457
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16080468893051147
Validation loss = 0.13599368929862976
Validation loss = 0.1252577155828476
Validation loss = 0.12579670548439026
Validation loss = 0.13311144709587097
Validation loss = 0.13291263580322266
Validation loss = 0.12503135204315186
Validation loss = 0.12226774543523788
Validation loss = 0.12582874298095703
Validation loss = 0.13343802094459534
Validation loss = 0.13251163065433502
Validation loss = 0.12305627763271332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15188433229923248
Validation loss = 0.13043203949928284
Validation loss = 0.12326208502054214
Validation loss = 0.12137885391712189
Validation loss = 0.12595148384571075
Validation loss = 0.1290590763092041
Validation loss = 0.1244722455739975
Validation loss = 0.1312529295682907
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.16856810450553894
Validation loss = 0.13600805401802063
Validation loss = 0.1267496645450592
Validation loss = 0.12375857681035995
Validation loss = 0.12119720876216888
Validation loss = 0.1253863424062729
Validation loss = 0.1276918351650238
Validation loss = 0.12150119245052338
Validation loss = 0.12115494906902313
Validation loss = 0.12385692447423935
Validation loss = 0.12239569425582886
Validation loss = 0.14859271049499512
Validation loss = 0.12350956350564957
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -995      |
| Iteration     | 9         |
| MaximumReturn | -690      |
| MinimumReturn | -1.49e+03 |
| TotalSamples  | 44000     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.136863112449646
Validation loss = 0.12019383907318115
Validation loss = 0.11646664142608643
Validation loss = 0.1134936586022377
Validation loss = 0.11783089488744736
Validation loss = 0.11502275615930557
Validation loss = 0.12042121589183807
Validation loss = 0.12395298480987549
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1447828859090805
Validation loss = 0.12274566292762756
Validation loss = 0.11927513778209686
Validation loss = 0.11793279647827148
Validation loss = 0.12018296867609024
Validation loss = 0.12153582274913788
Validation loss = 0.11897476017475128
Validation loss = 0.11839627474546432
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14807020127773285
Validation loss = 0.12532292306423187
Validation loss = 0.11819617450237274
Validation loss = 0.11771169304847717
Validation loss = 0.11918426305055618
Validation loss = 0.13010311126708984
Validation loss = 0.11933727562427521
Validation loss = 0.11794617772102356
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1484537571668625
Validation loss = 0.12355201691389084
Validation loss = 0.11760669201612473
Validation loss = 0.11807715147733688
Validation loss = 0.11980116367340088
Validation loss = 0.11869370937347412
Validation loss = 0.1299675554037094
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14045977592468262
Validation loss = 0.11817234009504318
Validation loss = 0.11683832854032516
Validation loss = 0.1179419755935669
Validation loss = 0.1164085865020752
Validation loss = 0.11706531047821045
Validation loss = 0.11971145123243332
Validation loss = 0.11730542778968811
Validation loss = 0.11334746330976486
Validation loss = 0.11328849196434021
Validation loss = 0.11779535561800003
Validation loss = 0.12103994190692902
Validation loss = 0.11472491919994354
Validation loss = 0.11081163585186005
Validation loss = 0.11195509135723114
Validation loss = 0.11950850486755371
Validation loss = 0.11024706810712814
Validation loss = 0.11039631068706512
Validation loss = 0.11434078961610794
Validation loss = 0.11471950262784958
Validation loss = 0.1155402734875679
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -916      |
| Iteration     | 10        |
| MaximumReturn | -426      |
| MinimumReturn | -1.24e+03 |
| TotalSamples  | 48000     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1270643174648285
Validation loss = 0.11028659343719482
Validation loss = 0.11054036021232605
Validation loss = 0.10653992742300034
Validation loss = 0.11231663078069687
Validation loss = 0.11333039402961731
Validation loss = 0.1089622750878334
Validation loss = 0.10699886083602905
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1363707035779953
Validation loss = 0.11870362609624863
Validation loss = 0.11233941465616226
Validation loss = 0.11306475847959518
Validation loss = 0.12284388393163681
Validation loss = 0.11340343207120895
Validation loss = 0.11008340865373611
Validation loss = 0.11234315484762192
Validation loss = 0.11290683597326279
Validation loss = 0.11438413709402084
Validation loss = 0.11088653653860092
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1301417201757431
Validation loss = 0.11835408210754395
Validation loss = 0.1121271625161171
Validation loss = 0.11044666916131973
Validation loss = 0.11383555084466934
Validation loss = 0.12021747976541519
Validation loss = 0.11138173192739487
Validation loss = 0.11002359539270401
Validation loss = 0.10999846458435059
Validation loss = 0.11289775371551514
Validation loss = 0.11038092523813248
Validation loss = 0.10714324563741684
Validation loss = 0.12344314903020859
Validation loss = 0.11201228946447372
Validation loss = 0.10490292310714722
Validation loss = 0.10598161816596985
Validation loss = 0.11378708481788635
Validation loss = 0.10602619498968124
Validation loss = 0.10601893812417984
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1263100951910019
Validation loss = 0.11166711896657944
Validation loss = 0.10926453024148941
Validation loss = 0.109133780002594
Validation loss = 0.11187412589788437
Validation loss = 0.1114906445145607
Validation loss = 0.10962975025177002
Validation loss = 0.11125566810369492
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12936992943286896
Validation loss = 0.10905162245035172
Validation loss = 0.10354699939489365
Validation loss = 0.1038902997970581
Validation loss = 0.11098221689462662
Validation loss = 0.10873230546712875
Validation loss = 0.10448276996612549
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.04e+03 |
| Iteration     | 11        |
| MaximumReturn | -380      |
| MinimumReturn | -1.46e+03 |
| TotalSamples  | 52000     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12632964551448822
Validation loss = 0.10646326839923859
Validation loss = 0.10449227690696716
Validation loss = 0.10629626363515854
Validation loss = 0.11233872920274734
Validation loss = 0.1033187285065651
Validation loss = 0.10250002890825272
Validation loss = 0.10923434048891068
Validation loss = 0.10503354668617249
Validation loss = 0.10090024769306183
Validation loss = 0.10220536589622498
Validation loss = 0.10957468301057816
Validation loss = 0.10794064402580261
Validation loss = 0.10217451304197311
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12461014091968536
Validation loss = 0.11070433259010315
Validation loss = 0.106071837246418
Validation loss = 0.10624350607395172
Validation loss = 0.10995413362979889
Validation loss = 0.10942593216896057
Validation loss = 0.1083081066608429
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11698481440544128
Validation loss = 0.11076827347278595
Validation loss = 0.10403989255428314
Validation loss = 0.10846400260925293
Validation loss = 0.10783609002828598
Validation loss = 0.10381612181663513
Validation loss = 0.10424382239580154
Validation loss = 0.10987304896116257
Validation loss = 0.10607035458087921
Validation loss = 0.09909921139478683
Validation loss = 0.10234064608812332
Validation loss = 0.10877067595720291
Validation loss = 0.1036725863814354
Validation loss = 0.10240736603736877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1253637969493866
Validation loss = 0.10919174551963806
Validation loss = 0.10759540647268295
Validation loss = 0.10949787497520447
Validation loss = 0.10857553780078888
Validation loss = 0.10839292407035828
Validation loss = 0.11566715687513351
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11749554425477982
Validation loss = 0.10922189056873322
Validation loss = 0.1020992249250412
Validation loss = 0.1023794412612915
Validation loss = 0.10781370848417282
Validation loss = 0.10646123439073563
Validation loss = 0.10479835420846939
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.23e+03 |
| Iteration     | 12        |
| MaximumReturn | -826      |
| MinimumReturn | -1.53e+03 |
| TotalSamples  | 56000     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10599996149539948
Validation loss = 0.10192321240901947
Validation loss = 0.09604434669017792
Validation loss = 0.09650664776563644
Validation loss = 0.10239146649837494
Validation loss = 0.10156486928462982
Validation loss = 0.09602884948253632
Validation loss = 0.09837665408849716
Validation loss = 0.09973545372486115
Validation loss = 0.10481171309947968
Validation loss = 0.09527432918548584
Validation loss = 0.09581723064184189
Validation loss = 0.10593356937170029
Validation loss = 0.09528420865535736
Validation loss = 0.0930267795920372
Validation loss = 0.10132187604904175
Validation loss = 0.09827258437871933
Validation loss = 0.09280438721179962
Validation loss = 0.09442634880542755
Validation loss = 0.1010320708155632
Validation loss = 0.09661733359098434
Validation loss = 0.09125202894210815
Validation loss = 0.09310735017061234
Validation loss = 0.09906996041536331
Validation loss = 0.08944673091173172
Validation loss = 0.09013967216014862
Validation loss = 0.09834779798984528
Validation loss = 0.09765218943357468
Validation loss = 0.08964811265468597
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11104445904493332
Validation loss = 0.10507284849882126
Validation loss = 0.10759478807449341
Validation loss = 0.0995921641588211
Validation loss = 0.1008523553609848
Validation loss = 0.10276082903146744
Validation loss = 0.10845471918582916
Validation loss = 0.09893451631069183
Validation loss = 0.09826960414648056
Validation loss = 0.09968169033527374
Validation loss = 0.10552388429641724
Validation loss = 0.10084874927997589
Validation loss = 0.09593621641397476
Validation loss = 0.0963447093963623
Validation loss = 0.1057114377617836
Validation loss = 0.09885774552822113
Validation loss = 0.09445371478796005
Validation loss = 0.0964915007352829
Validation loss = 0.10287374258041382
Validation loss = 0.09265565127134323
Validation loss = 0.09508894383907318
Validation loss = 0.10871415585279465
Validation loss = 0.09328552335500717
Validation loss = 0.09295934438705444
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10335821658372879
Validation loss = 0.09884386509656906
Validation loss = 0.09818132221698761
Validation loss = 0.10276239365339279
Validation loss = 0.09837762266397476
Validation loss = 0.10865337401628494
Validation loss = 0.09526883810758591
Validation loss = 0.09427381306886673
Validation loss = 0.10058563947677612
Validation loss = 0.09999506920576096
Validation loss = 0.09441157430410385
Validation loss = 0.09810542315244675
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12158061563968658
Validation loss = 0.11005336791276932
Validation loss = 0.09968655556440353
Validation loss = 0.1013360470533371
Validation loss = 0.10402057319879532
Validation loss = 0.09984828531742096
Validation loss = 0.0987125113606453
Validation loss = 0.10024205595254898
Validation loss = 0.09982604533433914
Validation loss = 0.09920968115329742
Validation loss = 0.10036319494247437
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11017151921987534
Validation loss = 0.09892000257968903
Validation loss = 0.09657347947359085
Validation loss = 0.09896500408649445
Validation loss = 0.10495065897703171
Validation loss = 0.09793256968259811
Validation loss = 0.09575839340686798
Validation loss = 0.10008954256772995
Validation loss = 0.09901709109544754
Validation loss = 0.09801729768514633
Validation loss = 0.10667186230421066
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -983     |
| Iteration     | 13       |
| MaximumReturn | -175     |
| MinimumReturn | -1.5e+03 |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10841591656208038
Validation loss = 0.0888742059469223
Validation loss = 0.08469577133655548
Validation loss = 0.08728652447462082
Validation loss = 0.0868382528424263
Validation loss = 0.08902621269226074
Validation loss = 0.08501355350017548
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10515987128019333
Validation loss = 0.09091980010271072
Validation loss = 0.08840121328830719
Validation loss = 0.0930943563580513
Validation loss = 0.09099549800157547
Validation loss = 0.08965568989515305
Validation loss = 0.08915016800165176
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11556507647037506
Validation loss = 0.09361442178487778
Validation loss = 0.08776242285966873
Validation loss = 0.08980607986450195
Validation loss = 0.09562241286039352
Validation loss = 0.08962665498256683
Validation loss = 0.08747527003288269
Validation loss = 0.08992672711610794
Validation loss = 0.09022893011569977
Validation loss = 0.08687549084424973
Validation loss = 0.09657038003206253
Validation loss = 0.09033747017383575
Validation loss = 0.08782429993152618
Validation loss = 0.08765529096126556
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10995820164680481
Validation loss = 0.09460940957069397
Validation loss = 0.09171869605779648
Validation loss = 0.09125401079654694
Validation loss = 0.09620445221662521
Validation loss = 0.09407161176204681
Validation loss = 0.09317287802696228
Validation loss = 0.09438192844390869
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10867049545049667
Validation loss = 0.0913962870836258
Validation loss = 0.08931790292263031
Validation loss = 0.09046699851751328
Validation loss = 0.10087736696004868
Validation loss = 0.09030342102050781
Validation loss = 0.08822974562644958
Validation loss = 0.09654534608125687
Validation loss = 0.0963807925581932
Validation loss = 0.08814080059528351
Validation loss = 0.08839606493711472
Validation loss = 0.09407220035791397
Validation loss = 0.09563564509153366
Validation loss = 0.0870744064450264
Validation loss = 0.09158025681972504
Validation loss = 0.0905451774597168
Validation loss = 0.08545874059200287
Validation loss = 0.08687567710876465
Validation loss = 0.08954676985740662
Validation loss = 0.08872676640748978
Validation loss = 0.08383620530366898
Validation loss = 0.08612377941608429
Validation loss = 0.0965602919459343
Validation loss = 0.08670282363891602
Validation loss = 0.08443015068769455
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -807      |
| Iteration     | 14        |
| MaximumReturn | -542      |
| MinimumReturn | -1.03e+03 |
| TotalSamples  | 64000     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09721484035253525
Validation loss = 0.08175361901521683
Validation loss = 0.07920581847429276
Validation loss = 0.079157255589962
Validation loss = 0.08759906888008118
Validation loss = 0.07646725326776505
Validation loss = 0.0803767591714859
Validation loss = 0.08148923516273499
Validation loss = 0.08389513194561005
Validation loss = 0.07638774812221527
Validation loss = 0.07761187851428986
Validation loss = 0.08334265649318695
Validation loss = 0.08404020965099335
Validation loss = 0.07826735079288483
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09753948450088501
Validation loss = 0.08407288789749146
Validation loss = 0.08492475748062134
Validation loss = 0.08156441152095795
Validation loss = 0.0868975892663002
Validation loss = 0.08583439886569977
Validation loss = 0.08216279745101929
Validation loss = 0.08570808172225952
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10754818469285965
Validation loss = 0.08418251574039459
Validation loss = 0.08191536366939545
Validation loss = 0.08609674870967865
Validation loss = 0.08674966543912888
Validation loss = 0.08225351572036743
Validation loss = 0.08094635605812073
Validation loss = 0.08964727818965912
Validation loss = 0.08066296577453613
Validation loss = 0.08110176771879196
Validation loss = 0.09022526443004608
Validation loss = 0.08107513934373856
Validation loss = 0.07906175404787064
Validation loss = 0.08421364426612854
Validation loss = 0.08323033154010773
Validation loss = 0.0795120969414711
Validation loss = 0.07818488776683807
Validation loss = 0.08133536577224731
Validation loss = 0.08858171105384827
Validation loss = 0.07804079353809357
Validation loss = 0.07785943895578384
Validation loss = 0.08292293548583984
Validation loss = 0.08188438415527344
Validation loss = 0.07799940556287766
Validation loss = 0.08190328627824783
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10446827113628387
Validation loss = 0.08884882926940918
Validation loss = 0.08628349006175995
Validation loss = 0.08717404305934906
Validation loss = 0.09569093585014343
Validation loss = 0.0866880938410759
Validation loss = 0.08505374193191528
Validation loss = 0.09251868724822998
Validation loss = 0.08546297252178192
Validation loss = 0.08414146304130554
Validation loss = 0.08748038858175278
Validation loss = 0.092134028673172
Validation loss = 0.08392702043056488
Validation loss = 0.08241775631904602
Validation loss = 0.09185454994440079
Validation loss = 0.08670596778392792
Validation loss = 0.08065733313560486
Validation loss = 0.08403265476226807
Validation loss = 0.0888962373137474
Validation loss = 0.08461624383926392
Validation loss = 0.08189426362514496
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09251212328672409
Validation loss = 0.08123654127120972
Validation loss = 0.08070079237222672
Validation loss = 0.08105762302875519
Validation loss = 0.08184075355529785
Validation loss = 0.0827595591545105
Validation loss = 0.08109166473150253
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -920     |
| Iteration     | 15       |
| MaximumReturn | -176     |
| MinimumReturn | -1.6e+03 |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09103796631097794
Validation loss = 0.07796929776668549
Validation loss = 0.07605141401290894
Validation loss = 0.07795059680938721
Validation loss = 0.07591764628887177
Validation loss = 0.07813050597906113
Validation loss = 0.0754675641655922
Validation loss = 0.08417124301195145
Validation loss = 0.07465347647666931
Validation loss = 0.0726286917924881
Validation loss = 0.07630978524684906
Validation loss = 0.08233606815338135
Validation loss = 0.07326770573854446
Validation loss = 0.07158863544464111
Validation loss = 0.08123749494552612
Validation loss = 0.07429777085781097
Validation loss = 0.07145775854587555
Validation loss = 0.07434305548667908
Validation loss = 0.0784527063369751
Validation loss = 0.0737437978386879
Validation loss = 0.0715542659163475
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09950133413076401
Validation loss = 0.0821155458688736
Validation loss = 0.07832594960927963
Validation loss = 0.0814598873257637
Validation loss = 0.08100272715091705
Validation loss = 0.08139869570732117
Validation loss = 0.08004453033208847
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09951186925172806
Validation loss = 0.07753187417984009
Validation loss = 0.07478760927915573
Validation loss = 0.07709375023841858
Validation loss = 0.07446260005235672
Validation loss = 0.07880335301160812
Validation loss = 0.07566200196743011
Validation loss = 0.07784801721572876
Validation loss = 0.07541494816541672
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09304535388946533
Validation loss = 0.08541826158761978
Validation loss = 0.08200504630804062
Validation loss = 0.0845281332731247
Validation loss = 0.08605962991714478
Validation loss = 0.0854044035077095
Validation loss = 0.0800047367811203
Validation loss = 0.07942932844161987
Validation loss = 0.0924767404794693
Validation loss = 0.08088741451501846
Validation loss = 0.08034853637218475
Validation loss = 0.08129222691059113
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09438304603099823
Validation loss = 0.07975907623767853
Validation loss = 0.07691018283367157
Validation loss = 0.07765986770391464
Validation loss = 0.07918236404657364
Validation loss = 0.07717367261648178
Validation loss = 0.0791962742805481
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -489     |
| Iteration     | 16       |
| MaximumReturn | 10.1     |
| MinimumReturn | -667     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07731708884239197
Validation loss = 0.07446783781051636
Validation loss = 0.0707465261220932
Validation loss = 0.06941297650337219
Validation loss = 0.07738589495420456
Validation loss = 0.06928481161594391
Validation loss = 0.06859464943408966
Validation loss = 0.07037466764450073
Validation loss = 0.06657504290342331
Validation loss = 0.07003069669008255
Validation loss = 0.06995797902345657
Validation loss = 0.06915958225727081
Validation loss = 0.0666675716638565
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08816782385110855
Validation loss = 0.07962021976709366
Validation loss = 0.07609399408102036
Validation loss = 0.07752099633216858
Validation loss = 0.07596588134765625
Validation loss = 0.07426503300666809
Validation loss = 0.07448945194482803
Validation loss = 0.07853711396455765
Validation loss = 0.0738157108426094
Validation loss = 0.07445304095745087
Validation loss = 0.07970895618200302
Validation loss = 0.07456568628549576
Validation loss = 0.0772605836391449
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08648190647363663
Validation loss = 0.07955524325370789
Validation loss = 0.07280469685792923
Validation loss = 0.07192499935626984
Validation loss = 0.07935173809528351
Validation loss = 0.0726134404540062
Validation loss = 0.07398152351379395
Validation loss = 0.07413602620363235
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10058963298797607
Validation loss = 0.07762011885643005
Validation loss = 0.08013781905174255
Validation loss = 0.07762742787599564
Validation loss = 0.08002413809299469
Validation loss = 0.07723330706357956
Validation loss = 0.0805516317486763
Validation loss = 0.07581222802400589
Validation loss = 0.07975127547979355
Validation loss = 0.07893557846546173
Validation loss = 0.08040879666805267
Validation loss = 0.07596331089735031
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09415838867425919
Validation loss = 0.07583686709403992
Validation loss = 0.07437245547771454
Validation loss = 0.07517658174037933
Validation loss = 0.07825279235839844
Validation loss = 0.0747196227312088
Validation loss = 0.07478651404380798
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -817      |
| Iteration     | 17        |
| MaximumReturn | 369       |
| MinimumReturn | -1.62e+03 |
| TotalSamples  | 76000     |
-----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08267904818058014
Validation loss = 0.06705630570650101
Validation loss = 0.06576400250196457
Validation loss = 0.06559626758098602
Validation loss = 0.06705977767705917
Validation loss = 0.06456812471151352
Validation loss = 0.07105258852243423
Validation loss = 0.06555614620447159
Validation loss = 0.06392175704240799
Validation loss = 0.0663277879357338
Validation loss = 0.07046690583229065
Validation loss = 0.06483408063650131
Validation loss = 0.06440982967615128
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08565967530012131
Validation loss = 0.07204573601484299
Validation loss = 0.0707101970911026
Validation loss = 0.07013236731290817
Validation loss = 0.0728580430150032
Validation loss = 0.07274463772773743
Validation loss = 0.07205226272344589
Validation loss = 0.07238878309726715
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08518760651350021
Validation loss = 0.0714171752333641
Validation loss = 0.07065488398075104
Validation loss = 0.07729348540306091
Validation loss = 0.07253162562847137
Validation loss = 0.0685940608382225
Validation loss = 0.0715211033821106
Validation loss = 0.07400091737508774
Validation loss = 0.07423001527786255
Validation loss = 0.0692811831831932
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08173786848783493
Validation loss = 0.07600127160549164
Validation loss = 0.07398179173469543
Validation loss = 0.07563971728086472
Validation loss = 0.08258700370788574
Validation loss = 0.07408234477043152
Validation loss = 0.0747627317905426
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08759201318025589
Validation loss = 0.07561591267585754
Validation loss = 0.07310318946838379
Validation loss = 0.07473961263895035
Validation loss = 0.077383853495121
Validation loss = 0.07191985845565796
Validation loss = 0.07683268189430237
Validation loss = 0.0738431066274643
Validation loss = 0.07642078399658203
Validation loss = 0.07583341002464294
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -624      |
| Iteration     | 18        |
| MaximumReturn | 160       |
| MinimumReturn | -1.44e+03 |
| TotalSamples  | 80000     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07552118599414825
Validation loss = 0.06378678977489471
Validation loss = 0.06236035376787186
Validation loss = 0.06502489000558853
Validation loss = 0.06459538638591766
Validation loss = 0.061827145516872406
Validation loss = 0.06456367671489716
Validation loss = 0.06910358369350433
Validation loss = 0.06075853109359741
Validation loss = 0.061409998685121536
Validation loss = 0.0660991445183754
Validation loss = 0.06125332787632942
Validation loss = 0.0624808594584465
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08632899820804596
Validation loss = 0.06935428828001022
Validation loss = 0.06798632442951202
Validation loss = 0.06838376820087433
Validation loss = 0.07297532260417938
Validation loss = 0.06740471720695496
Validation loss = 0.06901659071445465
Validation loss = 0.06948519498109818
Validation loss = 0.06840914487838745
Validation loss = 0.06491781771183014
Validation loss = 0.06642405688762665
Validation loss = 0.07499603927135468
Validation loss = 0.06641553342342377
Validation loss = 0.0665767639875412
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07654926180839539
Validation loss = 0.06894173473119736
Validation loss = 0.06927376985549927
Validation loss = 0.07127205282449722
Validation loss = 0.0711892619729042
Validation loss = 0.06863950192928314
Validation loss = 0.06864369660615921
Validation loss = 0.07032487541437149
Validation loss = 0.0687103122472763
Validation loss = 0.07315345108509064
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09021174907684326
Validation loss = 0.07192616909742355
Validation loss = 0.07237403094768524
Validation loss = 0.07381589710712433
Validation loss = 0.07626466453075409
Validation loss = 0.07234157621860504
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08050477504730225
Validation loss = 0.07151015847921371
Validation loss = 0.07072576880455017
Validation loss = 0.0713692232966423
Validation loss = 0.08426384627819061
Validation loss = 0.06865224242210388
Validation loss = 0.0708741694688797
Validation loss = 0.07262273877859116
Validation loss = 0.07036348432302475
Validation loss = 0.06798873096704483
Validation loss = 0.074337899684906
Validation loss = 0.07028790563344955
Validation loss = 0.06775051355361938
Validation loss = 0.07183770835399628
Validation loss = 0.06942964345216751
Validation loss = 0.06758444011211395
Validation loss = 0.06900627911090851
Validation loss = 0.07709567248821259
Validation loss = 0.0671422928571701
Validation loss = 0.06714563071727753
Validation loss = 0.0728558748960495
Validation loss = 0.0689489096403122
Validation loss = 0.06666327267885208
Validation loss = 0.06911228597164154
Validation loss = 0.07572894543409348
Validation loss = 0.06671072542667389
Validation loss = 0.06634017825126648
Validation loss = 0.06727448850870132
Validation loss = 0.06910955160856247
Validation loss = 0.0673048123717308
Validation loss = 0.06566645205020905
Validation loss = 0.07310513406991959
Validation loss = 0.06631012260913849
Validation loss = 0.06584580987691879
Validation loss = 0.07580208033323288
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -332      |
| Iteration     | 19        |
| MaximumReturn | 279       |
| MinimumReturn | -1.29e+03 |
| TotalSamples  | 84000     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07269056886434555
Validation loss = 0.06014570593833923
Validation loss = 0.05909422039985657
Validation loss = 0.06006699055433273
Validation loss = 0.06494984030723572
Validation loss = 0.059596527367830276
Validation loss = 0.05966714397072792
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07090519368648529
Validation loss = 0.06443680077791214
Validation loss = 0.06413232535123825
Validation loss = 0.066652312874794
Validation loss = 0.06379934400320053
Validation loss = 0.06662432104349136
Validation loss = 0.061825189739465714
Validation loss = 0.06144735962152481
Validation loss = 0.06687601655721664
Validation loss = 0.06683941930532455
Validation loss = 0.061141423881053925
Validation loss = 0.07280498743057251
Validation loss = 0.06093202903866768
Validation loss = 0.06043003126978874
Validation loss = 0.0645734965801239
Validation loss = 0.06360133737325668
Validation loss = 0.060897570103406906
Validation loss = 0.06394366174936295
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07705090939998627
Validation loss = 0.0663960799574852
Validation loss = 0.06468864530324936
Validation loss = 0.06877632439136505
Validation loss = 0.06597590446472168
Validation loss = 0.06557328253984451
Validation loss = 0.06600136309862137
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07731857895851135
Validation loss = 0.07071217894554138
Validation loss = 0.06840304285287857
Validation loss = 0.07068856060504913
Validation loss = 0.07213395088911057
Validation loss = 0.07087255269289017
Validation loss = 0.06946999579668045
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07599115371704102
Validation loss = 0.06378151476383209
Validation loss = 0.06559975445270538
Validation loss = 0.0645299181342125
Validation loss = 0.065471351146698
Validation loss = 0.06275811791419983
Validation loss = 0.06786598265171051
Validation loss = 0.062356527894735336
Validation loss = 0.06139414757490158
Validation loss = 0.0654802918434143
Validation loss = 0.06490994989871979
Validation loss = 0.06083628162741661
Validation loss = 0.06170746684074402
Validation loss = 0.06866927444934845
Validation loss = 0.06124993786215782
Validation loss = 0.06053810566663742
Validation loss = 0.06372489780187607
Validation loss = 0.06183100491762161
Validation loss = 0.061787672340869904
Validation loss = 0.0616852231323719
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -432     |
| Iteration     | 20       |
| MaximumReturn | 48.8     |
| MinimumReturn | -945     |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06923110783100128
Validation loss = 0.06067657098174095
Validation loss = 0.0565435066819191
Validation loss = 0.062455687671899796
Validation loss = 0.05896012857556343
Validation loss = 0.05815255269408226
Validation loss = 0.056483808904886246
Validation loss = 0.05954694747924805
Validation loss = 0.05634875223040581
Validation loss = 0.056306954473257065
Validation loss = 0.06187925487756729
Validation loss = 0.055333200842142105
Validation loss = 0.05514717102050781
Validation loss = 0.05946411192417145
Validation loss = 0.06041298061609268
Validation loss = 0.05549081787467003
Validation loss = 0.057718560099601746
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06965339928865433
Validation loss = 0.05867825448513031
Validation loss = 0.05870953947305679
Validation loss = 0.06209748983383179
Validation loss = 0.05921551212668419
Validation loss = 0.05662987381219864
Validation loss = 0.061279091984033585
Validation loss = 0.060216665267944336
Validation loss = 0.05723185837268829
Validation loss = 0.05982255935668945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07457032799720764
Validation loss = 0.06284360587596893
Validation loss = 0.06247765198349953
Validation loss = 0.06428742408752441
Validation loss = 0.0619189627468586
Validation loss = 0.06107226386666298
Validation loss = 0.06487321853637695
Validation loss = 0.06264942139387131
Validation loss = 0.06295017898082733
Validation loss = 0.0618954636156559
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07557504624128342
Validation loss = 0.06888381391763687
Validation loss = 0.06657964736223221
Validation loss = 0.06928898394107819
Validation loss = 0.0708344355225563
Validation loss = 0.06753916293382645
Validation loss = 0.06602499634027481
Validation loss = 0.07791160047054291
Validation loss = 0.06566660106182098
Validation loss = 0.06442016363143921
Validation loss = 0.07509509474039078
Validation loss = 0.06426023691892624
Validation loss = 0.06432145088911057
Validation loss = 0.07249642163515091
Validation loss = 0.06463025510311127
Validation loss = 0.06324184685945511
Validation loss = 0.06559687852859497
Validation loss = 0.06540516763925552
Validation loss = 0.0646960511803627
Validation loss = 0.06286874413490295
Validation loss = 0.06618296355009079
Validation loss = 0.06759614497423172
Validation loss = 0.0627598837018013
Validation loss = 0.06596438586711884
Validation loss = 0.06560888886451721
Validation loss = 0.0613257922232151
Validation loss = 0.0692700445652008
Validation loss = 0.06431359797716141
Validation loss = 0.06132885441184044
Validation loss = 0.06536641716957092
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06439340859651566
Validation loss = 0.05914777144789696
Validation loss = 0.0589938759803772
Validation loss = 0.06374841183423996
Validation loss = 0.0574629083275795
Validation loss = 0.05870397388935089
Validation loss = 0.06233647093176842
Validation loss = 0.05786709487438202
Validation loss = 0.057321228086948395
Validation loss = 0.06452929228544235
Validation loss = 0.05694343149662018
Validation loss = 0.05661638826131821
Validation loss = 0.0606108158826828
Validation loss = 0.058805689215660095
Validation loss = 0.05531279742717743
Validation loss = 0.06614118814468384
Validation loss = 0.056153926998376846
Validation loss = 0.055366646498441696
Validation loss = 0.06339613348245621
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -53.5    |
| Iteration     | 21       |
| MaximumReturn | 304      |
| MinimumReturn | -414     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0641496405005455
Validation loss = 0.05379251390695572
Validation loss = 0.05321693420410156
Validation loss = 0.05811154842376709
Validation loss = 0.05399176850914955
Validation loss = 0.05400284379720688
Validation loss = 0.0556412972509861
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07130445539951324
Validation loss = 0.05555347725749016
Validation loss = 0.05498728156089783
Validation loss = 0.05707430839538574
Validation loss = 0.05738770589232445
Validation loss = 0.05547042936086655
Validation loss = 0.06066102162003517
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07147034257650375
Validation loss = 0.06101119518280029
Validation loss = 0.058901943266391754
Validation loss = 0.05923290178179741
Validation loss = 0.06254985928535461
Validation loss = 0.05700433626770973
Validation loss = 0.06105368956923485
Validation loss = 0.06270279735326767
Validation loss = 0.05677803233265877
Validation loss = 0.05909382924437523
Validation loss = 0.06007862091064453
Validation loss = 0.0571933351457119
Validation loss = 0.05923198163509369
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0665334016084671
Validation loss = 0.061112482100725174
Validation loss = 0.05860717222094536
Validation loss = 0.061474282294511795
Validation loss = 0.060790881514549255
Validation loss = 0.05855243280529976
Validation loss = 0.06107788532972336
Validation loss = 0.059483159333467484
Validation loss = 0.05831984803080559
Validation loss = 0.0641704574227333
Validation loss = 0.05846735090017319
Validation loss = 0.05879668518900871
Validation loss = 0.06447920203208923
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06346280872821808
Validation loss = 0.05591714754700661
Validation loss = 0.054349154233932495
Validation loss = 0.05660349875688553
Validation loss = 0.054396387189626694
Validation loss = 0.05314355716109276
Validation loss = 0.0606677383184433
Validation loss = 0.053995002061128616
Validation loss = 0.05389047786593437
Validation loss = 0.05452358350157738
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -231     |
| Iteration     | 22       |
| MaximumReturn | 292      |
| MinimumReturn | -974     |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06875621527433395
Validation loss = 0.05299961194396019
Validation loss = 0.053169455379247665
Validation loss = 0.052941758185625076
Validation loss = 0.053561944514513016
Validation loss = 0.05671048164367676
Validation loss = 0.05098910257220268
Validation loss = 0.05686204507946968
Validation loss = 0.052250444889068604
Validation loss = 0.05228984355926514
Validation loss = 0.05420226976275444
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0652838870882988
Validation loss = 0.05506070330739021
Validation loss = 0.052826154977083206
Validation loss = 0.05895566567778587
Validation loss = 0.055058229714632034
Validation loss = 0.05265888571739197
Validation loss = 0.05640266463160515
Validation loss = 0.05410550534725189
Validation loss = 0.05387723073363304
Validation loss = 0.05452544614672661
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06547827273607254
Validation loss = 0.057261865586042404
Validation loss = 0.05701226368546486
Validation loss = 0.05862940475344658
Validation loss = 0.05665145441889763
Validation loss = 0.058160778135061264
Validation loss = 0.05781570076942444
Validation loss = 0.05774194002151489
Validation loss = 0.0572059340775013
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06885088235139847
Validation loss = 0.05677197873592377
Validation loss = 0.05667348578572273
Validation loss = 0.05981266126036644
Validation loss = 0.058428872376680374
Validation loss = 0.05785173177719116
Validation loss = 0.058420196175575256
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05898967385292053
Validation loss = 0.05368177220225334
Validation loss = 0.053066957741975784
Validation loss = 0.053439248353242874
Validation loss = 0.055897269397974014
Validation loss = 0.05173632502555847
Validation loss = 0.05519118905067444
Validation loss = 0.0514947883784771
Validation loss = 0.05438196659088135
Validation loss = 0.05274549499154091
Validation loss = 0.05217469856142998
Validation loss = 0.05652838572859764
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -653     |
| Iteration     | 23       |
| MaximumReturn | -329     |
| MinimumReturn | -1.4e+03 |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06048286333680153
Validation loss = 0.05196266993880272
Validation loss = 0.05095914006233215
Validation loss = 0.05280774459242821
Validation loss = 0.054503656923770905
Validation loss = 0.04917987436056137
Validation loss = 0.052257996052503586
Validation loss = 0.05464246869087219
Validation loss = 0.04924226179718971
Validation loss = 0.05213151127099991
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.060762740671634674
Validation loss = 0.05216540768742561
Validation loss = 0.053878381848335266
Validation loss = 0.052228040993213654
Validation loss = 0.05381772667169571
Validation loss = 0.05626264959573746
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05906393378973007
Validation loss = 0.05585245043039322
Validation loss = 0.05542760342359543
Validation loss = 0.056661125272512436
Validation loss = 0.055990301072597504
Validation loss = 0.05618064105510712
Validation loss = 0.05649738758802414
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06547226011753082
Validation loss = 0.055305324494838715
Validation loss = 0.05545952171087265
Validation loss = 0.05791435390710831
Validation loss = 0.056038759648799896
Validation loss = 0.05666537582874298
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05741802230477333
Validation loss = 0.04981435835361481
Validation loss = 0.05107629671692848
Validation loss = 0.0520956851541996
Validation loss = 0.05137588828802109
Validation loss = 0.05087478458881378
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -550     |
| Iteration     | 24       |
| MaximumReturn | 303      |
| MinimumReturn | -1.6e+03 |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0578206367790699
Validation loss = 0.050215721130371094
Validation loss = 0.04948661848902702
Validation loss = 0.05109377205371857
Validation loss = 0.05092548578977585
Validation loss = 0.049302998930215836
Validation loss = 0.04986027255654335
Validation loss = 0.050832588225603104
Validation loss = 0.047758374363183975
Validation loss = 0.05334504693746567
Validation loss = 0.04861293360590935
Validation loss = 0.048333823680877686
Validation loss = 0.0507272332906723
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05460261553525925
Validation loss = 0.05071903020143509
Validation loss = 0.05149885267019272
Validation loss = 0.04959889501333237
Validation loss = 0.05263417959213257
Validation loss = 0.04817109555006027
Validation loss = 0.04955995827913284
Validation loss = 0.05237070098519325
Validation loss = 0.05067569389939308
Validation loss = 0.048928167670965195
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06392628699541092
Validation loss = 0.054458823055028915
Validation loss = 0.05278234928846359
Validation loss = 0.0563008189201355
Validation loss = 0.05219675600528717
Validation loss = 0.05333989858627319
Validation loss = 0.058103837072849274
Validation loss = 0.052978262305259705
Validation loss = 0.05404302105307579
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05993703380227089
Validation loss = 0.053500618785619736
Validation loss = 0.05413153022527695
Validation loss = 0.05877282842993736
Validation loss = 0.054115358740091324
Validation loss = 0.05480773001909256
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.053990382701158524
Validation loss = 0.04872320592403412
Validation loss = 0.049651917070150375
Validation loss = 0.05280951038002968
Validation loss = 0.04895687475800514
Validation loss = 0.05113092437386513
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -26.2    |
| Iteration     | 25       |
| MaximumReturn | 592      |
| MinimumReturn | -329     |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05473523586988449
Validation loss = 0.04919372498989105
Validation loss = 0.04693765938282013
Validation loss = 0.04904734343290329
Validation loss = 0.04911557212471962
Validation loss = 0.04921875521540642
Validation loss = 0.04910443723201752
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05526525899767876
Validation loss = 0.04815865308046341
Validation loss = 0.047078605741262436
Validation loss = 0.05420348048210144
Validation loss = 0.04713623970746994
Validation loss = 0.04998350143432617
Validation loss = 0.05033104494214058
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.059320513159036636
Validation loss = 0.05129849165678024
Validation loss = 0.051687341183423996
Validation loss = 0.052368711680173874
Validation loss = 0.053970906883478165
Validation loss = 0.04955488070845604
Validation loss = 0.05146615952253342
Validation loss = 0.05803876370191574
Validation loss = 0.04886467009782791
Validation loss = 0.05214398726820946
Validation loss = 0.05257654935121536
Validation loss = 0.04885287955403328
Validation loss = 0.049747053533792496
Validation loss = 0.05060116946697235
Validation loss = 0.04869501292705536
Validation loss = 0.0559396892786026
Validation loss = 0.049062300473451614
Validation loss = 0.04929850623011589
Validation loss = 0.05143480747938156
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.058042652904987335
Validation loss = 0.054097775369882584
Validation loss = 0.05236896872520447
Validation loss = 0.05579152703285217
Validation loss = 0.05197146534919739
Validation loss = 0.054794616997241974
Validation loss = 0.052342887967824936
Validation loss = 0.053790852427482605
Validation loss = 0.05961643159389496
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05607146397233009
Validation loss = 0.04798124358057976
Validation loss = 0.04855849966406822
Validation loss = 0.04926363751292229
Validation loss = 0.05164552107453346
Validation loss = 0.049480609595775604
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -165      |
| Iteration     | 26        |
| MaximumReturn | 262       |
| MinimumReturn | -1.18e+03 |
| TotalSamples  | 112000    |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05375688150525093
Validation loss = 0.04639891907572746
Validation loss = 0.045775555074214935
Validation loss = 0.05037366598844528
Validation loss = 0.04912107437849045
Validation loss = 0.04424130544066429
Validation loss = 0.04593513160943985
Validation loss = 0.048977334052324295
Validation loss = 0.04443753510713577
Validation loss = 0.05083705857396126
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05157842859625816
Validation loss = 0.048072464764118195
Validation loss = 0.04765649512410164
Validation loss = 0.04894339293241501
Validation loss = 0.04981601983308792
Validation loss = 0.04832477495074272
Validation loss = 0.048942942172288895
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05328172445297241
Validation loss = 0.04923234507441521
Validation loss = 0.04959779605269432
Validation loss = 0.05112042278051376
Validation loss = 0.048445843160152435
Validation loss = 0.04860716685652733
Validation loss = 0.04867445304989815
Validation loss = 0.04852582886815071
Validation loss = 0.051107313483953476
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06074612960219383
Validation loss = 0.050116512924432755
Validation loss = 0.05082869902253151
Validation loss = 0.05232712998986244
Validation loss = 0.050340354442596436
Validation loss = 0.055908314883708954
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.053125374019145966
Validation loss = 0.04698220267891884
Validation loss = 0.04911821335554123
Validation loss = 0.04813224822282791
Validation loss = 0.047185707837343216
Validation loss = 0.05065009370446205
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -19.4    |
| Iteration     | 27       |
| MaximumReturn | 434      |
| MinimumReturn | -425     |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.054863862693309784
Validation loss = 0.04510018229484558
Validation loss = 0.04514540359377861
Validation loss = 0.05011541768908501
Validation loss = 0.043285831809043884
Validation loss = 0.04463612288236618
Validation loss = 0.04891824349761009
Validation loss = 0.0440109483897686
Validation loss = 0.0460318885743618
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05480314791202545
Validation loss = 0.04694134742021561
Validation loss = 0.046231865882873535
Validation loss = 0.049440547823905945
Validation loss = 0.045063454657793045
Validation loss = 0.04859213903546333
Validation loss = 0.046471357345581055
Validation loss = 0.047032129019498825
Validation loss = 0.04577825218439102
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05597119778394699
Validation loss = 0.04746656492352486
Validation loss = 0.047609470784664154
Validation loss = 0.04923028126358986
Validation loss = 0.04841715842485428
Validation loss = 0.04720427095890045
Validation loss = 0.050924669951200485
Validation loss = 0.04735627397894859
Validation loss = 0.04805136099457741
Validation loss = 0.046859629452228546
Validation loss = 0.04616710916161537
Validation loss = 0.04802078381180763
Validation loss = 0.04674548655748367
Validation loss = 0.04767652601003647
Validation loss = 0.04682065173983574
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.059188395738601685
Validation loss = 0.05145064741373062
Validation loss = 0.055627185851335526
Validation loss = 0.04860301315784454
Validation loss = 0.05414186418056488
Validation loss = 0.04934322461485863
Validation loss = 0.05038689821958542
Validation loss = 0.05343766510486603
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.048942726105451584
Validation loss = 0.048264432698488235
Validation loss = 0.050090596079826355
Validation loss = 0.04644116014242172
Validation loss = 0.046058427542448044
Validation loss = 0.047986652702093124
Validation loss = 0.04549211636185646
Validation loss = 0.052277229726314545
Validation loss = 0.045664288103580475
Validation loss = 0.045145098119974136
Validation loss = 0.050279393792152405
Validation loss = 0.04792371019721031
Validation loss = 0.04564216732978821
Validation loss = 0.04546460136771202
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 199      |
| Iteration     | 28       |
| MaximumReturn | 727      |
| MinimumReturn | -162     |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.048176899552345276
Validation loss = 0.04466833919286728
Validation loss = 0.04350942000746727
Validation loss = 0.04632612690329552
Validation loss = 0.04364052787423134
Validation loss = 0.04659568890929222
Validation loss = 0.04494159296154976
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.048214200884103775
Validation loss = 0.047321218997240067
Validation loss = 0.04549940675497055
Validation loss = 0.04580994322896004
Validation loss = 0.045429810881614685
Validation loss = 0.046078965067863464
Validation loss = 0.04793929681181908
Validation loss = 0.044195517897605896
Validation loss = 0.04565872997045517
Validation loss = 0.045974381268024445
Validation loss = 0.04378343001008034
Validation loss = 0.04626660421490669
Validation loss = 0.04426781088113785
Validation loss = 0.043753765523433685
Validation loss = 0.05219893157482147
Validation loss = 0.04376079514622688
Validation loss = 0.04493545740842819
Validation loss = 0.0458480529487133
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05163360759615898
Validation loss = 0.044700391590595245
Validation loss = 0.04523502290248871
Validation loss = 0.04940579831600189
Validation loss = 0.04499858245253563
Validation loss = 0.04535824432969093
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05388564616441727
Validation loss = 0.046938877552747726
Validation loss = 0.049732740968465805
Validation loss = 0.04910280182957649
Validation loss = 0.04995458200573921
Validation loss = 0.05436789244413376
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05238543450832367
Validation loss = 0.04638456180691719
Validation loss = 0.04540388286113739
Validation loss = 0.04857863485813141
Validation loss = 0.045062705874443054
Validation loss = 0.0463002547621727
Validation loss = 0.046538665890693665
Validation loss = 0.04544207081198692
Validation loss = 0.05381191521883011
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -220      |
| Iteration     | 29        |
| MaximumReturn | 384       |
| MinimumReturn | -1.29e+03 |
| TotalSamples  | 124000    |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.043337494134902954
Validation loss = 0.04060371220111847
Validation loss = 0.04312516376376152
Validation loss = 0.04160386696457863
Validation loss = 0.04100755602121353
Validation loss = 0.04449868202209473
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.045336175709962845
Validation loss = 0.042579926550388336
Validation loss = 0.04407264664769173
Validation loss = 0.04510004073381424
Validation loss = 0.04287339374423027
Validation loss = 0.04221464693546295
Validation loss = 0.04319608584046364
Validation loss = 0.04291160777211189
Validation loss = 0.04200826585292816
Validation loss = 0.04383271560072899
Validation loss = 0.04131244495511055
Validation loss = 0.04071207344532013
Validation loss = 0.04357333481311798
Validation loss = 0.041084494441747665
Validation loss = 0.043479226529598236
Validation loss = 0.04346279054880142
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04897356033325195
Validation loss = 0.04314672201871872
Validation loss = 0.04526934400200844
Validation loss = 0.04422508180141449
Validation loss = 0.044761091470718384
Validation loss = 0.04330454394221306
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05077419430017471
Validation loss = 0.0468694344162941
Validation loss = 0.0471668541431427
Validation loss = 0.04615255072712898
Validation loss = 0.046269211918115616
Validation loss = 0.04756578430533409
Validation loss = 0.04530785605311394
Validation loss = 0.04855440557003021
Validation loss = 0.047370295971632004
Validation loss = 0.046763207763433456
Validation loss = 0.04712989181280136
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0512930229306221
Validation loss = 0.043061185628175735
Validation loss = 0.043929554522037506
Validation loss = 0.04633620008826256
Validation loss = 0.04282015934586525
Validation loss = 0.043707896023988724
Validation loss = 0.04164126515388489
Validation loss = 0.04227565973997116
Validation loss = 0.04575751721858978
Validation loss = 0.04280537739396095
Validation loss = 0.04373273625969887
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -26.4     |
| Iteration     | 30        |
| MaximumReturn | 474       |
| MinimumReturn | -1.64e+03 |
| TotalSamples  | 128000    |
-----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04354246333241463
Validation loss = 0.040226519107818604
Validation loss = 0.0412486270070076
Validation loss = 0.04274484142661095
Validation loss = 0.04046820104122162
Validation loss = 0.04093833640217781
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04309695214033127
Validation loss = 0.040388502180576324
Validation loss = 0.04140673577785492
Validation loss = 0.04206627234816551
Validation loss = 0.04019654169678688
Validation loss = 0.04203164577484131
Validation loss = 0.04190985485911369
Validation loss = 0.04010450839996338
Validation loss = 0.0408315472304821
Validation loss = 0.04167216643691063
Validation loss = 0.038993753492832184
Validation loss = 0.04094995558261871
Validation loss = 0.0391840860247612
Validation loss = 0.04281890019774437
Validation loss = 0.03995823487639427
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04702020436525345
Validation loss = 0.04313252866268158
Validation loss = 0.046244438737630844
Validation loss = 0.04182485491037369
Validation loss = 0.04182327538728714
Validation loss = 0.047897495329380035
Validation loss = 0.04132431745529175
Validation loss = 0.04389750212430954
Validation loss = 0.04196356236934662
Validation loss = 0.045463062822818756
Validation loss = 0.04048168659210205
Validation loss = 0.041588086634874344
Validation loss = 0.04545687511563301
Validation loss = 0.04263359308242798
Validation loss = 0.040862832218408585
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.045890823006629944
Validation loss = 0.0454057976603508
Validation loss = 0.044556062668561935
Validation loss = 0.0476248562335968
Validation loss = 0.043210770934820175
Validation loss = 0.047707073390483856
Validation loss = 0.043195776641368866
Validation loss = 0.05316542834043503
Validation loss = 0.042531371116638184
Validation loss = 0.04413871467113495
Validation loss = 0.04504936560988426
Validation loss = 0.043553389608860016
Validation loss = 0.04809587076306343
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.044599682092666626
Validation loss = 0.041635386645793915
Validation loss = 0.04132317006587982
Validation loss = 0.04277544468641281
Validation loss = 0.04053857922554016
Validation loss = 0.04172760620713234
Validation loss = 0.042825110256671906
Validation loss = 0.04214178025722504
Validation loss = 0.041101954877376556
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -131     |
| Iteration     | 31       |
| MaximumReturn | 364      |
| MinimumReturn | -1.1e+03 |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.045434314757585526
Validation loss = 0.03906019404530525
Validation loss = 0.04061257466673851
Validation loss = 0.040405161678791046
Validation loss = 0.040239203721284866
Validation loss = 0.04137603938579559
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04133206605911255
Validation loss = 0.03888361155986786
Validation loss = 0.04148351401090622
Validation loss = 0.040966399013996124
Validation loss = 0.040825437754392624
Validation loss = 0.040664948523044586
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04434974491596222
Validation loss = 0.040554627776145935
Validation loss = 0.04180682823061943
Validation loss = 0.04108598828315735
Validation loss = 0.04072393849492073
Validation loss = 0.04614634066820145
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05076078325510025
Validation loss = 0.04243076965212822
Validation loss = 0.04261926934123039
Validation loss = 0.044645726680755615
Validation loss = 0.04150186479091644
Validation loss = 0.04725247994065285
Validation loss = 0.04345521330833435
Validation loss = 0.042212702333927155
Validation loss = 0.042546551674604416
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.043622955679893494
Validation loss = 0.040917012840509415
Validation loss = 0.040326762944459915
Validation loss = 0.04299907013773918
Validation loss = 0.040146924555301666
Validation loss = 0.04160292074084282
Validation loss = 0.0397026501595974
Validation loss = 0.04185061529278755
Validation loss = 0.040808048099279404
Validation loss = 0.04070291295647621
Validation loss = 0.04049387574195862
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -360     |
| Iteration     | 32       |
| MaximumReturn | 513      |
| MinimumReturn | -2.3e+03 |
| TotalSamples  | 136000   |
----------------------------
