Logging to experiments/half_cheetah/test-exp-dir-2/test-exp2_seed3421
Print configuration .....
{'max_val_data': 100000, 'dynamics': {'kfac_params': {'damping': 0.001, 'cov_ema_decay': 0.99, 'momentum': 0.9, 'learning_rate': 0.1, 'kl_clip': 0.0001}, 'intrinsic_reward_only': False, 'enable_particle_ensemble': True, 'external_reward_evaluation_interval': 5, 'mode': 'random', 'batch_size': 1000, 'ensemble_model_count': 5, 'particles': 5, 'activation': 'relu', 'n_layers': 4, 'val': True, 'ensemble': True, 'hidden_size': 1000, 'intrinsic_reward_coeff': 1.0, 'learning_rate': 0.001, 'epochs': 200, 'ita': 1.0, 'pre_training': {'policy_itr': 20, 'mode': 'intrinsic_reward', 'itr': 0}, 'model': 'nn', 'obs_var': 1.0}, 'random_seeds': [4321, 2314, 2341, 3421], 'max_train_data': 200000, 'env_horizon': 1000, 'num_path_random': 6, 'discard_ratio': 0.0, 'start_onpol_iter': 0, 'num_path_onpol': 6, 'onpol_iters': 33, 'env_name': 'half_cheetah', 'trpo': {'gae': 0.95, 'batch_size': 50000, 'iterations': 40, 'step_size': 0.01, 'horizon': 1000, 'gamma': 0.99}, 'algo': 'trpo', 'policy': {'init_logstd': 0.0, 'reinitialize_every_itr': False, 'activation': 'tanh', 'network_shape': [32, 32]}, 'trpo_ext_reward': {'gae': 0.95, 'batch_size': 50000, 'iterations': 20, 'step_size': 0.01, 'horizon': 1000, 'gamma': 0.99}, 'save_variables': False, 'restore_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.7396382093429565
Validation loss = 0.17110082507133484
Validation loss = 0.15169352293014526
Validation loss = 0.12153271585702896
Validation loss = 0.10192301869392395
Validation loss = 0.0962296575307846
Validation loss = 0.0829019546508789
Validation loss = 0.08359243720769882
Validation loss = 0.08945566415786743
Validation loss = 0.08539057523012161
Validation loss = 0.0811951607465744
Validation loss = 0.10757198184728622
Validation loss = 0.08213986456394196
Validation loss = 0.09389608353376389
Validation loss = 0.08316753804683685
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7737916707992554
Validation loss = 0.16011641919612885
Validation loss = 0.16666218638420105
Validation loss = 0.1348041296005249
Validation loss = 0.11663976311683655
Validation loss = 0.10484810173511505
Validation loss = 0.09890110790729523
Validation loss = 0.08403270691633224
Validation loss = 0.09830442070960999
Validation loss = 0.07944689691066742
Validation loss = 0.08691403269767761
Validation loss = 0.09159067273139954
Validation loss = 0.08195224404335022
Validation loss = 0.09448238462209702
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6776375770568848
Validation loss = 0.16858810186386108
Validation loss = 0.15397614240646362
Validation loss = 0.12156783044338226
Validation loss = 0.09672869741916656
Validation loss = 0.08498451858758926
Validation loss = 0.08449185639619827
Validation loss = 0.07592573761940002
Validation loss = 0.08449594676494598
Validation loss = 0.08408841490745544
Validation loss = 0.08595334738492966
Validation loss = 0.08444273471832275
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.812129020690918
Validation loss = 0.16178056597709656
Validation loss = 0.16733643412590027
Validation loss = 0.1230335533618927
Validation loss = 0.1054830402135849
Validation loss = 0.10676201432943344
Validation loss = 0.09816265106201172
Validation loss = 0.10057878494262695
Validation loss = 0.09139905124902725
Validation loss = 0.1120186597108841
Validation loss = 0.09514120221138
Validation loss = 0.10565920919179916
Validation loss = 0.10393193364143372
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.4707828164100647
Validation loss = 0.16012969613075256
Validation loss = 0.13848049938678741
Validation loss = 0.13376693427562714
Validation loss = 0.11135570704936981
Validation loss = 0.10300132632255554
Validation loss = 0.09525898098945618
Validation loss = 0.09802363067865372
Validation loss = 0.10509340465068817
Validation loss = 0.11172978579998016
Validation loss = 0.09182662516832352
Validation loss = 0.09558860957622528
Validation loss = 0.082231804728508
Validation loss = 0.08496315032243729
Validation loss = 0.09133534133434296
Validation loss = 0.08939043432474136
Validation loss = 0.0793556272983551
Validation loss = 0.08736449480056763
Validation loss = 0.09321259707212448
Validation loss = 0.07766357809305191
Validation loss = 0.07874824106693268
Validation loss = 0.08170454949140549
Validation loss = 0.0793224647641182
Validation loss = 0.07799031585454941
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -488     |
| Iteration     | 0        |
| MaximumReturn | -473     |
| MinimumReturn | -521     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11949575692415237
Validation loss = 0.07892757654190063
Validation loss = 0.10324487090110779
Validation loss = 0.15156607329845428
Validation loss = 0.151849627494812
Validation loss = 0.18303287029266357
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12398701906204224
Validation loss = 0.07984406501054764
Validation loss = 0.10028708726167679
Validation loss = 0.12872561812400818
Validation loss = 0.1268153190612793
Validation loss = 0.15564511716365814
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.138545960187912
Validation loss = 0.07877381145954132
Validation loss = 0.08967988193035126
Validation loss = 0.13155321776866913
Validation loss = 0.1630873829126358
Validation loss = 0.19335371255874634
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14164339005947113
Validation loss = 0.08826562762260437
Validation loss = 0.1330971121788025
Validation loss = 0.1763201802968979
Validation loss = 0.19429908692836761
Validation loss = 0.2283005714416504
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12793323397636414
Validation loss = 0.0992589071393013
Validation loss = 0.18278783559799194
Validation loss = 0.2492353469133377
Validation loss = 0.26540324091911316
Validation loss = 0.2492876797914505
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -344     |
| Iteration     | 1        |
| MaximumReturn | -146     |
| MinimumReturn | -482     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1194905936717987
Validation loss = 0.058734480291604996
Validation loss = 0.051426369696855545
Validation loss = 0.048535604029893875
Validation loss = 0.04686732217669487
Validation loss = 0.04735977575182915
Validation loss = 0.0499521940946579
Validation loss = 0.04384361580014229
Validation loss = 0.04342377185821533
Validation loss = 0.04401642084121704
Validation loss = 0.042913418263196945
Validation loss = 0.04345235601067543
Validation loss = 0.04198047146201134
Validation loss = 0.04752610996365547
Validation loss = 0.041277237236499786
Validation loss = 0.042643919587135315
Validation loss = 0.05158085748553276
Validation loss = 0.04337926581501961
Validation loss = 0.043751779943704605
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11919969320297241
Validation loss = 0.061224713921546936
Validation loss = 0.05486391857266426
Validation loss = 0.05174091085791588
Validation loss = 0.05246853455901146
Validation loss = 0.05552028492093086
Validation loss = 0.052127134054899216
Validation loss = 0.0448293536901474
Validation loss = 0.04517219588160515
Validation loss = 0.04523807764053345
Validation loss = 0.04574349895119667
Validation loss = 0.046363938599824905
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13140875101089478
Validation loss = 0.05832463130354881
Validation loss = 0.05320277810096741
Validation loss = 0.05484774708747864
Validation loss = 0.049185872077941895
Validation loss = 0.0454406701028347
Validation loss = 0.047495026141405106
Validation loss = 0.05330337584018707
Validation loss = 0.045851919800043106
Validation loss = 0.05468903109431267
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.13488921523094177
Validation loss = 0.06304532289505005
Validation loss = 0.05353723093867302
Validation loss = 0.05682246387004852
Validation loss = 0.0483894944190979
Validation loss = 0.047483887523412704
Validation loss = 0.04672597721219063
Validation loss = 0.05665355548262596
Validation loss = 0.04405791684985161
Validation loss = 0.04537646844983101
Validation loss = 0.04358406364917755
Validation loss = 0.044748734682798386
Validation loss = 0.04290284216403961
Validation loss = 0.04253625497221947
Validation loss = 0.041897714138031006
Validation loss = 0.04321697726845741
Validation loss = 0.04137852415442467
Validation loss = 0.04198130965232849
Validation loss = 0.04109572991728783
Validation loss = 0.04423421993851662
Validation loss = 0.05542687699198723
Validation loss = 0.04338638111948967
Validation loss = 0.040683578699827194
Validation loss = 0.039818182587623596
Validation loss = 0.04198335111141205
Validation loss = 0.04404282942414284
Validation loss = 0.043442342430353165
Validation loss = 0.042290180921554565
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12320596724748611
Validation loss = 0.058641064912080765
Validation loss = 0.05232751742005348
Validation loss = 0.051251012831926346
Validation loss = 0.047974634915590286
Validation loss = 0.046117741614580154
Validation loss = 0.04515983536839485
Validation loss = 0.050799667835235596
Validation loss = 0.04507401958107948
Validation loss = 0.045529190450906754
Validation loss = 0.05046739801764488
Validation loss = 0.042328815907239914
Validation loss = 0.042600587010383606
Validation loss = 0.042139794677495956
Validation loss = 0.04190325364470482
Validation loss = 0.04019186273217201
Validation loss = 0.039808932691812515
Validation loss = 0.04041915759444237
Validation loss = 0.043241798877716064
Validation loss = 0.04828175529837608
Validation loss = 0.04558338597416878
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -445     |
| Iteration     | 2        |
| MaximumReturn | -337     |
| MinimumReturn | -551     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09733839333057404
Validation loss = 0.0468437485396862
Validation loss = 0.04295535013079643
Validation loss = 0.04448620229959488
Validation loss = 0.041807711124420166
Validation loss = 0.04025362804532051
Validation loss = 0.03891006484627724
Validation loss = 0.0395541787147522
Validation loss = 0.03763466328382492
Validation loss = 0.04036339372396469
Validation loss = 0.03821422532200813
Validation loss = 0.04465958848595619
Validation loss = 0.03705466538667679
Validation loss = 0.03908609598875046
Validation loss = 0.03700508922338486
Validation loss = 0.03915917128324509
Validation loss = 0.03670618683099747
Validation loss = 0.03607574850320816
Validation loss = 0.03631281852722168
Validation loss = 0.036719419062137604
Validation loss = 0.03742203861474991
Validation loss = 0.037076063454151154
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08947263658046722
Validation loss = 0.04789067804813385
Validation loss = 0.043865691870450974
Validation loss = 0.04137997329235077
Validation loss = 0.04363540560007095
Validation loss = 0.039790455251932144
Validation loss = 0.03779653087258339
Validation loss = 0.03906691074371338
Validation loss = 0.03904064744710922
Validation loss = 0.03685060888528824
Validation loss = 0.039032481610774994
Validation loss = 0.03814445436000824
Validation loss = 0.03760776296257973
Validation loss = 0.03716055676341057
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08251224458217621
Validation loss = 0.05258262902498245
Validation loss = 0.044303543865680695
Validation loss = 0.04150667041540146
Validation loss = 0.04177134856581688
Validation loss = 0.04093851149082184
Validation loss = 0.039254166185855865
Validation loss = 0.03882692754268646
Validation loss = 0.038979671895504
Validation loss = 0.0376923531293869
Validation loss = 0.03934198617935181
Validation loss = 0.03855130821466446
Validation loss = 0.04013514518737793
Validation loss = 0.03600122034549713
Validation loss = 0.037100616842508316
Validation loss = 0.038269758224487305
Validation loss = 0.03646591678261757
Validation loss = 0.03562377393245697
Validation loss = 0.0341932475566864
Validation loss = 0.034913741052150726
Validation loss = 0.03694699704647064
Validation loss = 0.037258490920066833
Validation loss = 0.03544964641332626
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08286528289318085
Validation loss = 0.04404623061418533
Validation loss = 0.0399993360042572
Validation loss = 0.039504241198301315
Validation loss = 0.03946664184331894
Validation loss = 0.038448020815849304
Validation loss = 0.038542453199625015
Validation loss = 0.03910426050424576
Validation loss = 0.03629489615559578
Validation loss = 0.03649279102683067
Validation loss = 0.036088310182094574
Validation loss = 0.03495611622929573
Validation loss = 0.03650538995862007
Validation loss = 0.03478296473622322
Validation loss = 0.03478230908513069
Validation loss = 0.03812669962644577
Validation loss = 0.03368332237005234
Validation loss = 0.03724505752325058
Validation loss = 0.034121073782444
Validation loss = 0.03733178228139877
Validation loss = 0.035422876477241516
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08692044019699097
Validation loss = 0.045215193182229996
Validation loss = 0.042706198990345
Validation loss = 0.03885767236351967
Validation loss = 0.03854288160800934
Validation loss = 0.03952047973871231
Validation loss = 0.03743245452642441
Validation loss = 0.040184423327445984
Validation loss = 0.03819810599088669
Validation loss = 0.036606982350349426
Validation loss = 0.035654861479997635
Validation loss = 0.03834111988544464
Validation loss = 0.03651551157236099
Validation loss = 0.03692626953125
Validation loss = 0.03569547086954117
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -110     |
| Iteration     | 3        |
| MaximumReturn | 205      |
| MinimumReturn | -278     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06320879608392715
Validation loss = 0.040824733674526215
Validation loss = 0.037782877683639526
Validation loss = 0.039492398500442505
Validation loss = 0.039455585181713104
Validation loss = 0.0376262441277504
Validation loss = 0.037002794444561005
Validation loss = 0.03682023286819458
Validation loss = 0.03886137530207634
Validation loss = 0.037598054856061935
Validation loss = 0.036383964121341705
Validation loss = 0.03733731433749199
Validation loss = 0.03546593710780144
Validation loss = 0.03748536482453346
Validation loss = 0.035739343613386154
Validation loss = 0.036294665187597275
Validation loss = 0.039528556168079376
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06011319160461426
Validation loss = 0.043596792966127396
Validation loss = 0.04310007020831108
Validation loss = 0.04044901207089424
Validation loss = 0.03963157534599304
Validation loss = 0.04298184812068939
Validation loss = 0.03954627737402916
Validation loss = 0.039799340069293976
Validation loss = 0.03741928189992905
Validation loss = 0.03773536905646324
Validation loss = 0.03736981749534607
Validation loss = 0.036769039928913116
Validation loss = 0.03688657283782959
Validation loss = 0.03645368665456772
Validation loss = 0.03724776580929756
Validation loss = 0.04199392348527908
Validation loss = 0.03549671918153763
Validation loss = 0.038326967507600784
Validation loss = 0.03837227076292038
Validation loss = 0.03693073242902756
Validation loss = 0.03526555374264717
Validation loss = 0.03552712872624397
Validation loss = 0.03574066981673241
Validation loss = 0.033766526728868484
Validation loss = 0.0400996170938015
Validation loss = 0.0359344407916069
Validation loss = 0.03562365844845772
Validation loss = 0.03434467688202858
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06188499927520752
Validation loss = 0.04293704405426979
Validation loss = 0.0389900878071785
Validation loss = 0.03878513723611832
Validation loss = 0.039309777319431305
Validation loss = 0.03792664036154747
Validation loss = 0.03728920966386795
Validation loss = 0.0367245115339756
Validation loss = 0.03743807598948479
Validation loss = 0.040974754840135574
Validation loss = 0.03861801698803902
Validation loss = 0.03688758611679077
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.057407595217227936
Validation loss = 0.040155649185180664
Validation loss = 0.03873852267861366
Validation loss = 0.036996059119701385
Validation loss = 0.038752757012844086
Validation loss = 0.037052713334560394
Validation loss = 0.03501303121447563
Validation loss = 0.0369134359061718
Validation loss = 0.03675726801156998
Validation loss = 0.03713233023881912
Validation loss = 0.03512325882911682
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05855456739664078
Validation loss = 0.039882171899080276
Validation loss = 0.040020689368247986
Validation loss = 0.03871636092662811
Validation loss = 0.038506727665662766
Validation loss = 0.038384031504392624
Validation loss = 0.03672153502702713
Validation loss = 0.03788202255964279
Validation loss = 0.03688867762684822
Validation loss = 0.0393332839012146
Validation loss = 0.03724884241819382
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 222      |
| Iteration     | 4        |
| MaximumReturn | 793      |
| MinimumReturn | -410     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04323379322886467
Validation loss = 0.036227185279130936
Validation loss = 0.03412621468305588
Validation loss = 0.03769949823617935
Validation loss = 0.03312395140528679
Validation loss = 0.03620350733399391
Validation loss = 0.03412343189120293
Validation loss = 0.03534819185733795
Validation loss = 0.03285697475075722
Validation loss = 0.033155638724565506
Validation loss = 0.03325268253684044
Validation loss = 0.034216731786727905
Validation loss = 0.033127639442682266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04334482550621033
Validation loss = 0.0348660983145237
Validation loss = 0.032156042754650116
Validation loss = 0.03285083547234535
Validation loss = 0.0334014892578125
Validation loss = 0.03182348981499672
Validation loss = 0.033047016710042953
Validation loss = 0.032246485352516174
Validation loss = 0.03410644456744194
Validation loss = 0.03646032512187958
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04270501062273979
Validation loss = 0.036227043718099594
Validation loss = 0.036149125546216965
Validation loss = 0.03636276349425316
Validation loss = 0.03776226192712784
Validation loss = 0.03573385998606682
Validation loss = 0.03453483432531357
Validation loss = 0.03488956391811371
Validation loss = 0.03336823359131813
Validation loss = 0.03326674923300743
Validation loss = 0.032432299107313156
Validation loss = 0.03302060440182686
Validation loss = 0.034664303064346313
Validation loss = 0.03258189186453819
Validation loss = 0.03245142474770546
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03997529670596123
Validation loss = 0.03421168401837349
Validation loss = 0.03420434892177582
Validation loss = 0.03277363255620003
Validation loss = 0.0359356515109539
Validation loss = 0.03275797888636589
Validation loss = 0.032266244292259216
Validation loss = 0.03312636539340019
Validation loss = 0.030863530933856964
Validation loss = 0.03178400918841362
Validation loss = 0.03152399882674217
Validation loss = 0.03170175850391388
Validation loss = 0.036732740700244904
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04257245361804962
Validation loss = 0.0350610613822937
Validation loss = 0.03820718824863434
Validation loss = 0.034180231392383575
Validation loss = 0.03551178425550461
Validation loss = 0.035242389887571335
Validation loss = 0.033858977258205414
Validation loss = 0.033456962555646896
Validation loss = 0.03659934923052788
Validation loss = 0.03307133540511131
Validation loss = 0.033622223883867264
Validation loss = 0.04013664647936821
Validation loss = 0.031900715082883835
Validation loss = 0.03219974413514137
Validation loss = 0.032373081892728806
Validation loss = 0.033980466425418854
Validation loss = 0.03463933244347572
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 996      |
| Iteration     | 5        |
| MaximumReturn | 1.16e+03 |
| MinimumReturn | 881      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.032699767500162125
Validation loss = 0.030645012855529785
Validation loss = 0.029487138614058495
Validation loss = 0.028910143300890923
Validation loss = 0.028842657804489136
Validation loss = 0.02976316213607788
Validation loss = 0.030090996995568275
Validation loss = 0.028642859309911728
Validation loss = 0.027744950726628304
Validation loss = 0.02846704050898552
Validation loss = 0.027837101370096207
Validation loss = 0.029673604294657707
Validation loss = 0.027956733480095863
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0345897451043129
Validation loss = 0.028445672243833542
Validation loss = 0.029033631086349487
Validation loss = 0.028239453211426735
Validation loss = 0.02922031655907631
Validation loss = 0.02757624350488186
Validation loss = 0.028361113741993904
Validation loss = 0.029357900843024254
Validation loss = 0.026896696537733078
Validation loss = 0.027354231104254723
Validation loss = 0.027743304148316383
Validation loss = 0.030466224998235703
Validation loss = 0.0293965395539999
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0314774252474308
Validation loss = 0.028853697702288628
Validation loss = 0.02765941247344017
Validation loss = 0.028790254145860672
Validation loss = 0.028757145628333092
Validation loss = 0.02834221161901951
Validation loss = 0.027573825791478157
Validation loss = 0.028334686532616615
Validation loss = 0.029820742085576057
Validation loss = 0.028160516172647476
Validation loss = 0.028874129056930542
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.034914638847112656
Validation loss = 0.028407249599695206
Validation loss = 0.02795938029885292
Validation loss = 0.027682630345225334
Validation loss = 0.02813890017569065
Validation loss = 0.027308547869324684
Validation loss = 0.027718892320990562
Validation loss = 0.028319504112005234
Validation loss = 0.0267875324934721
Validation loss = 0.028223862871527672
Validation loss = 0.028107089921832085
Validation loss = 0.029490377753973007
Validation loss = 0.028454449027776718
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03585829585790634
Validation loss = 0.028438132256269455
Validation loss = 0.027555789798498154
Validation loss = 0.029361572116613388
Validation loss = 0.02824440784752369
Validation loss = 0.02920312061905861
Validation loss = 0.028360748663544655
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 139      |
| Iteration     | 6        |
| MaximumReturn | 1.05e+03 |
| MinimumReturn | -242     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.034224409610033035
Validation loss = 0.02820204757153988
Validation loss = 0.027994347736239433
Validation loss = 0.026939934119582176
Validation loss = 0.026384737342596054
Validation loss = 0.028823882341384888
Validation loss = 0.026498258113861084
Validation loss = 0.027196595445275307
Validation loss = 0.02654893510043621
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.032704271376132965
Validation loss = 0.02893223986029625
Validation loss = 0.03019469976425171
Validation loss = 0.026674531400203705
Validation loss = 0.028234358876943588
Validation loss = 0.026221200823783875
Validation loss = 0.02688353881239891
Validation loss = 0.02585650607943535
Validation loss = 0.02884979173541069
Validation loss = 0.025837808847427368
Validation loss = 0.026702458038926125
Validation loss = 0.025796249508857727
Validation loss = 0.025057723745703697
Validation loss = 0.027184199541807175
Validation loss = 0.024709556251764297
Validation loss = 0.025774549692869186
Validation loss = 0.025465726852416992
Validation loss = 0.027833648025989532
Validation loss = 0.0242198146879673
Validation loss = 0.025739390403032303
Validation loss = 0.02515208162367344
Validation loss = 0.025989942252635956
Validation loss = 0.026678483933210373
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03337636962532997
Validation loss = 0.028175095096230507
Validation loss = 0.02779773622751236
Validation loss = 0.027797281742095947
Validation loss = 0.025739993900060654
Validation loss = 0.0262858085334301
Validation loss = 0.027060771360993385
Validation loss = 0.025528786703944206
Validation loss = 0.02662431076169014
Validation loss = 0.026114080101251602
Validation loss = 0.027171161025762558
Validation loss = 0.02510048821568489
Validation loss = 0.025765568017959595
Validation loss = 0.030172809958457947
Validation loss = 0.02560969814658165
Validation loss = 0.02520599588751793
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03321461379528046
Validation loss = 0.02808219939470291
Validation loss = 0.027593309059739113
Validation loss = 0.026131335645914078
Validation loss = 0.02796989306807518
Validation loss = 0.0254396740347147
Validation loss = 0.026255054399371147
Validation loss = 0.025806954130530357
Validation loss = 0.025415752083063126
Validation loss = 0.025401173159480095
Validation loss = 0.02663179486989975
Validation loss = 0.024780629202723503
Validation loss = 0.02665378339588642
Validation loss = 0.02554277703166008
Validation loss = 0.024568423628807068
Validation loss = 0.025224925950169563
Validation loss = 0.025123123079538345
Validation loss = 0.025332022458314896
Validation loss = 0.025104854255914688
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0324554406106472
Validation loss = 0.030488425865769386
Validation loss = 0.027751576155424118
Validation loss = 0.02601982280611992
Validation loss = 0.02731373906135559
Validation loss = 0.027060989290475845
Validation loss = 0.02693387120962143
Validation loss = 0.02645479515194893
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 720      |
| Iteration     | 7        |
| MaximumReturn | 1.37e+03 |
| MinimumReturn | 19.6     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.030032703652977943
Validation loss = 0.026654552668333054
Validation loss = 0.027010444551706314
Validation loss = 0.026119345799088478
Validation loss = 0.025437679141759872
Validation loss = 0.026189200580120087
Validation loss = 0.026539847254753113
Validation loss = 0.02534838393330574
Validation loss = 0.025299720466136932
Validation loss = 0.02511303685605526
Validation loss = 0.024361921474337578
Validation loss = 0.024495569989085197
Validation loss = 0.024235475808382034
Validation loss = 0.02475854568183422
Validation loss = 0.02650074101984501
Validation loss = 0.026113592088222504
Validation loss = 0.023323453962802887
Validation loss = 0.02533075213432312
Validation loss = 0.02482795901596546
Validation loss = 0.023610727861523628
Validation loss = 0.02635205164551735
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.028062649071216583
Validation loss = 0.02646620385348797
Validation loss = 0.024976879358291626
Validation loss = 0.026177875697612762
Validation loss = 0.023796571418642998
Validation loss = 0.02314154990017414
Validation loss = 0.02576734498143196
Validation loss = 0.024025896564126015
Validation loss = 0.02539866417646408
Validation loss = 0.02317843586206436
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02976968139410019
Validation loss = 0.02507452666759491
Validation loss = 0.02699737437069416
Validation loss = 0.02486763708293438
Validation loss = 0.023641789332032204
Validation loss = 0.024348141625523567
Validation loss = 0.023936482146382332
Validation loss = 0.02456018142402172
Validation loss = 0.02336769551038742
Validation loss = 0.02379160374403
Validation loss = 0.024428091943264008
Validation loss = 0.02446642518043518
Validation loss = 0.0247262641787529
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030804570764303207
Validation loss = 0.025809338316321373
Validation loss = 0.025121310725808144
Validation loss = 0.025590844452381134
Validation loss = 0.024416647851467133
Validation loss = 0.025467094033956528
Validation loss = 0.0245656855404377
Validation loss = 0.024865344166755676
Validation loss = 0.02337275631725788
Validation loss = 0.024235695600509644
Validation loss = 0.02416452392935753
Validation loss = 0.02407675050199032
Validation loss = 0.02325543574988842
Validation loss = 0.023715727031230927
Validation loss = 0.02448020502924919
Validation loss = 0.024604225531220436
Validation loss = 0.024482717737555504
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02951418049633503
Validation loss = 0.02761060744524002
Validation loss = 0.02609851583838463
Validation loss = 0.02874225750565529
Validation loss = 0.028293540701270103
Validation loss = 0.025861259549856186
Validation loss = 0.024294475093483925
Validation loss = 0.026191705837845802
Validation loss = 0.027052201330661774
Validation loss = 0.024827556684613228
Validation loss = 0.024577254429459572
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.33e+03 |
| Iteration     | 8        |
| MaximumReturn | 1.42e+03 |
| MinimumReturn | 1.19e+03 |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.027493024244904518
Validation loss = 0.02149668149650097
Validation loss = 0.022862887009978294
Validation loss = 0.02185550332069397
Validation loss = 0.02302234247326851
Validation loss = 0.021884189918637276
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02494392916560173
Validation loss = 0.022808387875556946
Validation loss = 0.022021090611815453
Validation loss = 0.025540674105286598
Validation loss = 0.02168542519211769
Validation loss = 0.02149685099720955
Validation loss = 0.02379586175084114
Validation loss = 0.02152428589761257
Validation loss = 0.021974412724375725
Validation loss = 0.021383728832006454
Validation loss = 0.023259388282895088
Validation loss = 0.02261299639940262
Validation loss = 0.0220814011991024
Validation loss = 0.025789495557546616
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027121517807245255
Validation loss = 0.021278133615851402
Validation loss = 0.021497024223208427
Validation loss = 0.023476604372262955
Validation loss = 0.022077031433582306
Validation loss = 0.022189611569046974
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02775821089744568
Validation loss = 0.022690147161483765
Validation loss = 0.02193666249513626
Validation loss = 0.022509334608912468
Validation loss = 0.02206391654908657
Validation loss = 0.022170498967170715
Validation loss = 0.022387390956282616
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028958028182387352
Validation loss = 0.022790875285863876
Validation loss = 0.02528272196650505
Validation loss = 0.022565729916095734
Validation loss = 0.022924089804291725
Validation loss = 0.02327544614672661
Validation loss = 0.024417323991656303
Validation loss = 0.021775681525468826
Validation loss = 0.02174196019768715
Validation loss = 0.02294754609465599
Validation loss = 0.022052735090255737
Validation loss = 0.022023124620318413
Validation loss = 0.023932963609695435
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 671      |
| Iteration     | 9        |
| MaximumReturn | 1.54e+03 |
| MinimumReturn | -404     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026513416320085526
Validation loss = 0.023407483473420143
Validation loss = 0.022098492830991745
Validation loss = 0.02205188386142254
Validation loss = 0.0220651775598526
Validation loss = 0.02073124423623085
Validation loss = 0.020342590287327766
Validation loss = 0.021454034373164177
Validation loss = 0.02269420400261879
Validation loss = 0.019852932542562485
Validation loss = 0.020240046083927155
Validation loss = 0.020497523248195648
Validation loss = 0.020841769874095917
Validation loss = 0.019899971783161163
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02689235657453537
Validation loss = 0.023384708911180496
Validation loss = 0.022708354517817497
Validation loss = 0.02043214812874794
Validation loss = 0.02134479209780693
Validation loss = 0.022253794595599174
Validation loss = 0.02059219777584076
Validation loss = 0.021420925855636597
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028201216831803322
Validation loss = 0.022497761994600296
Validation loss = 0.021770963445305824
Validation loss = 0.02082154154777527
Validation loss = 0.021389879286289215
Validation loss = 0.02291175164282322
Validation loss = 0.021788032725453377
Validation loss = 0.020587267354130745
Validation loss = 0.020324043929576874
Validation loss = 0.022035028785467148
Validation loss = 0.019419459626078606
Validation loss = 0.020346857607364655
Validation loss = 0.02068440616130829
Validation loss = 0.023970823734998703
Validation loss = 0.020732805132865906
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026112746447324753
Validation loss = 0.02243109978735447
Validation loss = 0.022461770102381706
Validation loss = 0.021760709583759308
Validation loss = 0.021190805360674858
Validation loss = 0.022518374025821686
Validation loss = 0.021299289539456367
Validation loss = 0.02084207348525524
Validation loss = 0.020972369238734245
Validation loss = 0.022990519180893898
Validation loss = 0.021047569811344147
Validation loss = 0.02109270729124546
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026640599593520164
Validation loss = 0.02243696339428425
Validation loss = 0.022501088678836823
Validation loss = 0.0242806077003479
Validation loss = 0.020861007273197174
Validation loss = 0.02145606093108654
Validation loss = 0.02210150845348835
Validation loss = 0.022432668134570122
Validation loss = 0.020739279687404633
Validation loss = 0.022580642253160477
Validation loss = 0.02148187905550003
Validation loss = 0.020664852112531662
Validation loss = 0.021809695288538933
Validation loss = 0.019623512402176857
Validation loss = 0.020095469430088997
Validation loss = 0.02170211263000965
Validation loss = 0.021259469911456108
Validation loss = 0.020330147817730904
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.44e+03 |
| Iteration     | 10       |
| MaximumReturn | 1.48e+03 |
| MinimumReturn | 1.36e+03 |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02362714149057865
Validation loss = 0.019284114241600037
Validation loss = 0.019127704203128815
Validation loss = 0.01907014660537243
Validation loss = 0.019591672345995903
Validation loss = 0.020471634343266487
Validation loss = 0.018770167604088783
Validation loss = 0.022005820646882057
Validation loss = 0.018470747396349907
Validation loss = 0.018428513780236244
Validation loss = 0.020369576290249825
Validation loss = 0.017843211069703102
Validation loss = 0.018510619178414345
Validation loss = 0.018604489043354988
Validation loss = 0.01809464953839779
Validation loss = 0.0195471104234457
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02377787046134472
Validation loss = 0.01919654570519924
Validation loss = 0.01912587322294712
Validation loss = 0.021518483757972717
Validation loss = 0.019644854590296745
Validation loss = 0.01844736933708191
Validation loss = 0.01864074543118477
Validation loss = 0.01940971054136753
Validation loss = 0.018765022978186607
Validation loss = 0.018007975071668625
Validation loss = 0.02129453420639038
Validation loss = 0.01892744190990925
Validation loss = 0.017573056742548943
Validation loss = 0.017938360571861267
Validation loss = 0.021703587844967842
Validation loss = 0.01723303832113743
Validation loss = 0.018434913828969002
Validation loss = 0.017575697973370552
Validation loss = 0.01722838543355465
Validation loss = 0.019486689940094948
Validation loss = 0.01711261086165905
Validation loss = 0.01833774894475937
Validation loss = 0.01851346716284752
Validation loss = 0.01647184044122696
Validation loss = 0.01761315017938614
Validation loss = 0.016823554411530495
Validation loss = 0.018295612186193466
Validation loss = 0.01696021668612957
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021199285984039307
Validation loss = 0.019836289808154106
Validation loss = 0.0194396935403347
Validation loss = 0.019583964720368385
Validation loss = 0.018829092383384705
Validation loss = 0.02128511480987072
Validation loss = 0.017982659861445427
Validation loss = 0.018124448135495186
Validation loss = 0.018097618594765663
Validation loss = 0.022664368152618408
Validation loss = 0.01880813203752041
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02252323366701603
Validation loss = 0.019832294434309006
Validation loss = 0.019593587145209312
Validation loss = 0.019527746364474297
Validation loss = 0.02106708474457264
Validation loss = 0.018691308796405792
Validation loss = 0.019253933802247047
Validation loss = 0.02269996702671051
Validation loss = 0.018289387226104736
Validation loss = 0.01904018223285675
Validation loss = 0.018637901172041893
Validation loss = 0.018485896289348602
Validation loss = 0.01934009976685047
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.020197400823235512
Validation loss = 0.019450398162007332
Validation loss = 0.020538421347737312
Validation loss = 0.020240670070052147
Validation loss = 0.019173255190253258
Validation loss = 0.018001267686486244
Validation loss = 0.020047789439558983
Validation loss = 0.02143576741218567
Validation loss = 0.017531469464302063
Validation loss = 0.018143264576792717
Validation loss = 0.018394609913229942
Validation loss = 0.017847442999482155
Validation loss = 0.019730402156710625
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.36e+03 |
| Iteration     | 11       |
| MaximumReturn | 1.74e+03 |
| MinimumReturn | -91.1    |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02357998676598072
Validation loss = 0.017221851274371147
Validation loss = 0.017657866701483727
Validation loss = 0.017992721870541573
Validation loss = 0.016287431120872498
Validation loss = 0.01898838020861149
Validation loss = 0.016940761357545853
Validation loss = 0.015592223964631557
Validation loss = 0.01636960171163082
Validation loss = 0.015643684193491936
Validation loss = 0.015703847631812096
Validation loss = 0.016999544575810432
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019387612119317055
Validation loss = 0.016081463545560837
Validation loss = 0.0159929096698761
Validation loss = 0.01631740853190422
Validation loss = 0.018201982602477074
Validation loss = 0.015596354380249977
Validation loss = 0.015639984980225563
Validation loss = 0.015668869018554688
Validation loss = 0.016223138198256493
Validation loss = 0.015393986366689205
Validation loss = 0.019226614385843277
Validation loss = 0.014597187750041485
Validation loss = 0.015858056023716927
Validation loss = 0.015203284099698067
Validation loss = 0.016086680814623833
Validation loss = 0.014695635065436363
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02253621444106102
Validation loss = 0.017107602208852768
Validation loss = 0.016377661377191544
Validation loss = 0.017177874222397804
Validation loss = 0.0179783683270216
Validation loss = 0.01691633090376854
Validation loss = 0.016383424401283264
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022854167968034744
Validation loss = 0.017482366412878036
Validation loss = 0.01704765111207962
Validation loss = 0.018182815983891487
Validation loss = 0.016752654686570168
Validation loss = 0.017581511288881302
Validation loss = 0.016764318570494652
Validation loss = 0.017287256196141243
Validation loss = 0.01842368394136429
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019786348566412926
Validation loss = 0.016995271667838097
Validation loss = 0.01680288091301918
Validation loss = 0.016425110399723053
Validation loss = 0.01815747655928135
Validation loss = 0.015539898537099361
Validation loss = 0.016139594838023186
Validation loss = 0.0167645663022995
Validation loss = 0.015720099210739136
Validation loss = 0.015790719538927078
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.8e+03  |
| Iteration     | 12       |
| MaximumReturn | 1.91e+03 |
| MinimumReturn | 1.54e+03 |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018215572461485863
Validation loss = 0.015139882452785969
Validation loss = 0.01552444975823164
Validation loss = 0.015832556411623955
Validation loss = 0.017958583310246468
Validation loss = 0.01463841088116169
Validation loss = 0.014698201790452003
Validation loss = 0.01809293031692505
Validation loss = 0.014187822118401527
Validation loss = 0.01419365406036377
Validation loss = 0.0142000587657094
Validation loss = 0.016317469999194145
Validation loss = 0.013839250430464745
Validation loss = 0.013823583722114563
Validation loss = 0.014519339427351952
Validation loss = 0.015515766106545925
Validation loss = 0.012940681539475918
Validation loss = 0.013661257922649384
Validation loss = 0.015279301442205906
Validation loss = 0.014618806540966034
Validation loss = 0.01345441211014986
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017380181699991226
Validation loss = 0.013810432516038418
Validation loss = 0.014947974123060703
Validation loss = 0.016198256984353065
Validation loss = 0.01367882452905178
Validation loss = 0.015110424719750881
Validation loss = 0.014322391711175442
Validation loss = 0.015316242352128029
Validation loss = 0.01453398633748293
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017886072397232056
Validation loss = 0.015501790679991245
Validation loss = 0.015887055546045303
Validation loss = 0.01927684247493744
Validation loss = 0.01486810389906168
Validation loss = 0.01620807871222496
Validation loss = 0.01566370576620102
Validation loss = 0.015632731840014458
Validation loss = 0.015183739364147186
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.018772155046463013
Validation loss = 0.01570598967373371
Validation loss = 0.018184104934334755
Validation loss = 0.015122649259865284
Validation loss = 0.015927765518426895
Validation loss = 0.016771528869867325
Validation loss = 0.01658141240477562
Validation loss = 0.015525673516094685
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.017789097502827644
Validation loss = 0.015001730062067509
Validation loss = 0.014919916167855263
Validation loss = 0.01727256178855896
Validation loss = 0.01426143478602171
Validation loss = 0.014924474060535431
Validation loss = 0.01627166010439396
Validation loss = 0.014246322214603424
Validation loss = 0.01413779892027378
Validation loss = 0.014382491819560528
Validation loss = 0.015025480650365353
Validation loss = 0.01358780823647976
Validation loss = 0.01681439019739628
Validation loss = 0.013785912655293941
Validation loss = 0.014135954901576042
Validation loss = 0.015737786889076233
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.75e+03 |
| Iteration     | 13       |
| MaximumReturn | 2.06e+03 |
| MinimumReturn | 875      |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01720849797129631
Validation loss = 0.012417268939316273
Validation loss = 0.013454263098537922
Validation loss = 0.012735658325254917
Validation loss = 0.011903105303645134
Validation loss = 0.012986968271434307
Validation loss = 0.011771687306463718
Validation loss = 0.012033110484480858
Validation loss = 0.01432496216148138
Validation loss = 0.011433882638812065
Validation loss = 0.011451533995568752
Validation loss = 0.014388478361070156
Validation loss = 0.011800947599112988
Validation loss = 0.011599400080740452
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.017052533105015755
Validation loss = 0.013841173611581326
Validation loss = 0.013812740333378315
Validation loss = 0.014409538358449936
Validation loss = 0.013443972915410995
Validation loss = 0.012812028639018536
Validation loss = 0.0135124446824193
Validation loss = 0.012339457869529724
Validation loss = 0.012382744811475277
Validation loss = 0.013926343992352486
Validation loss = 0.013854971155524254
Validation loss = 0.012286617420613766
Validation loss = 0.012941567227244377
Validation loss = 0.013949302025139332
Validation loss = 0.011843573302030563
Validation loss = 0.011855742894113064
Validation loss = 0.014106233604252338
Validation loss = 0.014199795201420784
Validation loss = 0.011478343047201633
Validation loss = 0.011974576860666275
Validation loss = 0.014285570941865444
Validation loss = 0.012192021124064922
Validation loss = 0.012063862755894661
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.016940627247095108
Validation loss = 0.015444213524460793
Validation loss = 0.013446937315165997
Validation loss = 0.013405459001660347
Validation loss = 0.013816111721098423
Validation loss = 0.013715611770749092
Validation loss = 0.016631321981549263
Validation loss = 0.013708989135921001
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01794847659766674
Validation loss = 0.014772376976907253
Validation loss = 0.014019965194165707
Validation loss = 0.014713219366967678
Validation loss = 0.014798537828028202
Validation loss = 0.015159515663981438
Validation loss = 0.013978279195725918
Validation loss = 0.013628831133246422
Validation loss = 0.013665365986526012
Validation loss = 0.01319477241486311
Validation loss = 0.013658924028277397
Validation loss = 0.016864649951457977
Validation loss = 0.01306304894387722
Validation loss = 0.0169057734310627
Validation loss = 0.012597686611115932
Validation loss = 0.012660983018577099
Validation loss = 0.014044525101780891
Validation loss = 0.014154965989291668
Validation loss = 0.012426125817000866
Validation loss = 0.014221170917153358
Validation loss = 0.015516993589699268
Validation loss = 0.012542720884084702
Validation loss = 0.012026063166558743
Validation loss = 0.014002010226249695
Validation loss = 0.01239162590354681
Validation loss = 0.012798561714589596
Validation loss = 0.014592582359910011
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01781046763062477
Validation loss = 0.013330359011888504
Validation loss = 0.01344217173755169
Validation loss = 0.013855064287781715
Validation loss = 0.013269534334540367
Validation loss = 0.013364249840378761
Validation loss = 0.012732086703181267
Validation loss = 0.012527511455118656
Validation loss = 0.012064517475664616
Validation loss = 0.01286091934889555
Validation loss = 0.014079793356359005
Validation loss = 0.012521553784608841
Validation loss = 0.011877015233039856
Validation loss = 0.012353545986115932
Validation loss = 0.011931155808269978
Validation loss = 0.011622595600783825
Validation loss = 0.01276345830410719
Validation loss = 0.013490152545273304
Validation loss = 0.011968651786446571
Validation loss = 0.013641279190778732
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.87e+03 |
| Iteration     | 14       |
| MaximumReturn | 2.13e+03 |
| MinimumReturn | 1.39e+03 |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013825208880007267
Validation loss = 0.011342301033437252
Validation loss = 0.011105816811323166
Validation loss = 0.01134311594069004
Validation loss = 0.012476104311645031
Validation loss = 0.010634811595082283
Validation loss = 0.011952981352806091
Validation loss = 0.011669421568512917
Validation loss = 0.011273816227912903
Validation loss = 0.010670319199562073
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.016097573563456535
Validation loss = 0.01205422356724739
Validation loss = 0.011296456679701805
Validation loss = 0.011668214574456215
Validation loss = 0.011666879057884216
Validation loss = 0.01080349087715149
Validation loss = 0.01263074204325676
Validation loss = 0.010997315868735313
Validation loss = 0.012503116391599178
Validation loss = 0.012763941660523415
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01401485688984394
Validation loss = 0.01289740577340126
Validation loss = 0.013657240197062492
Validation loss = 0.013399804010987282
Validation loss = 0.013208648189902306
Validation loss = 0.013815904967486858
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01578378491103649
Validation loss = 0.012017667293548584
Validation loss = 0.01221650280058384
Validation loss = 0.012839260511100292
Validation loss = 0.013966774567961693
Validation loss = 0.011501351371407509
Validation loss = 0.01215129904448986
Validation loss = 0.013166218996047974
Validation loss = 0.011652056127786636
Validation loss = 0.012591911479830742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015118831768631935
Validation loss = 0.011402253061532974
Validation loss = 0.011920634657144547
Validation loss = 0.011605622246861458
Validation loss = 0.012237770482897758
Validation loss = 0.011391762644052505
Validation loss = 0.01308174803853035
Validation loss = 0.011429418809711933
Validation loss = 0.010773888789117336
Validation loss = 0.012044833973050117
Validation loss = 0.011485549621284008
Validation loss = 0.01115008257329464
Validation loss = 0.012947766110301018
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.91e+03 |
| Iteration     | 15       |
| MaximumReturn | 2.33e+03 |
| MinimumReturn | 474      |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011736823245882988
Validation loss = 0.010330356657505035
Validation loss = 0.01047590933740139
Validation loss = 0.01160525158047676
Validation loss = 0.010319065302610397
Validation loss = 0.010967088863253593
Validation loss = 0.010169989429414272
Validation loss = 0.009505324065685272
Validation loss = 0.010548575781285763
Validation loss = 0.010007312521338463
Validation loss = 0.009942388162016869
Validation loss = 0.009706095792353153
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014518553391098976
Validation loss = 0.010997957549989223
Validation loss = 0.010642859153449535
Validation loss = 0.01124955341219902
Validation loss = 0.010270977392792702
Validation loss = 0.010392628610134125
Validation loss = 0.010717038065195084
Validation loss = 0.009724976494908333
Validation loss = 0.0105955321341753
Validation loss = 0.011184150353074074
Validation loss = 0.009714892134070396
Validation loss = 0.010222135111689568
Validation loss = 0.010394155979156494
Validation loss = 0.009622052311897278
Validation loss = 0.011973951943218708
Validation loss = 0.0094417380169034
Validation loss = 0.00956288818269968
Validation loss = 0.01115674339234829
Validation loss = 0.00954341795295477
Validation loss = 0.010204586200416088
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01536563877016306
Validation loss = 0.011868659406900406
Validation loss = 0.011723087169229984
Validation loss = 0.012131266295909882
Validation loss = 0.014939387328922749
Validation loss = 0.011462300084531307
Validation loss = 0.011618959717452526
Validation loss = 0.01306159421801567
Validation loss = 0.01217720564454794
Validation loss = 0.01117522269487381
Validation loss = 0.011165866628289223
Validation loss = 0.012060403823852539
Validation loss = 0.011233816854655743
Validation loss = 0.011232946068048477
Validation loss = 0.011226736009120941
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013872608542442322
Validation loss = 0.011514884419739246
Validation loss = 0.01106154266744852
Validation loss = 0.010725305415689945
Validation loss = 0.011129542253911495
Validation loss = 0.010496341623365879
Validation loss = 0.0105344383046031
Validation loss = 0.010707814246416092
Validation loss = 0.010935143567621708
Validation loss = 0.0118784848600626
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01515534333884716
Validation loss = 0.011049834080040455
Validation loss = 0.010490142740309238
Validation loss = 0.010062622837722301
Validation loss = 0.010374612174928188
Validation loss = 0.010057777166366577
Validation loss = 0.01017076801508665
Validation loss = 0.011587151326239109
Validation loss = 0.01006848830729723
Validation loss = 0.00989801250398159
Validation loss = 0.009416879154741764
Validation loss = 0.011041858233511448
Validation loss = 0.00949792005121708
Validation loss = 0.011430000886321068
Validation loss = 0.009533093310892582
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.95e+03 |
| Iteration     | 16       |
| MaximumReturn | 2.52e+03 |
| MinimumReturn | 483      |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011312736198306084
Validation loss = 0.009308946318924427
Validation loss = 0.009134484454989433
Validation loss = 0.009381823241710663
Validation loss = 0.008930767886340618
Validation loss = 0.01074323058128357
Validation loss = 0.009010644629597664
Validation loss = 0.009231413714587688
Validation loss = 0.008729290217161179
Validation loss = 0.00886938814073801
Validation loss = 0.009551088325679302
Validation loss = 0.008466087281703949
Validation loss = 0.011121776886284351
Validation loss = 0.008339486084878445
Validation loss = 0.008682959713041782
Validation loss = 0.008340422064065933
Validation loss = 0.00935375690460205
Validation loss = 0.008137592114508152
Validation loss = 0.009733344428241253
Validation loss = 0.008544903248548508
Validation loss = 0.007999713532626629
Validation loss = 0.008721081539988518
Validation loss = 0.008118239231407642
Validation loss = 0.008492722176015377
Validation loss = 0.008549096062779427
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01194246020168066
Validation loss = 0.009339861571788788
Validation loss = 0.009486863389611244
Validation loss = 0.009338540956377983
Validation loss = 0.009082366712391376
Validation loss = 0.01161900907754898
Validation loss = 0.008826996199786663
Validation loss = 0.009073140099644661
Validation loss = 0.009158765897154808
Validation loss = 0.009026404470205307
Validation loss = 0.00878006499260664
Validation loss = 0.010144738480448723
Validation loss = 0.009795796126127243
Validation loss = 0.008321925066411495
Validation loss = 0.0111594432964921
Validation loss = 0.008486357517540455
Validation loss = 0.009528115391731262
Validation loss = 0.008368273265659809
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013106395490467548
Validation loss = 0.010801599361002445
Validation loss = 0.010760033503174782
Validation loss = 0.01104991976171732
Validation loss = 0.010916542261838913
Validation loss = 0.010544718243181705
Validation loss = 0.009846247732639313
Validation loss = 0.010062982328236103
Validation loss = 0.009776181541383266
Validation loss = 0.009697536937892437
Validation loss = 0.010157419368624687
Validation loss = 0.009838098660111427
Validation loss = 0.011645199730992317
Validation loss = 0.009578241035342216
Validation loss = 0.009834451600909233
Validation loss = 0.009438876062631607
Validation loss = 0.010597397573292255
Validation loss = 0.0099552096799016
Validation loss = 0.009249826893210411
Validation loss = 0.010073432698845863
Validation loss = 0.009629465639591217
Validation loss = 0.009156597778201103
Validation loss = 0.01079273596405983
Validation loss = 0.009091438725590706
Validation loss = 0.009340401738882065
Validation loss = 0.008763948455452919
Validation loss = 0.009097378700971603
Validation loss = 0.009665269404649734
Validation loss = 0.008599205873906612
Validation loss = 0.008516686968505383
Validation loss = 0.010360736399888992
Validation loss = 0.00869324617087841
Validation loss = 0.009342597797513008
Validation loss = 0.008940636180341244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012235211208462715
Validation loss = 0.010173371061682701
Validation loss = 0.011331907473504543
Validation loss = 0.009521298110485077
Validation loss = 0.010194052010774612
Validation loss = 0.00947236642241478
Validation loss = 0.009693881496787071
Validation loss = 0.009890137240290642
Validation loss = 0.010183666832745075
Validation loss = 0.009688260965049267
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010979954153299332
Validation loss = 0.0090928440913558
Validation loss = 0.008782288059592247
Validation loss = 0.008975905366241932
Validation loss = 0.009904762730002403
Validation loss = 0.008857849054038525
Validation loss = 0.010113095864653587
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.67e+03 |
| Iteration     | 17       |
| MaximumReturn | 2.53e+03 |
| MinimumReturn | 130      |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012071670964360237
Validation loss = 0.008563442155718803
Validation loss = 0.008285559713840485
Validation loss = 0.00885204691439867
Validation loss = 0.00871377345174551
Validation loss = 0.00799425970762968
Validation loss = 0.00849567074328661
Validation loss = 0.008494041860103607
Validation loss = 0.008347932249307632
Validation loss = 0.011080839671194553
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010029828175902367
Validation loss = 0.00881932396441698
Validation loss = 0.008181896060705185
Validation loss = 0.0087160998955369
Validation loss = 0.009019043296575546
Validation loss = 0.008397722616791725
Validation loss = 0.008664180524647236
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010971290990710258
Validation loss = 0.008359178900718689
Validation loss = 0.008838558569550514
Validation loss = 0.009674581699073315
Validation loss = 0.009120980277657509
Validation loss = 0.009125168435275555
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01201406680047512
Validation loss = 0.009728406555950642
Validation loss = 0.009422766976058483
Validation loss = 0.009807994589209557
Validation loss = 0.008934271521866322
Validation loss = 0.01057643536478281
Validation loss = 0.0094874557107687
Validation loss = 0.009730357676744461
Validation loss = 0.008796118199825287
Validation loss = 0.009146604686975479
Validation loss = 0.009071930311620235
Validation loss = 0.009229469113051891
Validation loss = 0.009880687110126019
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01272426825016737
Validation loss = 0.009945101104676723
Validation loss = 0.00909938383847475
Validation loss = 0.0105220852419734
Validation loss = 0.00868860725313425
Validation loss = 0.010557393543422222
Validation loss = 0.008604295551776886
Validation loss = 0.009085852652788162
Validation loss = 0.00864799041301012
Validation loss = 0.009008579887449741
Validation loss = 0.008698521181941032
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.27e+03 |
| Iteration     | 18       |
| MaximumReturn | 2.94e+03 |
| MinimumReturn | 885      |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009530800394713879
Validation loss = 0.007872248999774456
Validation loss = 0.009005692787468433
Validation loss = 0.009042632766067982
Validation loss = 0.007937848567962646
Validation loss = 0.008390160277485847
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010349975898861885
Validation loss = 0.008432071655988693
Validation loss = 0.009826128371059895
Validation loss = 0.010222489014267921
Validation loss = 0.007883338257670403
Validation loss = 0.008044308051466942
Validation loss = 0.008783151395618916
Validation loss = 0.008087470196187496
Validation loss = 0.009370258077979088
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013019527308642864
Validation loss = 0.00846418272703886
Validation loss = 0.009006726555526257
Validation loss = 0.0097885150462389
Validation loss = 0.008092343807220459
Validation loss = 0.009348606690764427
Validation loss = 0.008776195347309113
Validation loss = 0.008896268904209137
Validation loss = 0.008069226518273354
Validation loss = 0.009051380679011345
Validation loss = 0.007970884442329407
Validation loss = 0.0077550276182591915
Validation loss = 0.008863069117069244
Validation loss = 0.00806311797350645
Validation loss = 0.008271567523479462
Validation loss = 0.008784398436546326
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010465371422469616
Validation loss = 0.009185547940433025
Validation loss = 0.009076950140297413
Validation loss = 0.00932023860514164
Validation loss = 0.008659097366034985
Validation loss = 0.009331977926194668
Validation loss = 0.009285256266593933
Validation loss = 0.009499733336269855
Validation loss = 0.008555782027542591
Validation loss = 0.008745918050408363
Validation loss = 0.008580537512898445
Validation loss = 0.008475924842059612
Validation loss = 0.009797980077564716
Validation loss = 0.008608133532106876
Validation loss = 0.008450688794255257
Validation loss = 0.009283635765314102
Validation loss = 0.008182646706700325
Validation loss = 0.00824989378452301
Validation loss = 0.008927153423428535
Validation loss = 0.008148142136633396
Validation loss = 0.008957002311944962
Validation loss = 0.00843014009296894
Validation loss = 0.008823921903967857
Validation loss = 0.00803648866713047
Validation loss = 0.008275745436549187
Validation loss = 0.008204485289752483
Validation loss = 0.008124449290335178
Validation loss = 0.008340729400515556
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01106572337448597
Validation loss = 0.008518872782588005
Validation loss = 0.011994103901088238
Validation loss = 0.008375139907002449
Validation loss = 0.008176642470061779
Validation loss = 0.009800486266613007
Validation loss = 0.008869784884154797
Validation loss = 0.011059058830142021
Validation loss = 0.008347732946276665
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.42e+03 |
| Iteration     | 19       |
| MaximumReturn | 2.67e+03 |
| MinimumReturn | 2.04e+03 |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.009850756265223026
Validation loss = 0.007466397248208523
Validation loss = 0.007358008064329624
Validation loss = 0.007262984290719032
Validation loss = 0.007218076381832361
Validation loss = 0.00789385661482811
Validation loss = 0.007966600358486176
Validation loss = 0.008171683177351952
Validation loss = 0.0077605582773685455
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.010558134876191616
Validation loss = 0.008122957311570644
Validation loss = 0.008204562589526176
Validation loss = 0.007824708707630634
Validation loss = 0.008811254985630512
Validation loss = 0.008190039545297623
Validation loss = 0.008777921088039875
Validation loss = 0.007436393294483423
Validation loss = 0.00904865749180317
Validation loss = 0.008260991424322128
Validation loss = 0.00778472563251853
Validation loss = 0.007722087670117617
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009564236737787724
Validation loss = 0.008215284906327724
Validation loss = 0.00924625527113676
Validation loss = 0.008595717139542103
Validation loss = 0.007472721394151449
Validation loss = 0.008444658480584621
Validation loss = 0.00765595119446516
Validation loss = 0.009051910601556301
Validation loss = 0.00805304478853941
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009284731931984425
Validation loss = 0.008131908252835274
Validation loss = 0.007467654533684254
Validation loss = 0.007912545464932919
Validation loss = 0.007187513168901205
Validation loss = 0.008057071827352047
Validation loss = 0.00768891628831625
Validation loss = 0.007179601583629847
Validation loss = 0.007196119055151939
Validation loss = 0.008501386269927025
Validation loss = 0.007099813781678677
Validation loss = 0.008851255290210247
Validation loss = 0.0075737591832876205
Validation loss = 0.007419713772833347
Validation loss = 0.007833709008991718
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010491455905139446
Validation loss = 0.008118179626762867
Validation loss = 0.008093353360891342
Validation loss = 0.0076693324372172356
Validation loss = 0.008219086565077305
Validation loss = 0.00845409743487835
Validation loss = 0.007505547255277634
Validation loss = 0.009460892528295517
Validation loss = 0.007628719788044691
Validation loss = 0.007714172825217247
Validation loss = 0.007460331078618765
Validation loss = 0.00912398286163807
Validation loss = 0.0075399973429739475
Validation loss = 0.00753916148096323
Validation loss = 0.008802451193332672
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.68e+03 |
| Iteration     | 20       |
| MaximumReturn | 2.79e+03 |
| MinimumReturn | 2.59e+03 |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008596375584602356
Validation loss = 0.006925871595740318
Validation loss = 0.0070315171033144
Validation loss = 0.006906779482960701
Validation loss = 0.007631045766174793
Validation loss = 0.00880969688296318
Validation loss = 0.006634371355175972
Validation loss = 0.00808910932391882
Validation loss = 0.006575926672667265
Validation loss = 0.008291005156934261
Validation loss = 0.006877759471535683
Validation loss = 0.007930519059300423
Validation loss = 0.00719507597386837
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.00825949665158987
Validation loss = 0.007557732053101063
Validation loss = 0.007285110652446747
Validation loss = 0.007862349040806293
Validation loss = 0.009197131730616093
Validation loss = 0.007468425203114748
Validation loss = 0.007241524290293455
Validation loss = 0.007603127975016832
Validation loss = 0.007476611528545618
Validation loss = 0.006803927943110466
Validation loss = 0.007789855822920799
Validation loss = 0.006673396099358797
Validation loss = 0.008202837780117989
Validation loss = 0.0067251939326524734
Validation loss = 0.00801980122923851
Validation loss = 0.006926333997398615
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009264972060918808
Validation loss = 0.00705824326723814
Validation loss = 0.007905121892690659
Validation loss = 0.008177115581929684
Validation loss = 0.007266406901180744
Validation loss = 0.00946399662643671
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008325452916324139
Validation loss = 0.007103146519511938
Validation loss = 0.008550615049898624
Validation loss = 0.007858073338866234
Validation loss = 0.007122261915355921
Validation loss = 0.006899212021380663
Validation loss = 0.006890844088047743
Validation loss = 0.00703589478507638
Validation loss = 0.008451472967863083
Validation loss = 0.006880160886794329
Validation loss = 0.006888550706207752
Validation loss = 0.007907562889158726
Validation loss = 0.007015220355242491
Validation loss = 0.006609737407416105
Validation loss = 0.006910595111548901
Validation loss = 0.006482420954853296
Validation loss = 0.006890907417982817
Validation loss = 0.00697239488363266
Validation loss = 0.0067149256356060505
Validation loss = 0.007005708292126656
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007713768165558577
Validation loss = 0.00705169141292572
Validation loss = 0.00866258516907692
Validation loss = 0.007131187245249748
Validation loss = 0.007618139963597059
Validation loss = 0.0071049961261451244
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.32e+03 |
| Iteration     | 21       |
| MaximumReturn | 3.05e+03 |
| MinimumReturn | -212     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00871799886226654
Validation loss = 0.006803142838180065
Validation loss = 0.007622766774147749
Validation loss = 0.007143538445234299
Validation loss = 0.006614753045141697
Validation loss = 0.007521617691963911
Validation loss = 0.00636315206065774
Validation loss = 0.0071165855042636395
Validation loss = 0.006372636649757624
Validation loss = 0.006863294169306755
Validation loss = 0.006652060896158218
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007687140256166458
Validation loss = 0.007160302717238665
Validation loss = 0.007561754900962114
Validation loss = 0.006727155763655901
Validation loss = 0.007687729317694902
Validation loss = 0.006735268980264664
Validation loss = 0.008160014636814594
Validation loss = 0.006596617866307497
Validation loss = 0.007146792020648718
Validation loss = 0.00640104990452528
Validation loss = 0.00791772548109293
Validation loss = 0.0068536135368049145
Validation loss = 0.006726179271936417
Validation loss = 0.006644123233854771
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.009541044943034649
Validation loss = 0.007728475145995617
Validation loss = 0.006997221149504185
Validation loss = 0.007499100174754858
Validation loss = 0.007248487789183855
Validation loss = 0.006854964420199394
Validation loss = 0.007504486478865147
Validation loss = 0.0068404097110033035
Validation loss = 0.006589673925191164
Validation loss = 0.007438567467033863
Validation loss = 0.006664087064564228
Validation loss = 0.0073000965639948845
Validation loss = 0.007626103237271309
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.007816505618393421
Validation loss = 0.007629893254488707
Validation loss = 0.006635033059865236
Validation loss = 0.006450657732784748
Validation loss = 0.006916436366736889
Validation loss = 0.008826647885143757
Validation loss = 0.00632005138322711
Validation loss = 0.0062475199811160564
Validation loss = 0.006796202622354031
Validation loss = 0.006351247429847717
Validation loss = 0.008077515289187431
Validation loss = 0.006166649982333183
Validation loss = 0.007572672329843044
Validation loss = 0.005977805703878403
Validation loss = 0.007051438093185425
Validation loss = 0.007468190509825945
Validation loss = 0.006603137124329805
Validation loss = 0.006303608417510986
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.007612981367856264
Validation loss = 0.0070187486708164215
Validation loss = 0.006782317999750376
Validation loss = 0.007081645540893078
Validation loss = 0.00723940460011363
Validation loss = 0.006683337967842817
Validation loss = 0.007083779666572809
Validation loss = 0.007264446467161179
Validation loss = 0.007589579094201326
Validation loss = 0.006603446323424578
Validation loss = 0.008288470096886158
Validation loss = 0.006347114685922861
Validation loss = 0.007792136166244745
Validation loss = 0.006399322301149368
Validation loss = 0.007408073637634516
Validation loss = 0.006408049259334803
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.87e+03 |
| Iteration     | 22       |
| MaximumReturn | 3.07e+03 |
| MinimumReturn | 2.7e+03  |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007794697303324938
Validation loss = 0.006073861848562956
Validation loss = 0.006181505974382162
Validation loss = 0.006899545434862375
Validation loss = 0.006423381622880697
Validation loss = 0.006225444842129946
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007646661251783371
Validation loss = 0.00624605268239975
Validation loss = 0.007378362584859133
Validation loss = 0.007081527262926102
Validation loss = 0.006411264184862375
Validation loss = 0.006697585340589285
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.008422876708209515
Validation loss = 0.006630727555602789
Validation loss = 0.0075234584510326385
Validation loss = 0.006390319671481848
Validation loss = 0.00660678930580616
Validation loss = 0.006927732843905687
Validation loss = 0.007355993147939444
Validation loss = 0.0062197670340538025
Validation loss = 0.007215525954961777
Validation loss = 0.006481217686086893
Validation loss = 0.007555494550615549
Validation loss = 0.006610892713069916
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.008215267211198807
Validation loss = 0.006086739245802164
Validation loss = 0.006048391107469797
Validation loss = 0.006615194957703352
Validation loss = 0.006328735034912825
Validation loss = 0.006547804921865463
Validation loss = 0.005798984318971634
Validation loss = 0.007368308026343584
Validation loss = 0.005927681922912598
Validation loss = 0.0063826232217252254
Validation loss = 0.0057241967879235744
Validation loss = 0.007122408133000135
Validation loss = 0.005756352096796036
Validation loss = 0.006172164808958769
Validation loss = 0.005903204903006554
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.008003134280443192
Validation loss = 0.006202407646924257
Validation loss = 0.006204052362591028
Validation loss = 0.006507374346256256
Validation loss = 0.006159841548651457
Validation loss = 0.007238013204187155
Validation loss = 0.006109117064625025
Validation loss = 0.006326169241219759
Validation loss = 0.0074425674974918365
Validation loss = 0.006580861285328865
Validation loss = 0.00911086704581976
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.95e+03 |
| Iteration     | 23       |
| MaximumReturn | 3.02e+03 |
| MinimumReturn | 2.87e+03 |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006643480621278286
Validation loss = 0.006387634202837944
Validation loss = 0.0062388586811721325
Validation loss = 0.006327492184937
Validation loss = 0.006058321800082922
Validation loss = 0.006165431812405586
Validation loss = 0.0059781246818602085
Validation loss = 0.006049018353223801
Validation loss = 0.005810518749058247
Validation loss = 0.0059116799384355545
Validation loss = 0.006109526380896568
Validation loss = 0.005550581030547619
Validation loss = 0.006278710439801216
Validation loss = 0.0059578861109912395
Validation loss = 0.005999817978590727
Validation loss = 0.005979627836495638
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006702794227749109
Validation loss = 0.006474026013165712
Validation loss = 0.006725557614117861
Validation loss = 0.006445368751883507
Validation loss = 0.00708763487637043
Validation loss = 0.006133696995675564
Validation loss = 0.006842115893959999
Validation loss = 0.006387648638337851
Validation loss = 0.005723284091800451
Validation loss = 0.0067276060581207275
Validation loss = 0.007372954394668341
Validation loss = 0.006047074217349291
Validation loss = 0.006538236513733864
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006674709729850292
Validation loss = 0.005956104025244713
Validation loss = 0.0062499032355844975
Validation loss = 0.007080804090946913
Validation loss = 0.00724896602332592
Validation loss = 0.006868649274110794
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006818392779678106
Validation loss = 0.005806589033454657
Validation loss = 0.006008227355778217
Validation loss = 0.005740281194448471
Validation loss = 0.005512203089892864
Validation loss = 0.006213556043803692
Validation loss = 0.0057759773917496204
Validation loss = 0.006039158906787634
Validation loss = 0.005579221062362194
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006687840912491083
Validation loss = 0.005969752557575703
Validation loss = 0.007102902978658676
Validation loss = 0.006400046870112419
Validation loss = 0.006287548691034317
Validation loss = 0.006751637905836105
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.05e+03 |
| Iteration     | 24       |
| MaximumReturn | 3.39e+03 |
| MinimumReturn | 2.53e+03 |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.007105332333594561
Validation loss = 0.0059125605039298534
Validation loss = 0.007114626932889223
Validation loss = 0.006120894569903612
Validation loss = 0.00767557043582201
Validation loss = 0.005988254677504301
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007738166488707066
Validation loss = 0.005605819169431925
Validation loss = 0.006096345838159323
Validation loss = 0.006936349906027317
Validation loss = 0.0065674204379320145
Validation loss = 0.006021364592015743
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006824742536991835
Validation loss = 0.006200999952852726
Validation loss = 0.00607039500027895
Validation loss = 0.005705587100237608
Validation loss = 0.00608034385368228
Validation loss = 0.006551223341375589
Validation loss = 0.008128693327307701
Validation loss = 0.006428137421607971
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0060959383845329285
Validation loss = 0.0056461128406226635
Validation loss = 0.005400220863521099
Validation loss = 0.0056175729259848595
Validation loss = 0.006169930100440979
Validation loss = 0.005766012705862522
Validation loss = 0.0053910245187580585
Validation loss = 0.006222003139555454
Validation loss = 0.005308361724019051
Validation loss = 0.006309818010777235
Validation loss = 0.005418527405709028
Validation loss = 0.005617551505565643
Validation loss = 0.00535210408270359
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006501259747892618
Validation loss = 0.00585733400657773
Validation loss = 0.005910396575927734
Validation loss = 0.006808307487517595
Validation loss = 0.005651313345879316
Validation loss = 0.006417010445147753
Validation loss = 0.006955539807677269
Validation loss = 0.0058760205283761024
Validation loss = 0.006544085685163736
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.5e+03  |
| Iteration     | 25       |
| MaximumReturn | 3.28e+03 |
| MinimumReturn | 936      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006363467313349247
Validation loss = 0.005609442945569754
Validation loss = 0.005958006251603365
Validation loss = 0.005492884665727615
Validation loss = 0.005952071398496628
Validation loss = 0.005854936316609383
Validation loss = 0.005486867390573025
Validation loss = 0.005811261013150215
Validation loss = 0.00572727108374238
Validation loss = 0.006175062153488398
Validation loss = 0.0065138074569404125
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.006912857294082642
Validation loss = 0.0057181185111403465
Validation loss = 0.006158197298645973
Validation loss = 0.00617809034883976
Validation loss = 0.00657557463273406
Validation loss = 0.005580315366387367
Validation loss = 0.005711686797440052
Validation loss = 0.006029700860381126
Validation loss = 0.005340695846825838
Validation loss = 0.006532996892929077
Validation loss = 0.005654027685523033
Validation loss = 0.005836460739374161
Validation loss = 0.0065024737268686295
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006840724963694811
Validation loss = 0.006272202357649803
Validation loss = 0.005824870429933071
Validation loss = 0.006016239989548922
Validation loss = 0.006083224434405565
Validation loss = 0.005873874761164188
Validation loss = 0.006682100240141153
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006601062137633562
Validation loss = 0.006275498308241367
Validation loss = 0.005585393402725458
Validation loss = 0.005548521876335144
Validation loss = 0.0061110747046768665
Validation loss = 0.0051498375833034515
Validation loss = 0.006031866651028395
Validation loss = 0.0057882401160895824
Validation loss = 0.005307361483573914
Validation loss = 0.005476430058479309
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006250059697777033
Validation loss = 0.006081691011786461
Validation loss = 0.005737673491239548
Validation loss = 0.0055253151804208755
Validation loss = 0.006843569688498974
Validation loss = 0.005611485801637173
Validation loss = 0.005837502423673868
Validation loss = 0.005534691270440817
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.43e+03 |
| Iteration     | 26       |
| MaximumReturn | 2.91e+03 |
| MinimumReturn | 991      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006149119697511196
Validation loss = 0.0059741586446762085
Validation loss = 0.005319621879607439
Validation loss = 0.005466942209750414
Validation loss = 0.006039947271347046
Validation loss = 0.005470441188663244
Validation loss = 0.005227833986282349
Validation loss = 0.005078443326056004
Validation loss = 0.0052028740756213665
Validation loss = 0.005541737657040358
Validation loss = 0.006863678339868784
Validation loss = 0.005574805196374655
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.007194682955741882
Validation loss = 0.005740355234593153
Validation loss = 0.00563967227935791
Validation loss = 0.0056969341821968555
Validation loss = 0.005320829339325428
Validation loss = 0.005244099535048008
Validation loss = 0.005846632178872824
Validation loss = 0.005086296238005161
Validation loss = 0.005468984600156546
Validation loss = 0.005837795790284872
Validation loss = 0.0056408424861729145
Validation loss = 0.005509932059794664
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006618956569582224
Validation loss = 0.00556593481451273
Validation loss = 0.005169529467821121
Validation loss = 0.0061421324498951435
Validation loss = 0.006294562481343746
Validation loss = 0.005650727543979883
Validation loss = 0.005820561666041613
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005302498582750559
Validation loss = 0.006000986788421869
Validation loss = 0.0056800697930157185
Validation loss = 0.005319804418832064
Validation loss = 0.004953222814947367
Validation loss = 0.0051484741270542145
Validation loss = 0.005429673008620739
Validation loss = 0.006413628812879324
Validation loss = 0.005620954092592001
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006684429943561554
Validation loss = 0.005678772926330566
Validation loss = 0.005234903655946255
Validation loss = 0.005856792908161879
Validation loss = 0.005824997089803219
Validation loss = 0.0056267171166837215
Validation loss = 0.0057141524739563465
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.15e+03 |
| Iteration     | 27       |
| MaximumReturn | 3.39e+03 |
| MinimumReturn | 2.69e+03 |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005523290019482374
Validation loss = 0.005057778209447861
Validation loss = 0.007448569871485233
Validation loss = 0.005270263645797968
Validation loss = 0.006089743226766586
Validation loss = 0.0050001903437078
Validation loss = 0.005602742545306683
Validation loss = 0.005109230987727642
Validation loss = 0.005343893077224493
Validation loss = 0.005307190585881472
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005632853135466576
Validation loss = 0.005915400106459856
Validation loss = 0.005351644940674305
Validation loss = 0.006095236632972956
Validation loss = 0.005459426436573267
Validation loss = 0.005663716234266758
Validation loss = 0.00630895234644413
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00616904441267252
Validation loss = 0.005230557173490524
Validation loss = 0.0054209413938224316
Validation loss = 0.005601237993687391
Validation loss = 0.0054428246803581715
Validation loss = 0.005128314718604088
Validation loss = 0.006169415079057217
Validation loss = 0.006758922711014748
Validation loss = 0.0053411428816616535
Validation loss = 0.005502643063664436
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006804312113672495
Validation loss = 0.0052750976756215096
Validation loss = 0.005228383932262659
Validation loss = 0.005366127472370863
Validation loss = 0.0050269425846636295
Validation loss = 0.005410637706518173
Validation loss = 0.005081722512841225
Validation loss = 0.005710785277187824
Validation loss = 0.006395168136805296
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.006698480807244778
Validation loss = 0.0053151268512010574
Validation loss = 0.005715226288884878
Validation loss = 0.005409979727119207
Validation loss = 0.005690276622772217
Validation loss = 0.005192926619201899
Validation loss = 0.005325714126229286
Validation loss = 0.005310974549502134
Validation loss = 0.0050989133305847645
Validation loss = 0.005186691414564848
Validation loss = 0.0051808650605380535
Validation loss = 0.005517116282135248
Validation loss = 0.006690588779747486
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.13e+03 |
| Iteration     | 28       |
| MaximumReturn | 3.4e+03  |
| MinimumReturn | 2.49e+03 |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.006088143680244684
Validation loss = 0.005039355717599392
Validation loss = 0.005126809235662222
Validation loss = 0.005121686030179262
Validation loss = 0.005428078584372997
Validation loss = 0.005091213621199131
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005869186948984861
Validation loss = 0.005094160325825214
Validation loss = 0.005016420502215624
Validation loss = 0.005284314509481192
Validation loss = 0.005954390857368708
Validation loss = 0.004867008421570063
Validation loss = 0.005079932976514101
Validation loss = 0.005131886340677738
Validation loss = 0.005103344097733498
Validation loss = 0.0053281038999557495
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0058080945163965225
Validation loss = 0.005472214892506599
Validation loss = 0.006016187369823456
Validation loss = 0.005293143913149834
Validation loss = 0.0057522752322256565
Validation loss = 0.005022736731916666
Validation loss = 0.005652346648275852
Validation loss = 0.005011946428567171
Validation loss = 0.0051218001171946526
Validation loss = 0.0059598092921078205
Validation loss = 0.00505415815860033
Validation loss = 0.005750245880335569
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005327093414962292
Validation loss = 0.0047803097404539585
Validation loss = 0.005062848795205355
Validation loss = 0.0050181071273982525
Validation loss = 0.005315288435667753
Validation loss = 0.005400185938924551
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005836529657244682
Validation loss = 0.004901057109236717
Validation loss = 0.005937617737799883
Validation loss = 0.006075901445001364
Validation loss = 0.005310389678925276
Validation loss = 0.005000822711735964
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.42e+03 |
| Iteration     | 29       |
| MaximumReturn | 3.58e+03 |
| MinimumReturn | 432      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005549960304051638
Validation loss = 0.00580259645357728
Validation loss = 0.005686532240360975
Validation loss = 0.006171609275043011
Validation loss = 0.004834347870200872
Validation loss = 0.0054218522273004055
Validation loss = 0.005507349036633968
Validation loss = 0.0049400487914681435
Validation loss = 0.004952301736921072
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005749525967985392
Validation loss = 0.005675164051353931
Validation loss = 0.004959571175277233
Validation loss = 0.00525703513994813
Validation loss = 0.005342013668268919
Validation loss = 0.005147985182702541
Validation loss = 0.005056146532297134
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.006129562854766846
Validation loss = 0.005111249629408121
Validation loss = 0.005089849699288607
Validation loss = 0.0052380673587322235
Validation loss = 0.005395512096583843
Validation loss = 0.005376250948756933
Validation loss = 0.005258428864181042
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005432459991425276
Validation loss = 0.004849498625844717
Validation loss = 0.004905113484710455
Validation loss = 0.004682860802859068
Validation loss = 0.005043354816734791
Validation loss = 0.004850433208048344
Validation loss = 0.005525853484869003
Validation loss = 0.004792147781699896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005650239530950785
Validation loss = 0.005383437965065241
Validation loss = 0.005263546481728554
Validation loss = 0.005217059049755335
Validation loss = 0.00559151591733098
Validation loss = 0.005040494259446859
Validation loss = 0.0050703720189630985
Validation loss = 0.0057898093946278095
Validation loss = 0.005165573675185442
Validation loss = 0.005693613551557064
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.08e+03 |
| Iteration     | 30       |
| MaximumReturn | 3.62e+03 |
| MinimumReturn | 1.45e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005314079113304615
Validation loss = 0.005406029988080263
Validation loss = 0.005196215584874153
Validation loss = 0.004988046362996101
Validation loss = 0.0050064194947481155
Validation loss = 0.004891935270279646
Validation loss = 0.005004355683922768
Validation loss = 0.00489626731723547
Validation loss = 0.005336190573871136
Validation loss = 0.005142427049577236
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005784453824162483
Validation loss = 0.005549387075006962
Validation loss = 0.005724819842725992
Validation loss = 0.005409791134297848
Validation loss = 0.005340860225260258
Validation loss = 0.0051661087200045586
Validation loss = 0.004659448750317097
Validation loss = 0.005855940282344818
Validation loss = 0.005044099409133196
Validation loss = 0.00590081000700593
Validation loss = 0.0050416747108101845
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.00581643870100379
Validation loss = 0.005227396264672279
Validation loss = 0.005198928527534008
Validation loss = 0.005079738795757294
Validation loss = 0.0047166356816887856
Validation loss = 0.005165676586329937
Validation loss = 0.004959693178534508
Validation loss = 0.005345896352082491
Validation loss = 0.00546360295265913
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.006034672725945711
Validation loss = 0.005153522826731205
Validation loss = 0.004911030642688274
Validation loss = 0.005625343881547451
Validation loss = 0.005460661370307207
Validation loss = 0.00612234603613615
Validation loss = 0.004839213564991951
Validation loss = 0.005131056997925043
Validation loss = 0.0046263523399829865
Validation loss = 0.004541158676147461
Validation loss = 0.004795657005161047
Validation loss = 0.004716384224593639
Validation loss = 0.004938771016895771
Validation loss = 0.004557878710329533
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.005495904479175806
Validation loss = 0.005093335174024105
Validation loss = 0.004968960769474506
Validation loss = 0.005221325904130936
Validation loss = 0.005591541063040495
Validation loss = 0.005725707858800888
Validation loss = 0.0046218084171414375
Validation loss = 0.005025858990848064
Validation loss = 0.00546575803309679
Validation loss = 0.004593594931066036
Validation loss = 0.004900152795016766
Validation loss = 0.004870572127401829
Validation loss = 0.005433239974081516
Validation loss = 0.00485129002481699
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.9e+03  |
| Iteration     | 31       |
| MaximumReturn | 3.62e+03 |
| MinimumReturn | 152      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.005871174391359091
Validation loss = 0.005180073902010918
Validation loss = 0.00533057888969779
Validation loss = 0.004896488040685654
Validation loss = 0.004709391854703426
Validation loss = 0.004843778908252716
Validation loss = 0.005208600778132677
Validation loss = 0.0047314586117863655
Validation loss = 0.0046749114990234375
Validation loss = 0.005344691686332226
Validation loss = 0.004703033249825239
Validation loss = 0.004679262172430754
Validation loss = 0.004755890928208828
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.005581755191087723
Validation loss = 0.0049087004736065865
Validation loss = 0.005111211910843849
Validation loss = 0.00472706463187933
Validation loss = 0.004954224452376366
Validation loss = 0.004823498427867889
Validation loss = 0.005229970905929804
Validation loss = 0.004843381233513355
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0052376966923475266
Validation loss = 0.006545469630509615
Validation loss = 0.0046727582812309265
Validation loss = 0.004973480477929115
Validation loss = 0.004752921871840954
Validation loss = 0.005267704837024212
Validation loss = 0.004808719735592604
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.005343415308743715
Validation loss = 0.004972134251147509
Validation loss = 0.004690652247518301
Validation loss = 0.004690068773925304
Validation loss = 0.004640453029423952
Validation loss = 0.004545788746327162
Validation loss = 0.004784270655363798
Validation loss = 0.004684050101786852
Validation loss = 0.004760819487273693
Validation loss = 0.004857122432440519
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.00565363559871912
Validation loss = 0.0044731926172971725
Validation loss = 0.005124978721141815
Validation loss = 0.004709639120846987
Validation loss = 0.004955147858709097
Validation loss = 0.005300435703247786
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.41e+03 |
| Iteration     | 32       |
| MaximumReturn | 3.73e+03 |
| MinimumReturn | -43.8    |
| TotalSamples  | 136000   |
----------------------------
