Logging to experiments/half_cheetah/test-exp-dir-2/test-exp2_seed2341
Print configuration .....
{'max_val_data': 100000, 'dynamics': {'kfac_params': {'damping': 0.001, 'cov_ema_decay': 0.99, 'momentum': 0.9, 'learning_rate': 0.1, 'kl_clip': 0.0001}, 'intrinsic_reward_only': False, 'enable_particle_ensemble': True, 'external_reward_evaluation_interval': 5, 'mode': 'random', 'batch_size': 1000, 'ensemble_model_count': 5, 'particles': 5, 'activation': 'relu', 'n_layers': 4, 'val': True, 'ensemble': True, 'hidden_size': 1000, 'intrinsic_reward_coeff': 1.0, 'learning_rate': 0.001, 'epochs': 200, 'ita': 1.0, 'pre_training': {'policy_itr': 20, 'mode': 'intrinsic_reward', 'itr': 0}, 'model': 'nn', 'obs_var': 1.0}, 'random_seeds': [4321, 2314, 2341, 3421], 'max_train_data': 200000, 'env_horizon': 1000, 'num_path_random': 6, 'discard_ratio': 0.0, 'start_onpol_iter': 0, 'num_path_onpol': 6, 'onpol_iters': 33, 'env_name': 'half_cheetah', 'trpo': {'gae': 0.95, 'batch_size': 50000, 'iterations': 40, 'step_size': 0.01, 'horizon': 1000, 'gamma': 0.99}, 'algo': 'trpo', 'policy': {'init_logstd': 0.0, 'reinitialize_every_itr': False, 'activation': 'tanh', 'network_shape': [32, 32]}, 'trpo_ext_reward': {'gae': 0.95, 'batch_size': 50000, 'iterations': 20, 'step_size': 0.01, 'horizon': 1000, 'gamma': 0.99}, 'save_variables': False, 'restore_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics models <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.6645402908325195
Validation loss = 0.18177321553230286
Validation loss = 0.10818539559841156
Validation loss = 0.08291415125131607
Validation loss = 0.07648536562919617
Validation loss = 0.07274352759122849
Validation loss = 0.06319841742515564
Validation loss = 0.06321078538894653
Validation loss = 0.06327326595783234
Validation loss = 0.07294543087482452
Validation loss = 0.058776818215847015
Validation loss = 0.05647272616624832
Validation loss = 0.05955161154270172
Validation loss = 0.05730539932847023
Validation loss = 0.05867873877286911
Validation loss = 0.05881790071725845
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.7004467248916626
Validation loss = 0.1963304877281189
Validation loss = 0.12492813169956207
Validation loss = 0.08544845879077911
Validation loss = 0.07400685548782349
Validation loss = 0.07021848857402802
Validation loss = 0.0682682991027832
Validation loss = 0.06717601418495178
Validation loss = 0.06207926943898201
Validation loss = 0.06822633743286133
Validation loss = 0.06663663685321808
Validation loss = 0.057560063898563385
Validation loss = 0.06212986260652542
Validation loss = 0.056390728801488876
Validation loss = 0.05938412994146347
Validation loss = 0.05867733806371689
Validation loss = 0.05438463017344475
Validation loss = 0.05669890344142914
Validation loss = 0.057737767696380615
Validation loss = 0.05813420191407204
Validation loss = 0.05439362674951553
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4683231711387634
Validation loss = 0.2008899748325348
Validation loss = 0.12874512374401093
Validation loss = 0.08830024302005768
Validation loss = 0.07686979323625565
Validation loss = 0.07167023420333862
Validation loss = 0.06845903396606445
Validation loss = 0.06733065098524094
Validation loss = 0.06203477829694748
Validation loss = 0.06250417232513428
Validation loss = 0.05741971358656883
Validation loss = 0.06506028026342392
Validation loss = 0.05892418324947357
Validation loss = 0.05405120551586151
Validation loss = 0.057658836245536804
Validation loss = 0.05288797244429588
Validation loss = 0.0539608895778656
Validation loss = 0.05553896725177765
Validation loss = 0.09636238217353821
Validation loss = 0.05559536814689636
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7047640681266785
Validation loss = 0.17754483222961426
Validation loss = 0.11179086565971375
Validation loss = 0.08287164568901062
Validation loss = 0.07254109531641006
Validation loss = 0.0746724009513855
Validation loss = 0.06607487052679062
Validation loss = 0.07390757650136948
Validation loss = 0.061619993299245834
Validation loss = 0.06051984429359436
Validation loss = 0.057629652321338654
Validation loss = 0.07753914594650269
Validation loss = 0.05868492275476456
Validation loss = 0.05500771850347519
Validation loss = 0.05973713845014572
Validation loss = 0.056121665984392166
Validation loss = 0.057291626930236816
Validation loss = 0.054719991981983185
Validation loss = 0.05275000259280205
Validation loss = 0.05939842760562897
Validation loss = 0.05675027519464493
Validation loss = 0.05271061509847641
Validation loss = 0.06012307107448578
Validation loss = 0.053173452615737915
Validation loss = 0.06887392699718475
Validation loss = 0.05375541001558304
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6995645761489868
Validation loss = 0.1869959682226181
Validation loss = 0.11978248506784439
Validation loss = 0.0830666571855545
Validation loss = 0.07218191772699356
Validation loss = 0.06648097932338715
Validation loss = 0.07071753591299057
Validation loss = 0.06149851158261299
Validation loss = 0.06949585676193237
Validation loss = 0.06151836737990379
Validation loss = 0.05774245411157608
Validation loss = 0.05785040557384491
Validation loss = 0.0565803200006485
Validation loss = 0.055911868810653687
Validation loss = 0.060910873115062714
Validation loss = 0.055376168340444565
Validation loss = 0.05598427355289459
Validation loss = 0.07568363845348358
Validation loss = 0.05544852465391159
Validation loss = 0.05354895442724228
Validation loss = 0.06255137920379639
Validation loss = 0.057392194867134094
Validation loss = 0.052569422870874405
Validation loss = 0.057273149490356445
Validation loss = 0.05192747339606285
Validation loss = 0.052859336137771606
Validation loss = 0.05312133580446243
Validation loss = 0.05278788134455681
Validation loss = 0.05635387450456619
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -340     |
| Iteration     | 0        |
| MaximumReturn | -233     |
| MinimumReturn | -437     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 1.6081132888793945
Validation loss = 2.6007368564605713
Validation loss = 3.0788702964782715
Validation loss = 3.023216962814331
Validation loss = 3.415079355239868
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 1.4506945610046387
Validation loss = 2.4606220722198486
Validation loss = 2.8807194232940674
Validation loss = 3.142036199569702
Validation loss = 3.231919288635254
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 1.5726978778839111
Validation loss = 2.5202808380126953
Validation loss = 2.8910417556762695
Validation loss = 3.3206300735473633
Validation loss = 3.1840667724609375
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 1.633864164352417
Validation loss = 2.764176368713379
Validation loss = 3.28112530708313
Validation loss = 3.392228126525879
Validation loss = 3.1801514625549316
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 1.441738247871399
Validation loss = 2.5933756828308105
Validation loss = 3.0727334022521973
Validation loss = 3.137515068054199
Validation loss = 3.3168880939483643
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 303      |
| Iteration     | 1        |
| MaximumReturn | 346      |
| MinimumReturn | 246      |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 2.455737352371216
Validation loss = 2.764793634414673
Validation loss = 2.8863115310668945
Validation loss = 2.7052528858184814
Validation loss = 2.9651365280151367
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 2.396636962890625
Validation loss = 2.6265487670898438
Validation loss = 3.0371952056884766
Validation loss = 3.102846145629883
Validation loss = 3.379131317138672
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 2.449817657470703
Validation loss = 2.804415702819824
Validation loss = 2.8438990116119385
Validation loss = 2.924738883972168
Validation loss = 3.0529801845550537
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 2.3478548526763916
Validation loss = 2.8929765224456787
Validation loss = 2.9287922382354736
Validation loss = 2.8600714206695557
Validation loss = 2.8127901554107666
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 2.5477585792541504
Validation loss = 2.849728584289551
Validation loss = 3.012982130050659
Validation loss = 3.058917284011841
Validation loss = 2.932551622390747
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 412      |
| Iteration     | 2        |
| MaximumReturn | 787      |
| MinimumReturn | -882     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 3.1080338954925537
Validation loss = 3.0953197479248047
Validation loss = 2.774651050567627
Validation loss = 2.6327576637268066
Validation loss = 2.7155113220214844
Validation loss = 2.5996880531311035
Validation loss = 2.7741825580596924
Validation loss = 2.709639072418213
Validation loss = 2.691167116165161
Validation loss = 2.7123289108276367
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 3.197197914123535
Validation loss = 3.530688762664795
Validation loss = 3.4673960208892822
Validation loss = 3.3045220375061035
Validation loss = 3.471065044403076
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 3.21101713180542
Validation loss = 3.2881197929382324
Validation loss = 2.9860000610351562
Validation loss = 2.8521790504455566
Validation loss = 2.900907039642334
Validation loss = 2.9035706520080566
Validation loss = 2.8549351692199707
Validation loss = 3.158785343170166
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 2.835580825805664
Validation loss = 3.1664299964904785
Validation loss = 3.024904251098633
Validation loss = 2.9903953075408936
Validation loss = 3.012516498565674
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 3.1593055725097656
Validation loss = 3.1200995445251465
Validation loss = 2.7668700218200684
Validation loss = 2.9239888191223145
Validation loss = 2.815486431121826
Validation loss = 2.8809800148010254
Validation loss = 2.975261688232422
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 383      |
| Iteration     | 3        |
| MaximumReturn | 739      |
| MinimumReturn | -543     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12482111155986786
Validation loss = 0.058018527925014496
Validation loss = 0.05263174697756767
Validation loss = 0.05212533473968506
Validation loss = 0.04961743205785751
Validation loss = 0.05081936717033386
Validation loss = 0.04874023050069809
Validation loss = 0.04955872893333435
Validation loss = 0.048588428646326065
Validation loss = 0.046787045896053314
Validation loss = 0.051503174006938934
Validation loss = 0.04749355465173721
Validation loss = 0.05015135556459427
Validation loss = 0.047742221504449844
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1266675740480423
Validation loss = 0.06228131800889969
Validation loss = 0.05934206768870354
Validation loss = 0.05725344270467758
Validation loss = 0.05086256191134453
Validation loss = 0.05377709120512009
Validation loss = 0.049547187983989716
Validation loss = 0.05534488707780838
Validation loss = 0.050145335495471954
Validation loss = 0.050554681569337845
Validation loss = 0.049258243292570114
Validation loss = 0.047254063189029694
Validation loss = 0.04637259989976883
Validation loss = 0.04666981101036072
Validation loss = 0.04846420884132385
Validation loss = 0.04999259486794472
Validation loss = 0.04698574170470238
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12236189842224121
Validation loss = 0.06162805110216141
Validation loss = 0.05861201137304306
Validation loss = 0.05348020792007446
Validation loss = 0.05177763104438782
Validation loss = 0.051503222435712814
Validation loss = 0.05192287638783455
Validation loss = 0.05383048206567764
Validation loss = 0.055845342576503754
Validation loss = 0.04759874567389488
Validation loss = 0.04850331321358681
Validation loss = 0.05075620487332344
Validation loss = 0.04740162566304207
Validation loss = 0.04606766253709793
Validation loss = 0.0473647341132164
Validation loss = 0.07722275704145432
Validation loss = 0.04930650815367699
Validation loss = 0.04654751345515251
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11934199184179306
Validation loss = 0.05738694220781326
Validation loss = 0.05243558809161186
Validation loss = 0.05170923471450806
Validation loss = 0.04945964366197586
Validation loss = 0.04864775761961937
Validation loss = 0.052073538303375244
Validation loss = 0.047114960849285126
Validation loss = 0.046339794993400574
Validation loss = 0.04626632109284401
Validation loss = 0.04969776049256325
Validation loss = 0.048925526440143585
Validation loss = 0.047629643231630325
Validation loss = 0.050579119473695755
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12671756744384766
Validation loss = 0.059459976851940155
Validation loss = 0.05383650213479996
Validation loss = 0.05301349237561226
Validation loss = 0.05459923669695854
Validation loss = 0.049327682703733444
Validation loss = 0.04987651854753494
Validation loss = 0.04794139042496681
Validation loss = 0.050607938319444656
Validation loss = 0.048390232026576996
Validation loss = 0.04886603727936745
Validation loss = 0.04689014330506325
Validation loss = 0.04845691844820976
Validation loss = 0.04825543239712715
Validation loss = 0.05058618634939194
Validation loss = 0.050225816667079926
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 256      |
| Iteration     | 4        |
| MaximumReturn | 804      |
| MinimumReturn | -797     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05187444016337395
Validation loss = 0.03996222838759422
Validation loss = 0.03966094180941582
Validation loss = 0.04157828912138939
Validation loss = 0.040275514125823975
Validation loss = 0.04129580035805702
Validation loss = 0.03775831684470177
Validation loss = 0.03974314406514168
Validation loss = 0.03884776309132576
Validation loss = 0.03759593144059181
Validation loss = 0.03617830201983452
Validation loss = 0.03666894510388374
Validation loss = 0.03695639595389366
Validation loss = 0.037648025900125504
Validation loss = 0.03990120440721512
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.048819173127412796
Validation loss = 0.04194914922118187
Validation loss = 0.042725708335638046
Validation loss = 0.038867805153131485
Validation loss = 0.04083148017525673
Validation loss = 0.03988191857933998
Validation loss = 0.04331806302070618
Validation loss = 0.03847808763384819
Validation loss = 0.04119117185473442
Validation loss = 0.04296864941716194
Validation loss = 0.037726741284132004
Validation loss = 0.04052947834134102
Validation loss = 0.037384357303380966
Validation loss = 0.03643139824271202
Validation loss = 0.03700407221913338
Validation loss = 0.03577699884772301
Validation loss = 0.03711588680744171
Validation loss = 0.038914650678634644
Validation loss = 0.04002754017710686
Validation loss = 0.03582887724041939
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05125465616583824
Validation loss = 0.03970096632838249
Validation loss = 0.03943592682480812
Validation loss = 0.039670225232839584
Validation loss = 0.03979075327515602
Validation loss = 0.03903006389737129
Validation loss = 0.03712010756134987
Validation loss = 0.03920474648475647
Validation loss = 0.03573491796851158
Validation loss = 0.03841216489672661
Validation loss = 0.03737625107169151
Validation loss = 0.04137815162539482
Validation loss = 0.03803953900933266
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.048109378665685654
Validation loss = 0.039974067360162735
Validation loss = 0.042422402650117874
Validation loss = 0.03868328779935837
Validation loss = 0.03836771845817566
Validation loss = 0.03686217591166496
Validation loss = 0.04168480634689331
Validation loss = 0.037493590265512466
Validation loss = 0.03719576448202133
Validation loss = 0.039715539664030075
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.053675513714551926
Validation loss = 0.0394352525472641
Validation loss = 0.04184505343437195
Validation loss = 0.03937990963459015
Validation loss = 0.038789037615060806
Validation loss = 0.03780819848179817
Validation loss = 0.04016200825572014
Validation loss = 0.043330755084753036
Validation loss = 0.037889573723077774
Validation loss = 0.03942026570439339
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.49e+03 |
| Iteration     | 5        |
| MaximumReturn | 1.99e+03 |
| MinimumReturn | 293      |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.038687657564878464
Validation loss = 0.037771325558423996
Validation loss = 0.03366391733288765
Validation loss = 0.03396033123135567
Validation loss = 0.033864255994558334
Validation loss = 0.03184836730360985
Validation loss = 0.031468164175748825
Validation loss = 0.034373801201581955
Validation loss = 0.032429203391075134
Validation loss = 0.030952390283346176
Validation loss = 0.03382907062768936
Validation loss = 0.03396457061171532
Validation loss = 0.030710412189364433
Validation loss = 0.030629727989435196
Validation loss = 0.030606625601649284
Validation loss = 0.034356240183115005
Validation loss = 0.03283662721514702
Validation loss = 0.0323219858109951
Validation loss = 0.03465015068650246
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0374443419277668
Validation loss = 0.03247002512216568
Validation loss = 0.033323824405670166
Validation loss = 0.03247130289673805
Validation loss = 0.03223534673452377
Validation loss = 0.032778698951005936
Validation loss = 0.03297767415642738
Validation loss = 0.03188284486532211
Validation loss = 0.035766709595918655
Validation loss = 0.03221268579363823
Validation loss = 0.0397687591612339
Validation loss = 0.03251160681247711
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.037409279495477676
Validation loss = 0.0337994322180748
Validation loss = 0.03384885564446449
Validation loss = 0.0329049788415432
Validation loss = 0.03173534944653511
Validation loss = 0.031870175153017044
Validation loss = 0.032776132225990295
Validation loss = 0.03165726363658905
Validation loss = 0.03156644478440285
Validation loss = 0.03541720286011696
Validation loss = 0.033223800361156464
Validation loss = 0.03288440406322479
Validation loss = 0.03201952576637268
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.037665437906980515
Validation loss = 0.034516915678977966
Validation loss = 0.03356802091002464
Validation loss = 0.03259880468249321
Validation loss = 0.032780129462480545
Validation loss = 0.0362434983253479
Validation loss = 0.032727714627981186
Validation loss = 0.033198971301317215
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03690800070762634
Validation loss = 0.0345195010304451
Validation loss = 0.03638368844985962
Validation loss = 0.03276316821575165
Validation loss = 0.03355245664715767
Validation loss = 0.037837885320186615
Validation loss = 0.035354096442461014
Validation loss = 0.03282352164387703
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.96e+03 |
| Iteration     | 6        |
| MaximumReturn | 2.38e+03 |
| MinimumReturn | 900      |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03357694298028946
Validation loss = 0.02757827565073967
Validation loss = 0.027285955846309662
Validation loss = 0.028054270893335342
Validation loss = 0.03096647374331951
Validation loss = 0.02699948474764824
Validation loss = 0.026845436543226242
Validation loss = 0.026949765160679817
Validation loss = 0.026904422789812088
Validation loss = 0.027087487280368805
Validation loss = 0.027233891189098358
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03166559338569641
Validation loss = 0.029758596792817116
Validation loss = 0.028709452599287033
Validation loss = 0.028134260326623917
Validation loss = 0.02819373644888401
Validation loss = 0.02888099104166031
Validation loss = 0.029129020869731903
Validation loss = 0.02812865562736988
Validation loss = 0.02763945423066616
Validation loss = 0.027996445074677467
Validation loss = 0.027436666190624237
Validation loss = 0.029645297676324844
Validation loss = 0.027034908533096313
Validation loss = 0.02703157812356949
Validation loss = 0.02833605371415615
Validation loss = 0.027514003217220306
Validation loss = 0.0267685167491436
Validation loss = 0.028725840151309967
Validation loss = 0.02731219492852688
Validation loss = 0.02900407649576664
Validation loss = 0.026508674025535583
Validation loss = 0.027687862515449524
Validation loss = 0.029197130352258682
Validation loss = 0.028779415413737297
Validation loss = 0.027456849813461304
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03247518092393875
Validation loss = 0.02871159091591835
Validation loss = 0.027955684810876846
Validation loss = 0.02982395514845848
Validation loss = 0.028084665536880493
Validation loss = 0.029430607333779335
Validation loss = 0.02827293798327446
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.032621949911117554
Validation loss = 0.031243130564689636
Validation loss = 0.028535965830087662
Validation loss = 0.02880213037133217
Validation loss = 0.03224765136837959
Validation loss = 0.0298342015594244
Validation loss = 0.02856065146625042
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03223510459065437
Validation loss = 0.031054675579071045
Validation loss = 0.0288756862282753
Validation loss = 0.02913079410791397
Validation loss = 0.03088170289993286
Validation loss = 0.02916339412331581
Validation loss = 0.028922468423843384
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.65e+03 |
| Iteration     | 7        |
| MaximumReturn | 2.27e+03 |
| MinimumReturn | -456     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.028405848890542984
Validation loss = 0.0254167802631855
Validation loss = 0.024373676627874374
Validation loss = 0.02487161010503769
Validation loss = 0.02540801279246807
Validation loss = 0.024941762909293175
Validation loss = 0.024698786437511444
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02956608310341835
Validation loss = 0.025674132630228996
Validation loss = 0.025907620787620544
Validation loss = 0.024657122790813446
Validation loss = 0.02525850385427475
Validation loss = 0.024302752688527107
Validation loss = 0.02396472543478012
Validation loss = 0.02506203018128872
Validation loss = 0.02476627752184868
Validation loss = 0.02505720779299736
Validation loss = 0.0283847413957119
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.029377613216638565
Validation loss = 0.026196839287877083
Validation loss = 0.02726840041577816
Validation loss = 0.025525635108351707
Validation loss = 0.025194287300109863
Validation loss = 0.0248829685151577
Validation loss = 0.02875240333378315
Validation loss = 0.02647477015852928
Validation loss = 0.025578416883945465
Validation loss = 0.024660656228661537
Validation loss = 0.026194117963314056
Validation loss = 0.025398747995495796
Validation loss = 0.02401333674788475
Validation loss = 0.02464182861149311
Validation loss = 0.025072716176509857
Validation loss = 0.025759592652320862
Validation loss = 0.024394609034061432
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02936495654284954
Validation loss = 0.026120224967598915
Validation loss = 0.025952523574233055
Validation loss = 0.02856845036149025
Validation loss = 0.027187896892428398
Validation loss = 0.02567240595817566
Validation loss = 0.027660751715302467
Validation loss = 0.025741130113601685
Validation loss = 0.027012404054403305
Validation loss = 0.026949090883135796
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029768386855721474
Validation loss = 0.02770039811730385
Validation loss = 0.02803524024784565
Validation loss = 0.029799537733197212
Validation loss = 0.027659185230731964
Validation loss = 0.026279233396053314
Validation loss = 0.026953065767884254
Validation loss = 0.025767149403691292
Validation loss = 0.025147952139377594
Validation loss = 0.02531808242201805
Validation loss = 0.027607224881649017
Validation loss = 0.027444809675216675
Validation loss = 0.026362111791968346
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.33e+03 |
| Iteration     | 8        |
| MaximumReturn | 2.13e+03 |
| MinimumReturn | 176      |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026478027924895287
Validation loss = 0.02377612702548504
Validation loss = 0.023750396445393562
Validation loss = 0.0230694692581892
Validation loss = 0.022097762674093246
Validation loss = 0.02254815399646759
Validation loss = 0.022493846714496613
Validation loss = 0.0224661473184824
Validation loss = 0.023082859814167023
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02544158697128296
Validation loss = 0.024448497220873833
Validation loss = 0.023820538073778152
Validation loss = 0.022364307194948196
Validation loss = 0.022930871695280075
Validation loss = 0.02289004996418953
Validation loss = 0.02354356274008751
Validation loss = 0.023220626637339592
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027723083272576332
Validation loss = 0.0232583936303854
Validation loss = 0.0231439471244812
Validation loss = 0.02244705520570278
Validation loss = 0.024461260065436363
Validation loss = 0.022434480488300323
Validation loss = 0.022563794627785683
Validation loss = 0.022478794679045677
Validation loss = 0.02434357814490795
Validation loss = 0.023292815312743187
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028041739016771317
Validation loss = 0.023920482024550438
Validation loss = 0.024962157011032104
Validation loss = 0.025786718353629112
Validation loss = 0.02487272396683693
Validation loss = 0.02437264658510685
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.027329564094543457
Validation loss = 0.02665190026164055
Validation loss = 0.027333330363035202
Validation loss = 0.025524774566292763
Validation loss = 0.02627437375485897
Validation loss = 0.02426106669008732
Validation loss = 0.02404581382870674
Validation loss = 0.026231998577713966
Validation loss = 0.024280523881316185
Validation loss = 0.02621903456747532
Validation loss = 0.025026511400938034
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.57e+03 |
| Iteration     | 9        |
| MaximumReturn | 2.44e+03 |
| MinimumReturn | -588     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025730906054377556
Validation loss = 0.022040950134396553
Validation loss = 0.020760243758559227
Validation loss = 0.021509669721126556
Validation loss = 0.021779946982860565
Validation loss = 0.021426094695925713
Validation loss = 0.021037526428699493
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02463902160525322
Validation loss = 0.02193639986217022
Validation loss = 0.02181209623813629
Validation loss = 0.02146279066801071
Validation loss = 0.0203945841640234
Validation loss = 0.020434074103832245
Validation loss = 0.02052663080394268
Validation loss = 0.0220691729336977
Validation loss = 0.02055622823536396
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025176363065838814
Validation loss = 0.022168556228280067
Validation loss = 0.022188343107700348
Validation loss = 0.02050350233912468
Validation loss = 0.02309970185160637
Validation loss = 0.022014182060956955
Validation loss = 0.02203516848385334
Validation loss = 0.021681198850274086
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026670577004551888
Validation loss = 0.02200518362224102
Validation loss = 0.022021422162652016
Validation loss = 0.022517185658216476
Validation loss = 0.023103564977645874
Validation loss = 0.022794455289840698
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02695312537252903
Validation loss = 0.023131053894758224
Validation loss = 0.02263670414686203
Validation loss = 0.02494722045958042
Validation loss = 0.02280094474554062
Validation loss = 0.022956019267439842
Validation loss = 0.023558001965284348
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.74e+03 |
| Iteration     | 10       |
| MaximumReturn | 2.59e+03 |
| MinimumReturn | 393      |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02348991483449936
Validation loss = 0.01978236809372902
Validation loss = 0.02013831026852131
Validation loss = 0.020315291360020638
Validation loss = 0.019558170810341835
Validation loss = 0.019914932548999786
Validation loss = 0.01895904541015625
Validation loss = 0.019182465970516205
Validation loss = 0.019105711951851845
Validation loss = 0.019091352820396423
Validation loss = 0.019000258296728134
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023844899609684944
Validation loss = 0.019414162263274193
Validation loss = 0.01928810589015484
Validation loss = 0.019980037584900856
Validation loss = 0.02130621112883091
Validation loss = 0.021647803485393524
Validation loss = 0.020103422924876213
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023635094985365868
Validation loss = 0.021593235433101654
Validation loss = 0.021493440493941307
Validation loss = 0.020325710996985435
Validation loss = 0.019559187814593315
Validation loss = 0.020453965291380882
Validation loss = 0.020366059616208076
Validation loss = 0.02235916256904602
Validation loss = 0.02049262262880802
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02376973070204258
Validation loss = 0.02208784967660904
Validation loss = 0.020787587389349937
Validation loss = 0.02058369107544422
Validation loss = 0.02076728828251362
Validation loss = 0.020553989335894585
Validation loss = 0.020510205999016762
Validation loss = 0.02063378505408764
Validation loss = 0.019254550337791443
Validation loss = 0.022118985652923584
Validation loss = 0.019510967656970024
Validation loss = 0.019897660240530968
Validation loss = 0.01959741674363613
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026854505762457848
Validation loss = 0.021065259352326393
Validation loss = 0.020906364545226097
Validation loss = 0.020999198779463768
Validation loss = 0.021176880225539207
Validation loss = 0.022575028240680695
Validation loss = 0.02154776267707348
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.11e+03 |
| Iteration     | 11       |
| MaximumReturn | 2.58e+03 |
| MinimumReturn | 1.1e+03  |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02139180898666382
Validation loss = 0.018797412514686584
Validation loss = 0.0176662877202034
Validation loss = 0.018303312361240387
Validation loss = 0.01781809702515602
Validation loss = 0.01871870458126068
Validation loss = 0.01848798245191574
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02089807018637657
Validation loss = 0.01887529157102108
Validation loss = 0.018924815580248833
Validation loss = 0.018490590155124664
Validation loss = 0.018153764307498932
Validation loss = 0.018190424889326096
Validation loss = 0.01803913526237011
Validation loss = 0.017673518508672714
Validation loss = 0.018122036010026932
Validation loss = 0.017416544258594513
Validation loss = 0.019523123279213905
Validation loss = 0.0176459439098835
Validation loss = 0.01928400807082653
Validation loss = 0.017537932842969894
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022207695990800858
Validation loss = 0.020000208169221878
Validation loss = 0.019624272361397743
Validation loss = 0.01857917383313179
Validation loss = 0.019815592095255852
Validation loss = 0.018565256148576736
Validation loss = 0.01868751086294651
Validation loss = 0.019343851134181023
Validation loss = 0.018403537571430206
Validation loss = 0.018158556893467903
Validation loss = 0.018590984866023064
Validation loss = 0.017925597727298737
Validation loss = 0.018198886886239052
Validation loss = 0.018295980989933014
Validation loss = 0.017507759854197502
Validation loss = 0.018230225890874863
Validation loss = 0.016923297196626663
Validation loss = 0.019048374146223068
Validation loss = 0.01797725446522236
Validation loss = 0.017825327813625336
Validation loss = 0.017612984403967857
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02536829374730587
Validation loss = 0.020743854343891144
Validation loss = 0.019171016290783882
Validation loss = 0.019532237201929092
Validation loss = 0.019190572202205658
Validation loss = 0.020708199590444565
Validation loss = 0.01894487999379635
Validation loss = 0.018902495503425598
Validation loss = 0.0187354888767004
Validation loss = 0.01900266483426094
Validation loss = 0.018311994150280952
Validation loss = 0.01780960150063038
Validation loss = 0.020556358620524406
Validation loss = 0.017559444531798363
Validation loss = 0.019270218908786774
Validation loss = 0.017938818782567978
Validation loss = 0.017735455185174942
Validation loss = 0.019044149667024612
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022045155987143517
Validation loss = 0.01961333677172661
Validation loss = 0.01898489147424698
Validation loss = 0.020038243383169174
Validation loss = 0.019938144832849503
Validation loss = 0.01979454606771469
Validation loss = 0.019964007660746574
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.03e+03 |
| Iteration     | 12       |
| MaximumReturn | 2.59e+03 |
| MinimumReturn | 385      |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020771248266100883
Validation loss = 0.017042141407728195
Validation loss = 0.018145816400647163
Validation loss = 0.016443682834506035
Validation loss = 0.016977204009890556
Validation loss = 0.016374120488762856
Validation loss = 0.017209844663739204
Validation loss = 0.017723625525832176
Validation loss = 0.017624111846089363
Validation loss = 0.016844337806105614
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.019344495609402657
Validation loss = 0.017698006704449654
Validation loss = 0.017741521820425987
Validation loss = 0.01681145653128624
Validation loss = 0.017267517745494843
Validation loss = 0.016924340277910233
Validation loss = 0.016593297943472862
Validation loss = 0.016884706914424896
Validation loss = 0.016468273475766182
Validation loss = 0.015924150124192238
Validation loss = 0.017988622188568115
Validation loss = 0.017776211723685265
Validation loss = 0.016082419082522392
Validation loss = 0.019273877143859863
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022137463092803955
Validation loss = 0.017080778256058693
Validation loss = 0.01673003099858761
Validation loss = 0.016819341108202934
Validation loss = 0.017133161425590515
Validation loss = 0.016767945140600204
Validation loss = 0.01715192385017872
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.020689019933342934
Validation loss = 0.016912246122956276
Validation loss = 0.017886439338326454
Validation loss = 0.016680222004652023
Validation loss = 0.01742527075111866
Validation loss = 0.01737949252128601
Validation loss = 0.017672475427389145
Validation loss = 0.016794750466942787
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022260334342718124
Validation loss = 0.01982847973704338
Validation loss = 0.018466895446181297
Validation loss = 0.01918165571987629
Validation loss = 0.01865530014038086
Validation loss = 0.017901750281453133
Validation loss = 0.018895817920565605
Validation loss = 0.017400722950696945
Validation loss = 0.017956724390387535
Validation loss = 0.017454516142606735
Validation loss = 0.017435885965824127
Validation loss = 0.017958922311663628
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.85e+03 |
| Iteration     | 13       |
| MaximumReturn | 2.61e+03 |
| MinimumReturn | -580     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.018509961664676666
Validation loss = 0.016094321385025978
Validation loss = 0.016701621934771538
Validation loss = 0.01733667403459549
Validation loss = 0.015976306051015854
Validation loss = 0.016127431765198708
Validation loss = 0.015662623569369316
Validation loss = 0.016395993530750275
Validation loss = 0.01741321198642254
Validation loss = 0.015592982061207294
Validation loss = 0.015382497571408749
Validation loss = 0.015230241231620312
Validation loss = 0.015574747696518898
Validation loss = 0.015831131488084793
Validation loss = 0.015073965303599834
Validation loss = 0.0157391969114542
Validation loss = 0.015272590331733227
Validation loss = 0.014936097897589207
Validation loss = 0.015366364270448685
Validation loss = 0.016846032813191414
Validation loss = 0.014357183128595352
Validation loss = 0.015756146982312202
Validation loss = 0.014620136469602585
Validation loss = 0.015139609575271606
Validation loss = 0.015348917804658413
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018072089180350304
Validation loss = 0.015811864286661148
Validation loss = 0.016161790117621422
Validation loss = 0.01598617620766163
Validation loss = 0.016064070165157318
Validation loss = 0.01802167110145092
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01838754303753376
Validation loss = 0.015741802752017975
Validation loss = 0.01678546704351902
Validation loss = 0.01664654165506363
Validation loss = 0.015822898596525192
Validation loss = 0.01618797704577446
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01760229654610157
Validation loss = 0.016772998496890068
Validation loss = 0.016150716692209244
Validation loss = 0.01621076464653015
Validation loss = 0.01622256450355053
Validation loss = 0.015629110857844353
Validation loss = 0.015664076432585716
Validation loss = 0.01612403243780136
Validation loss = 0.017448769882321358
Validation loss = 0.0157038364559412
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021610798314213753
Validation loss = 0.016379080712795258
Validation loss = 0.01759362407028675
Validation loss = 0.017631573602557182
Validation loss = 0.01710236258804798
Validation loss = 0.017590411007404327
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.48e+03 |
| Iteration     | 14       |
| MaximumReturn | 2.61e+03 |
| MinimumReturn | 2.35e+03 |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01563076674938202
Validation loss = 0.013802309520542622
Validation loss = 0.014415278099477291
Validation loss = 0.014210388995707035
Validation loss = 0.014156696386635303
Validation loss = 0.01428004540503025
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.018445562571287155
Validation loss = 0.0154814962297678
Validation loss = 0.015959274023771286
Validation loss = 0.015589961782097816
Validation loss = 0.015215389430522919
Validation loss = 0.014736292883753777
Validation loss = 0.015718210488557816
Validation loss = 0.015021868981420994
Validation loss = 0.014587752521038055
Validation loss = 0.016125183552503586
Validation loss = 0.014782797545194626
Validation loss = 0.014196498319506645
Validation loss = 0.014676829800009727
Validation loss = 0.01402140874415636
Validation loss = 0.015421049669384956
Validation loss = 0.014103827066719532
Validation loss = 0.01371211651712656
Validation loss = 0.014917590655386448
Validation loss = 0.013868720270693302
Validation loss = 0.015533443540334702
Validation loss = 0.01454300619661808
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01759827882051468
Validation loss = 0.015348885208368301
Validation loss = 0.01588139869272709
Validation loss = 0.015062320977449417
Validation loss = 0.015326782129704952
Validation loss = 0.014686187729239464
Validation loss = 0.015183214098215103
Validation loss = 0.015500426292419434
Validation loss = 0.015444405376911163
Validation loss = 0.014811588451266289
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.017612341791391373
Validation loss = 0.01460112165659666
Validation loss = 0.014943533577024937
Validation loss = 0.016032550483942032
Validation loss = 0.01453101821243763
Validation loss = 0.016314752399921417
Validation loss = 0.01601238362491131
Validation loss = 0.01481947023421526
Validation loss = 0.014594734646379948
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018925834447145462
Validation loss = 0.016871141269803047
Validation loss = 0.01627245917916298
Validation loss = 0.016360871493816376
Validation loss = 0.01848769560456276
Validation loss = 0.017751729115843773
Validation loss = 0.016297783702611923
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.47e+03 |
| Iteration     | 15       |
| MaximumReturn | 2.65e+03 |
| MinimumReturn | 2.13e+03 |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.01631893217563629
Validation loss = 0.01429926510900259
Validation loss = 0.013277968391776085
Validation loss = 0.013451829552650452
Validation loss = 0.01304526999592781
Validation loss = 0.013882739469408989
Validation loss = 0.013295727781951427
Validation loss = 0.013663611374795437
Validation loss = 0.013262215070426464
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.015475950203835964
Validation loss = 0.014640388078987598
Validation loss = 0.013490491546690464
Validation loss = 0.014149500988423824
Validation loss = 0.013244708068668842
Validation loss = 0.013805895112454891
Validation loss = 0.014677763916552067
Validation loss = 0.012550722807645798
Validation loss = 0.013507147319614887
Validation loss = 0.014376752078533173
Validation loss = 0.01280495710670948
Validation loss = 0.013385524973273277
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.017276819795370102
Validation loss = 0.014995505101978779
Validation loss = 0.014704767614603043
Validation loss = 0.015826400369405746
Validation loss = 0.014184316620230675
Validation loss = 0.013785264454782009
Validation loss = 0.01504581794142723
Validation loss = 0.01488207932561636
Validation loss = 0.014403722248971462
Validation loss = 0.014279929921030998
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01642048917710781
Validation loss = 0.014268006198108196
Validation loss = 0.013625824823975563
Validation loss = 0.014279636554419994
Validation loss = 0.014700106345117092
Validation loss = 0.0148441968485713
Validation loss = 0.014458265155553818
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.01661514863371849
Validation loss = 0.016339287161827087
Validation loss = 0.015625908970832825
Validation loss = 0.016267331317067146
Validation loss = 0.015093817375600338
Validation loss = 0.015112477354705334
Validation loss = 0.01483719702810049
Validation loss = 0.01612776890397072
Validation loss = 0.014484157785773277
Validation loss = 0.015685580670833588
Validation loss = 0.013914781622588634
Validation loss = 0.015023459680378437
Validation loss = 0.014320142567157745
Validation loss = 0.014443157240748405
Validation loss = 0.01438321266323328
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.53e+03 |
| Iteration     | 16       |
| MaximumReturn | 2.69e+03 |
| MinimumReturn | 2.43e+03 |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.014687519520521164
Validation loss = 0.01249634474515915
Validation loss = 0.012911065481603146
Validation loss = 0.013379104435443878
Validation loss = 0.014152972027659416
Validation loss = 0.01210729405283928
Validation loss = 0.013005264103412628
Validation loss = 0.012647389434278011
Validation loss = 0.012708643451333046
Validation loss = 0.013648899272084236
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01568375714123249
Validation loss = 0.012953455559909344
Validation loss = 0.012446270324289799
Validation loss = 0.014650078490376472
Validation loss = 0.012884498573839664
Validation loss = 0.013383771292865276
Validation loss = 0.012818295508623123
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.014212972484529018
Validation loss = 0.013516759499907494
Validation loss = 0.013119334354996681
Validation loss = 0.013248912058770657
Validation loss = 0.01361885666847229
Validation loss = 0.012696481309831142
Validation loss = 0.013856730423867702
Validation loss = 0.01590891182422638
Validation loss = 0.012559791095554829
Validation loss = 0.013197578489780426
Validation loss = 0.012410681694746017
Validation loss = 0.013577006757259369
Validation loss = 0.013455178588628769
Validation loss = 0.012466970831155777
Validation loss = 0.01295884232968092
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0141486506909132
Validation loss = 0.013890738599002361
Validation loss = 0.013656994327902794
Validation loss = 0.013251235708594322
Validation loss = 0.013736234977841377
Validation loss = 0.013877619057893753
Validation loss = 0.014290162362158298
Validation loss = 0.012441787868738174
Validation loss = 0.01384025625884533
Validation loss = 0.012951357290148735
Validation loss = 0.013084320351481438
Validation loss = 0.012533258646726608
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016223467886447906
Validation loss = 0.013546901755034924
Validation loss = 0.01655346155166626
Validation loss = 0.014042962342500687
Validation loss = 0.013519511558115482
Validation loss = 0.013774598948657513
Validation loss = 0.014118792489171028
Validation loss = 0.013925312086939812
Validation loss = 0.012807868421077728
Validation loss = 0.014088339172303677
Validation loss = 0.013478065840899944
Validation loss = 0.012843148782849312
Validation loss = 0.013699454255402088
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.62e+03 |
| Iteration     | 17       |
| MaximumReturn | 2.84e+03 |
| MinimumReturn | 2.39e+03 |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.013128739781677723
Validation loss = 0.012177649885416031
Validation loss = 0.01188598107546568
Validation loss = 0.012398259714245796
Validation loss = 0.012027245946228504
Validation loss = 0.011723530478775501
Validation loss = 0.012477683834731579
Validation loss = 0.011300993151962757
Validation loss = 0.012031301856040955
Validation loss = 0.012535439804196358
Validation loss = 0.011087452992796898
Validation loss = 0.011176060885190964
Validation loss = 0.013200372457504272
Validation loss = 0.011337081901729107
Validation loss = 0.011143360286951065
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01369547564536333
Validation loss = 0.012448645196855068
Validation loss = 0.013697385787963867
Validation loss = 0.012199760414659977
Validation loss = 0.012396635487675667
Validation loss = 0.0133336391299963
Validation loss = 0.011944812722504139
Validation loss = 0.012446819804608822
Validation loss = 0.01298647653311491
Validation loss = 0.012099554762244225
Validation loss = 0.012603182345628738
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013694104738533497
Validation loss = 0.012097961269319057
Validation loss = 0.011943313293159008
Validation loss = 0.011790163815021515
Validation loss = 0.013118688948452473
Validation loss = 0.012194402515888214
Validation loss = 0.012846194207668304
Validation loss = 0.012286740355193615
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013956171460449696
Validation loss = 0.012495333328843117
Validation loss = 0.012990028597414494
Validation loss = 0.01201859675347805
Validation loss = 0.012586957775056362
Validation loss = 0.011865631677210331
Validation loss = 0.013840235769748688
Validation loss = 0.01226664986461401
Validation loss = 0.01212179847061634
Validation loss = 0.011799338273704052
Validation loss = 0.012471101246774197
Validation loss = 0.012315275147557259
Validation loss = 0.01309180911630392
Validation loss = 0.012065911665558815
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.015193037688732147
Validation loss = 0.012968585826456547
Validation loss = 0.01334987860172987
Validation loss = 0.013046897016465664
Validation loss = 0.014479905366897583
Validation loss = 0.012080401182174683
Validation loss = 0.013950682245194912
Validation loss = 0.011799936182796955
Validation loss = 0.012655354104936123
Validation loss = 0.013309510424733162
Validation loss = 0.011696971021592617
Validation loss = 0.01193990558385849
Validation loss = 0.012523605488240719
Validation loss = 0.011508583091199398
Validation loss = 0.013474511913955212
Validation loss = 0.011287371627986431
Validation loss = 0.011673319153487682
Validation loss = 0.012087681330740452
Validation loss = 0.012080468237400055
Validation loss = 0.011504474096000195
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.73e+03 |
| Iteration     | 18       |
| MaximumReturn | 2.85e+03 |
| MinimumReturn | 2.56e+03 |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.012069890275597572
Validation loss = 0.012149798683822155
Validation loss = 0.010881568305194378
Validation loss = 0.01105062011629343
Validation loss = 0.011656681075692177
Validation loss = 0.011883387342095375
Validation loss = 0.010750709101557732
Validation loss = 0.010746498592197895
Validation loss = 0.01083868183195591
Validation loss = 0.010439781472086906
Validation loss = 0.011311893351376057
Validation loss = 0.01071167178452015
Validation loss = 0.010951250791549683
Validation loss = 0.010501344688236713
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013363349251449108
Validation loss = 0.011352616362273693
Validation loss = 0.012102869339287281
Validation loss = 0.011982659809291363
Validation loss = 0.011344516649842262
Validation loss = 0.012140211649239063
Validation loss = 0.011391649022698402
Validation loss = 0.012025740928947926
Validation loss = 0.011718248948454857
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.013723820447921753
Validation loss = 0.011706904508173466
Validation loss = 0.012154554948210716
Validation loss = 0.011820892803370953
Validation loss = 0.012008455581963062
Validation loss = 0.011468409560620785
Validation loss = 0.011255420744419098
Validation loss = 0.011975192464888096
Validation loss = 0.012670534662902355
Validation loss = 0.011105548590421677
Validation loss = 0.012463703751564026
Validation loss = 0.010961554944515228
Validation loss = 0.011611363850533962
Validation loss = 0.011792811565101147
Validation loss = 0.010889354161918163
Validation loss = 0.011493319645524025
Validation loss = 0.010733049362897873
Validation loss = 0.011127592995762825
Validation loss = 0.010735922493040562
Validation loss = 0.011236818507313728
Validation loss = 0.011321988888084888
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.012722012586891651
Validation loss = 0.011089814826846123
Validation loss = 0.012383881956338882
Validation loss = 0.010965293273329735
Validation loss = 0.011389404535293579
Validation loss = 0.01210084743797779
Validation loss = 0.011488648131489754
Validation loss = 0.011787801049649715
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.014018446207046509
Validation loss = 0.011136683635413647
Validation loss = 0.011491173878312111
Validation loss = 0.011800701729953289
Validation loss = 0.011579146608710289
Validation loss = 0.012639613822102547
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.81e+03 |
| Iteration     | 19       |
| MaximumReturn | 3.03e+03 |
| MinimumReturn | 2.51e+03 |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011295334435999393
Validation loss = 0.011074579320847988
Validation loss = 0.010409526526927948
Validation loss = 0.011088162660598755
Validation loss = 0.01114334911108017
Validation loss = 0.010468322783708572
Validation loss = 0.010421463288366795
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012186781503260136
Validation loss = 0.011547703295946121
Validation loss = 0.010995372198522091
Validation loss = 0.011696791276335716
Validation loss = 0.012268777005374432
Validation loss = 0.011254704557359219
Validation loss = 0.010728226974606514
Validation loss = 0.011046014726161957
Validation loss = 0.010368228890001774
Validation loss = 0.012764405459165573
Validation loss = 0.010803651064634323
Validation loss = 0.011086547747254372
Validation loss = 0.010550010949373245
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011575586162507534
Validation loss = 0.011533806100487709
Validation loss = 0.010952189564704895
Validation loss = 0.011331244371831417
Validation loss = 0.011927629821002483
Validation loss = 0.01069775689393282
Validation loss = 0.011621945537626743
Validation loss = 0.010440154001116753
Validation loss = 0.0112786665558815
Validation loss = 0.011067907325923443
Validation loss = 0.011944129131734371
Validation loss = 0.01049953605979681
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.013078608550131321
Validation loss = 0.01098503265529871
Validation loss = 0.010720822960138321
Validation loss = 0.010853986255824566
Validation loss = 0.011330335400998592
Validation loss = 0.011318330653011799
Validation loss = 0.010179398581385612
Validation loss = 0.011074146255850792
Validation loss = 0.010442529805004597
Validation loss = 0.010735111311078072
Validation loss = 0.01081549096852541
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012525352649390697
Validation loss = 0.01106641348451376
Validation loss = 0.01138368621468544
Validation loss = 0.010942433960735798
Validation loss = 0.010951350443065166
Validation loss = 0.010920803062617779
Validation loss = 0.01141618937253952
Validation loss = 0.010689427144825459
Validation loss = 0.012264324352145195
Validation loss = 0.010823013260960579
Validation loss = 0.010710664093494415
Validation loss = 0.011279301717877388
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.78e+03 |
| Iteration     | 20       |
| MaximumReturn | 3.08e+03 |
| MinimumReturn | 2.54e+03 |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0118454834446311
Validation loss = 0.010692273266613483
Validation loss = 0.010311027988791466
Validation loss = 0.010224677622318268
Validation loss = 0.010543354786932468
Validation loss = 0.010712333023548126
Validation loss = 0.009801170788705349
Validation loss = 0.010801115073263645
Validation loss = 0.009390524588525295
Validation loss = 0.0100283557549119
Validation loss = 0.010855005122721195
Validation loss = 0.010235920548439026
Validation loss = 0.010817662812769413
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011341851204633713
Validation loss = 0.011086075566709042
Validation loss = 0.010348686017096043
Validation loss = 0.011776003986597061
Validation loss = 0.010153678245842457
Validation loss = 0.010689693503081799
Validation loss = 0.010822116397321224
Validation loss = 0.010310054756700993
Validation loss = 0.010346564464271069
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011625380255281925
Validation loss = 0.01051007118076086
Validation loss = 0.01113028172403574
Validation loss = 0.011273927055299282
Validation loss = 0.010235174559056759
Validation loss = 0.010664709843695164
Validation loss = 0.01025925949215889
Validation loss = 0.011188461445271969
Validation loss = 0.010334785096347332
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011746516451239586
Validation loss = 0.010482280515134335
Validation loss = 0.010508359409868717
Validation loss = 0.010510053485631943
Validation loss = 0.011512561701238155
Validation loss = 0.010061508044600487
Validation loss = 0.01039917953312397
Validation loss = 0.011232024990022182
Validation loss = 0.009754999540746212
Validation loss = 0.010244288481771946
Validation loss = 0.01008600927889347
Validation loss = 0.01026744395494461
Validation loss = 0.009851002134382725
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.012136761099100113
Validation loss = 0.010340430773794651
Validation loss = 0.011072603985667229
Validation loss = 0.010202530771493912
Validation loss = 0.010886679403483868
Validation loss = 0.010357133112847805
Validation loss = 0.010777134448289871
Validation loss = 0.010267493315041065
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.14e+03 |
| Iteration     | 21       |
| MaximumReturn | 2.92e+03 |
| MinimumReturn | -340     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.011463313363492489
Validation loss = 0.010169153101742268
Validation loss = 0.011081818491220474
Validation loss = 0.009792215190827847
Validation loss = 0.009922551922500134
Validation loss = 0.00979543849825859
Validation loss = 0.00991765595972538
Validation loss = 0.009779614396393299
Validation loss = 0.010582329705357552
Validation loss = 0.009790823794901371
Validation loss = 0.010564741678535938
Validation loss = 0.009818938560783863
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012301918119192123
Validation loss = 0.010574433021247387
Validation loss = 0.010110069997608662
Validation loss = 0.010728111490607262
Validation loss = 0.010297664441168308
Validation loss = 0.010367968119680882
Validation loss = 0.01010147575289011
Validation loss = 0.010779784992337227
Validation loss = 0.010834797285497189
Validation loss = 0.009966600686311722
Validation loss = 0.010268962942063808
Validation loss = 0.010338560678064823
Validation loss = 0.010303947143256664
Validation loss = 0.010525469668209553
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011580642312765121
Validation loss = 0.011365924030542374
Validation loss = 0.011470936238765717
Validation loss = 0.011710298247635365
Validation loss = 0.010073295794427395
Validation loss = 0.011064308695495129
Validation loss = 0.010668409988284111
Validation loss = 0.011250300332903862
Validation loss = 0.010092477314174175
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.01198663655668497
Validation loss = 0.010248864069581032
Validation loss = 0.01040901429951191
Validation loss = 0.010535952635109425
Validation loss = 0.009818149730563164
Validation loss = 0.010717925615608692
Validation loss = 0.009796730242669582
Validation loss = 0.009749449789524078
Validation loss = 0.010576213710010052
Validation loss = 0.010130279697477818
Validation loss = 0.010216256603598595
Validation loss = 0.010874333791434765
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.011571206152439117
Validation loss = 0.011390537954866886
Validation loss = 0.010238880291581154
Validation loss = 0.010212201625108719
Validation loss = 0.010960930958390236
Validation loss = 0.011293423362076283
Validation loss = 0.010534104891121387
Validation loss = 0.010449460707604885
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.76e+03 |
| Iteration     | 22       |
| MaximumReturn | 2.87e+03 |
| MinimumReturn | 2.58e+03 |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010660265572369099
Validation loss = 0.009763069450855255
Validation loss = 0.009980895556509495
Validation loss = 0.009376459755003452
Validation loss = 0.009249352850019932
Validation loss = 0.00999090913683176
Validation loss = 0.00976614560931921
Validation loss = 0.009561396203935146
Validation loss = 0.01001609954982996
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.01206894963979721
Validation loss = 0.009584031067788601
Validation loss = 0.009795255027711391
Validation loss = 0.009824596345424652
Validation loss = 0.009866759181022644
Validation loss = 0.010293618775904179
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.011560889892280102
Validation loss = 0.010324922390282154
Validation loss = 0.010587473399937153
Validation loss = 0.011180768720805645
Validation loss = 0.010126490145921707
Validation loss = 0.010046429000794888
Validation loss = 0.010652597062289715
Validation loss = 0.00974863301962614
Validation loss = 0.009653972461819649
Validation loss = 0.009958750568330288
Validation loss = 0.010307387448847294
Validation loss = 0.009582259692251682
Validation loss = 0.010465122759342194
Validation loss = 0.009184290654957294
Validation loss = 0.011185944080352783
Validation loss = 0.009088020771741867
Validation loss = 0.010865996591746807
Validation loss = 0.009852658025920391
Validation loss = 0.010149731300771236
Validation loss = 0.010685590095818043
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010725903324782848
Validation loss = 0.010158414021134377
Validation loss = 0.010425279848277569
Validation loss = 0.01056760549545288
Validation loss = 0.00977108534425497
Validation loss = 0.009897439740598202
Validation loss = 0.009925155900418758
Validation loss = 0.009513928554952145
Validation loss = 0.00946006178855896
Validation loss = 0.0094379847869277
Validation loss = 0.009593109600245953
Validation loss = 0.009674676693975925
Validation loss = 0.010661914944648743
Validation loss = 0.009883779101073742
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010894251056015491
Validation loss = 0.011234335601329803
Validation loss = 0.009662691503763199
Validation loss = 0.010403831489384174
Validation loss = 0.01021048054099083
Validation loss = 0.01074680220335722
Validation loss = 0.00994233600795269
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.97e+03 |
| Iteration     | 23       |
| MaximumReturn | 3.28e+03 |
| MinimumReturn | 2.84e+03 |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.00993990059942007
Validation loss = 0.009776363149285316
Validation loss = 0.008967325091362
Validation loss = 0.009193219244480133
Validation loss = 0.009044190868735313
Validation loss = 0.008824779652059078
Validation loss = 0.009370223619043827
Validation loss = 0.00898818951100111
Validation loss = 0.009151021018624306
Validation loss = 0.009315676987171173
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011170297861099243
Validation loss = 0.00979350320994854
Validation loss = 0.010490945540368557
Validation loss = 0.010417184792459011
Validation loss = 0.009108605794608593
Validation loss = 0.009399833157658577
Validation loss = 0.009647661820054054
Validation loss = 0.0097934789955616
Validation loss = 0.009387334808707237
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01102426927536726
Validation loss = 0.009501473978161812
Validation loss = 0.010112864896655083
Validation loss = 0.00954591017216444
Validation loss = 0.009717503562569618
Validation loss = 0.009155096486210823
Validation loss = 0.0112059460952878
Validation loss = 0.009309578686952591
Validation loss = 0.00966144260019064
Validation loss = 0.009412570856511593
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009985136799514294
Validation loss = 0.00889032706618309
Validation loss = 0.009972180239856243
Validation loss = 0.00974947027862072
Validation loss = 0.009375175461173058
Validation loss = 0.01043822430074215
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010779903270304203
Validation loss = 0.009807958267629147
Validation loss = 0.010451817885041237
Validation loss = 0.009762604720890522
Validation loss = 0.010072355158627033
Validation loss = 0.009397933259606361
Validation loss = 0.009840154089033604
Validation loss = 0.010264408774673939
Validation loss = 0.009900647215545177
Validation loss = 0.009607159532606602
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.99e+03 |
| Iteration     | 24       |
| MaximumReturn | 3.45e+03 |
| MinimumReturn | -349     |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010201739147305489
Validation loss = 0.009608604945242405
Validation loss = 0.008991644717752934
Validation loss = 0.009630129672586918
Validation loss = 0.00984863005578518
Validation loss = 0.009154466912150383
Validation loss = 0.00934676919132471
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013067012652754784
Validation loss = 0.00940629467368126
Validation loss = 0.010626289993524551
Validation loss = 0.009006084874272346
Validation loss = 0.009312125854194164
Validation loss = 0.009238838218152523
Validation loss = 0.009486949071288109
Validation loss = 0.009211060591042042
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010839579626917839
Validation loss = 0.00951592531055212
Validation loss = 0.010181782767176628
Validation loss = 0.0096334433183074
Validation loss = 0.008953279815614223
Validation loss = 0.009723787195980549
Validation loss = 0.009453099220991135
Validation loss = 0.00946671050041914
Validation loss = 0.009622547775506973
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.011520645581185818
Validation loss = 0.009260187856853008
Validation loss = 0.009078756906092167
Validation loss = 0.009382552467286587
Validation loss = 0.009189603850245476
Validation loss = 0.01012803427875042
Validation loss = 0.008853685110807419
Validation loss = 0.009842685423791409
Validation loss = 0.009074818342924118
Validation loss = 0.009288488887250423
Validation loss = 0.009058541618287563
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010923221707344055
Validation loss = 0.010024347342550755
Validation loss = 0.009433874860405922
Validation loss = 0.010380653664469719
Validation loss = 0.00953815970569849
Validation loss = 0.009272993542253971
Validation loss = 0.009899250231683254
Validation loss = 0.00968134868890047
Validation loss = 0.009576917625963688
Validation loss = 0.00968955922871828
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.51e+03 |
| Iteration     | 25       |
| MaximumReturn | 3.28e+03 |
| MinimumReturn | 79.7     |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010089272633194923
Validation loss = 0.008845943957567215
Validation loss = 0.008764213882386684
Validation loss = 0.00936539750546217
Validation loss = 0.00866128969937563
Validation loss = 0.009193353354930878
Validation loss = 0.008613907732069492
Validation loss = 0.009473972022533417
Validation loss = 0.008902627043426037
Validation loss = 0.009359325282275677
Validation loss = 0.008856263943016529
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009855007752776146
Validation loss = 0.008984589017927647
Validation loss = 0.009956621564924717
Validation loss = 0.008737464435398579
Validation loss = 0.009881453588604927
Validation loss = 0.008916936814785004
Validation loss = 0.009615236893296242
Validation loss = 0.009334820322692394
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010007886216044426
Validation loss = 0.008889143355190754
Validation loss = 0.009567040018737316
Validation loss = 0.008956349454820156
Validation loss = 0.00940742064267397
Validation loss = 0.008770392276346684
Validation loss = 0.009374185465276241
Validation loss = 0.009005087427794933
Validation loss = 0.008348068222403526
Validation loss = 0.009287549182772636
Validation loss = 0.008890943601727486
Validation loss = 0.008677772246301174
Validation loss = 0.009518955834209919
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.009906958788633347
Validation loss = 0.010022259317338467
Validation loss = 0.008485481142997742
Validation loss = 0.009597049094736576
Validation loss = 0.009534206241369247
Validation loss = 0.008577278815209866
Validation loss = 0.00968309585005045
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009857403114438057
Validation loss = 0.009565673768520355
Validation loss = 0.009802782908082008
Validation loss = 0.009506499394774437
Validation loss = 0.00980312004685402
Validation loss = 0.010556912049651146
Validation loss = 0.00917908363044262
Validation loss = 0.009518581442534924
Validation loss = 0.009663848206400871
Validation loss = 0.009500161744654179
Validation loss = 0.009194357320666313
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.64e+03 |
| Iteration     | 26       |
| MaximumReturn | 3.25e+03 |
| MinimumReturn | 967      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.010323471389710903
Validation loss = 0.008643652312457561
Validation loss = 0.009625100530683994
Validation loss = 0.008745446801185608
Validation loss = 0.008652439340949059
Validation loss = 0.009250393137335777
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009937256574630737
Validation loss = 0.008810288272798061
Validation loss = 0.00907591450959444
Validation loss = 0.00861559808254242
Validation loss = 0.00936733279377222
Validation loss = 0.008981971070170403
Validation loss = 0.00856939610093832
Validation loss = 0.009092532098293304
Validation loss = 0.008890359662473202
Validation loss = 0.00885783601552248
Validation loss = 0.009009071625769138
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.010251308791339397
Validation loss = 0.008651038631796837
Validation loss = 0.008948171511292458
Validation loss = 0.009365974925458431
Validation loss = 0.010044035501778126
Validation loss = 0.008736386895179749
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.00967011321336031
Validation loss = 0.00936884991824627
Validation loss = 0.00949015747755766
Validation loss = 0.008816204033792019
Validation loss = 0.008935494348406792
Validation loss = 0.008739315904676914
Validation loss = 0.008764982223510742
Validation loss = 0.009324624203145504
Validation loss = 0.00849791057407856
Validation loss = 0.009359648451209068
Validation loss = 0.008350173942744732
Validation loss = 0.008786964230239391
Validation loss = 0.009052121080458164
Validation loss = 0.008253076113760471
Validation loss = 0.008549819700419903
Validation loss = 0.008706306107342243
Validation loss = 0.008306694217026234
Validation loss = 0.00880655087530613
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.010478082112967968
Validation loss = 0.008736914955079556
Validation loss = 0.009089836850762367
Validation loss = 0.010014861822128296
Validation loss = 0.008946994319558144
Validation loss = 0.009452818892896175
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.57e+03 |
| Iteration     | 27       |
| MaximumReturn | 3.08e+03 |
| MinimumReturn | 508      |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.008966946043074131
Validation loss = 0.00873249489814043
Validation loss = 0.00929936021566391
Validation loss = 0.008517662063241005
Validation loss = 0.008404945023357868
Validation loss = 0.008196844719350338
Validation loss = 0.009812581352889538
Validation loss = 0.008840550668537617
Validation loss = 0.007997850887477398
Validation loss = 0.008399094454944134
Validation loss = 0.008031892590224743
Validation loss = 0.008328908123075962
Validation loss = 0.008400899358093739
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.009747849777340889
Validation loss = 0.008473195135593414
Validation loss = 0.008609971031546593
Validation loss = 0.008422153070569038
Validation loss = 0.00846159178763628
Validation loss = 0.00904635339975357
Validation loss = 0.008225644007325172
Validation loss = 0.008607531897723675
Validation loss = 0.009260564111173153
Validation loss = 0.00854612234979868
Validation loss = 0.008680609986186028
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01033692341297865
Validation loss = 0.008812970481812954
Validation loss = 0.008832224644720554
Validation loss = 0.008113410323858261
Validation loss = 0.008560948073863983
Validation loss = 0.0089193070307374
Validation loss = 0.008812315762043
Validation loss = 0.008085234090685844
Validation loss = 0.009209776297211647
Validation loss = 0.008230619132518768
Validation loss = 0.008278700523078442
Validation loss = 0.007841823622584343
Validation loss = 0.008565331809222698
Validation loss = 0.008101201616227627
Validation loss = 0.009574256837368011
Validation loss = 0.008281064219772816
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.010243638418614864
Validation loss = 0.008677021600306034
Validation loss = 0.008396643213927746
Validation loss = 0.008304416202008724
Validation loss = 0.008949204348027706
Validation loss = 0.008108838461339474
Validation loss = 0.008148935623466969
Validation loss = 0.008182448334991932
Validation loss = 0.008533250540494919
Validation loss = 0.008505536243319511
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.009375589899718761
Validation loss = 0.008535658940672874
Validation loss = 0.009399395436048508
Validation loss = 0.00842352956533432
Validation loss = 0.008781868033111095
Validation loss = 0.009070075117051601
Validation loss = 0.009435041807591915
Validation loss = 0.008604681119322777
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.22e+03 |
| Iteration     | 28       |
| MaximumReturn | 3.11e+03 |
| MinimumReturn | 128      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022747088223695755
Validation loss = 0.020304905250668526
Validation loss = 0.023648297414183617
Validation loss = 0.02061419002711773
Validation loss = 0.01770370826125145
Validation loss = 0.021499328315258026
Validation loss = 0.023759452626109123
Validation loss = 0.020447801798582077
Validation loss = 0.024558175355196
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.013313105329871178
Validation loss = 0.014532282017171383
Validation loss = 0.013504590839147568
Validation loss = 0.012442334555089474
Validation loss = 0.012203289195895195
Validation loss = 0.011687684804201126
Validation loss = 0.012723640538752079
Validation loss = 0.011222370900213718
Validation loss = 0.012318193912506104
Validation loss = 0.017679128795862198
Validation loss = 0.013956946320831776
Validation loss = 0.013175019063055515
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.019437145441770554
Validation loss = 0.02410520240664482
Validation loss = 0.02534213475883007
Validation loss = 0.02000254951417446
Validation loss = 0.02831318788230419
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.021576324477791786
Validation loss = 0.02526860684156418
Validation loss = 0.03108050487935543
Validation loss = 0.022538069635629654
Validation loss = 0.022471928969025612
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.016449328511953354
Validation loss = 0.013548355549573898
Validation loss = 0.016114355996251106
Validation loss = 0.01849035732448101
Validation loss = 0.01821756176650524
Validation loss = 0.016708306968212128
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.46e+03 |
| Iteration     | 29       |
| MaximumReturn | 3.06e+03 |
| MinimumReturn | 303      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020126476883888245
Validation loss = 0.02135188691318035
Validation loss = 0.027863619849085808
Validation loss = 0.017649443820118904
Validation loss = 0.02603377215564251
Validation loss = 0.02518138661980629
Validation loss = 0.030540576204657555
Validation loss = 0.02826201729476452
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.012415525503456593
Validation loss = 0.013028619810938835
Validation loss = 0.015327510423958302
Validation loss = 0.014364739879965782
Validation loss = 0.01586376316845417
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0242965966463089
Validation loss = 0.016834424808621407
Validation loss = 0.025340963155031204
Validation loss = 0.025722648948431015
Validation loss = 0.021366672590374947
Validation loss = 0.020282573997974396
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.027060316875576973
Validation loss = 0.03113468363881111
Validation loss = 0.025072310119867325
Validation loss = 0.03325808048248291
Validation loss = 0.027259711176156998
Validation loss = 0.03537258878350258
Validation loss = 0.022712649777531624
Validation loss = 0.028243515640497208
Validation loss = 0.03379659727215767
Validation loss = 0.03449774533510208
Validation loss = 0.023760424926877022
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.018105126917362213
Validation loss = 0.017072582617402077
Validation loss = 0.019892381504178047
Validation loss = 0.021770810708403587
Validation loss = 0.02599237486720085
Validation loss = 0.02705853246152401
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.67e+03 |
| Iteration     | 30       |
| MaximumReturn | 3.28e+03 |
| MinimumReturn | 374      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023089896887540817
Validation loss = 0.026410743594169617
Validation loss = 0.02368725836277008
Validation loss = 0.018322942778468132
Validation loss = 0.02641390822827816
Validation loss = 0.027822550386190414
Validation loss = 0.02562425285577774
Validation loss = 0.033992063254117966
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.014621002599596977
Validation loss = 0.012875977903604507
Validation loss = 0.011666452512145042
Validation loss = 0.011356406845152378
Validation loss = 0.013902970589697361
Validation loss = 0.013148406520485878
Validation loss = 0.014635071158409119
Validation loss = 0.011962322518229485
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.027840537950396538
Validation loss = 0.018149901181459427
Validation loss = 0.021391889080405235
Validation loss = 0.026399703696370125
Validation loss = 0.01980983279645443
Validation loss = 0.024944763630628586
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03249312937259674
Validation loss = 0.02885843813419342
Validation loss = 0.032523974776268005
Validation loss = 0.03056969866156578
Validation loss = 0.02793538011610508
Validation loss = 0.028415344655513763
Validation loss = 0.03819520026445389
Validation loss = 0.0231893602758646
Validation loss = 0.03575706481933594
Validation loss = 0.024905815720558167
Validation loss = 0.031475409865379333
Validation loss = 0.03413590043783188
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.02510124444961548
Validation loss = 0.018695443868637085
Validation loss = 0.02207539603114128
Validation loss = 0.022226743400096893
Validation loss = 0.02262411266565323
Validation loss = 0.022292938083410263
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 3.14e+03 |
| Iteration     | 31       |
| MaximumReturn | 3.28e+03 |
| MinimumReturn | 2.94e+03 |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025229468941688538
Validation loss = 0.03151661157608032
Validation loss = 0.03291422501206398
Validation loss = 0.027605906128883362
Validation loss = 0.030214639380574226
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.011970075778663158
Validation loss = 0.012440248392522335
Validation loss = 0.018554534763097763
Validation loss = 0.012853680178523064
Validation loss = 0.01369602233171463
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.01891101710498333
Validation loss = 0.021535998210310936
Validation loss = 0.017581969499588013
Validation loss = 0.016911227256059647
Validation loss = 0.02378956973552704
Validation loss = 0.0204891636967659
Validation loss = 0.014043312519788742
Validation loss = 0.017046421766281128
Validation loss = 0.020926717668771744
Validation loss = 0.023138832300901413
Validation loss = 0.015825336799025536
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0371815487742424
Validation loss = 0.02388002909719944
Validation loss = 0.03614402562379837
Validation loss = 0.02921288087964058
Validation loss = 0.02688182331621647
Validation loss = 0.027148833498358727
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.024907976388931274
Validation loss = 0.02172638662159443
Validation loss = 0.0272554662078619
Validation loss = 0.022356435656547546
Validation loss = 0.02255111001431942
Validation loss = 0.020609190687537193
Validation loss = 0.017969924956560135
Validation loss = 0.019259070977568626
Validation loss = 0.02335747331380844
Validation loss = 0.019724424928426743
Validation loss = 0.02615910768508911
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.79e+03 |
| Iteration     | 32       |
| MaximumReturn | 3.3e+03  |
| MinimumReturn | 1.66e+03 |
| TotalSamples  | 136000   |
----------------------------
