Logging to experiments/hopper/hopper/Sun-23-Oct-2022-10-30-55-AM-CDT_hopper_trpo_iteration_20_seed1234
Print configuration .....
{'env_name': 'hopper', 'random_seeds': [1234, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/hopper_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.8027303218841553
Validation loss = 0.30072021484375
Validation loss = 0.3049471378326416
Validation loss = 0.30352169275283813
Validation loss = 0.3109149634838104
Validation loss = 0.32507258653640747
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.6995431184768677
Validation loss = 0.3007148504257202
Validation loss = 0.29897475242614746
Validation loss = 0.29968881607055664
Validation loss = 0.3239032030105591
Validation loss = 0.3237229585647583
Validation loss = 0.3310917913913727
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.8177937865257263
Validation loss = 0.29923760890960693
Validation loss = 0.2950664758682251
Validation loss = 0.31041276454925537
Validation loss = 0.31204673647880554
Validation loss = 0.34399479627609253
Validation loss = 0.3421291708946228
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7605311870574951
Validation loss = 0.32153505086898804
Validation loss = 0.3119345009326935
Validation loss = 0.3074498176574707
Validation loss = 0.32249724864959717
Validation loss = 0.3236621618270874
Validation loss = 0.3342529535293579
Validation loss = 0.34195372462272644
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6364749670028687
Validation loss = 0.3183964192867279
Validation loss = 0.3061150312423706
Validation loss = 0.3034195601940155
Validation loss = 0.31925642490386963
Validation loss = 0.31430912017822266
Validation loss = 0.3343562185764313
Validation loss = 0.326860249042511
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.24e+03 |
| Iteration     | 0         |
| MaximumReturn | -2.13e+03 |
| MinimumReturn | -2.31e+03 |
| TotalSamples  | 8000      |
-----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2791701853275299
Validation loss = 0.2196570783853531
Validation loss = 0.2239830195903778
Validation loss = 0.21544519066810608
Validation loss = 0.21783946454524994
Validation loss = 0.2303951382637024
Validation loss = 0.22296789288520813
Validation loss = 0.23537884652614594
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2555958330631256
Validation loss = 0.22653202712535858
Validation loss = 0.2282605767250061
Validation loss = 0.21974508464336395
Validation loss = 0.22969608008861542
Validation loss = 0.220361590385437
Validation loss = 0.2280036360025406
Validation loss = 0.23570972681045532
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.27284228801727295
Validation loss = 0.22281783819198608
Validation loss = 0.22965066134929657
Validation loss = 0.22159051895141602
Validation loss = 0.22659964859485626
Validation loss = 0.23740661144256592
Validation loss = 0.22736170887947083
Validation loss = 0.23182238638401031
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.25364965200424194
Validation loss = 0.2312818169593811
Validation loss = 0.2274167835712433
Validation loss = 0.23289406299591064
Validation loss = 0.24266046285629272
Validation loss = 0.229041188955307
Validation loss = 0.23324936628341675
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2524303197860718
Validation loss = 0.2267196625471115
Validation loss = 0.2191813737154007
Validation loss = 0.22364966571331024
Validation loss = 0.22122979164123535
Validation loss = 0.2270371913909912
Validation loss = 0.22501705586910248
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.22e+03 |
| Iteration     | 1         |
| MaximumReturn | -195      |
| MinimumReturn | -2.19e+03 |
| TotalSamples  | 12000     |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.29060834646224976
Validation loss = 0.262153685092926
Validation loss = 0.26789963245391846
Validation loss = 0.2698301374912262
Validation loss = 0.29250019788742065
Validation loss = 0.2887437641620636
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2886594533920288
Validation loss = 0.26284825801849365
Validation loss = 0.2702206075191498
Validation loss = 0.27605071663856506
Validation loss = 0.27351781725883484
Validation loss = 0.29435399174690247
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2827434241771698
Validation loss = 0.2673357427120209
Validation loss = 0.2769310772418976
Validation loss = 0.2933494746685028
Validation loss = 0.28604909777641296
Validation loss = 0.29399463534355164
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.28570714592933655
Validation loss = 0.27269360423088074
Validation loss = 0.27255237102508545
Validation loss = 0.27367690205574036
Validation loss = 0.30284544825553894
Validation loss = 0.28314682841300964
Validation loss = 0.30117207765579224
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2800053060054779
Validation loss = 0.270304411649704
Validation loss = 0.28508079051971436
Validation loss = 0.29796040058135986
Validation loss = 0.31357046961784363
Validation loss = 0.2981989085674286
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.11e+03 |
| Iteration     | 2         |
| MaximumReturn | -371      |
| MinimumReturn | -2.35e+03 |
| TotalSamples  | 16000     |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2759513556957245
Validation loss = 0.2703436315059662
Validation loss = 0.2693462371826172
Validation loss = 0.26387113332748413
Validation loss = 0.27664077281951904
Validation loss = 0.2683933973312378
Validation loss = 0.2770891785621643
Validation loss = 0.2657758295536041
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.28247007727622986
Validation loss = 0.25824174284935
Validation loss = 0.2643786370754242
Validation loss = 0.25748375058174133
Validation loss = 0.25880271196365356
Validation loss = 0.2511690855026245
Validation loss = 0.260225385427475
Validation loss = 0.2611027956008911
Validation loss = 0.25595468282699585
Validation loss = 0.2603464126586914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.27852731943130493
Validation loss = 0.2796189785003662
Validation loss = 0.2765081226825714
Validation loss = 0.29028695821762085
Validation loss = 0.2757423520088196
Validation loss = 0.27885666489601135
Validation loss = 0.28354158997535706
Validation loss = 0.2878998816013336
Validation loss = 0.2696928381919861
Validation loss = 0.2747502624988556
Validation loss = 0.2776006758213043
Validation loss = 0.2725204825401306
Validation loss = 0.2841002345085144
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.29263952374458313
Validation loss = 0.27192422747612
Validation loss = 0.26633551716804504
Validation loss = 0.27248817682266235
Validation loss = 0.2837505340576172
Validation loss = 0.28299480676651
Validation loss = 0.2729048728942871
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.28100520372390747
Validation loss = 0.2738872766494751
Validation loss = 0.266038179397583
Validation loss = 0.2849023938179016
Validation loss = 0.28740283846855164
Validation loss = 0.2684994637966156
Validation loss = 0.277515172958374
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.07e+03 |
| Iteration     | 3         |
| MaximumReturn | -179      |
| MinimumReturn | -2.29e+03 |
| TotalSamples  | 20000     |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3124380111694336
Validation loss = 0.29475387930870056
Validation loss = 0.28531745076179504
Validation loss = 0.2832329273223877
Validation loss = 0.2702435851097107
Validation loss = 0.27467626333236694
Validation loss = 0.27734559774398804
Validation loss = 0.2675662934780121
Validation loss = 0.26840102672576904
Validation loss = 0.2729138135910034
Validation loss = 0.26392531394958496
Validation loss = 0.2697691321372986
Validation loss = 0.26716554164886475
Validation loss = 0.2708471417427063
Validation loss = 0.27146267890930176
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.31677812337875366
Validation loss = 0.29207196831703186
Validation loss = 0.3021503984928131
Validation loss = 0.2760036587715149
Validation loss = 0.2717203199863434
Validation loss = 0.2711225152015686
Validation loss = 0.2685154676437378
Validation loss = 0.27046579122543335
Validation loss = 0.2654350697994232
Validation loss = 0.2644328474998474
Validation loss = 0.27172237634658813
Validation loss = 0.26843971014022827
Validation loss = 0.2701416313648224
Validation loss = 0.2743408679962158
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.33634209632873535
Validation loss = 0.29344046115875244
Validation loss = 0.28146371245384216
Validation loss = 0.2824839651584625
Validation loss = 0.2702183127403259
Validation loss = 0.2635575234889984
Validation loss = 0.267040491104126
Validation loss = 0.2667839527130127
Validation loss = 0.26747578382492065
Validation loss = 0.26655951142311096
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.30024415254592896
Validation loss = 0.2944421172142029
Validation loss = 0.281038373708725
Validation loss = 0.2751839756965637
Validation loss = 0.2765868604183197
Validation loss = 0.27577951550483704
Validation loss = 0.27694669365882874
Validation loss = 0.2748507857322693
Validation loss = 0.26985234022140503
Validation loss = 0.2709713578224182
Validation loss = 0.2747720181941986
Validation loss = 0.274042010307312
Validation loss = 0.26576992869377136
Validation loss = 0.26831457018852234
Validation loss = 0.26544517278671265
Validation loss = 0.266522079706192
Validation loss = 0.2784939706325531
Validation loss = 0.2774060368537903
Validation loss = 0.2695322036743164
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3199393153190613
Validation loss = 0.29649752378463745
Validation loss = 0.2795488238334656
Validation loss = 0.2755216658115387
Validation loss = 0.2790459096431732
Validation loss = 0.2685125470161438
Validation loss = 0.2677268981933594
Validation loss = 0.2733859419822693
Validation loss = 0.27565181255340576
Validation loss = 0.27074238657951355
Validation loss = 0.2724629044532776
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.41e+03 |
| Iteration     | 4         |
| MaximumReturn | -143      |
| MinimumReturn | -2.62e+03 |
| TotalSamples  | 24000     |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3257163465023041
Validation loss = 0.29601287841796875
Validation loss = 0.28258422017097473
Validation loss = 0.2841852307319641
Validation loss = 0.2933700382709503
Validation loss = 0.27631404995918274
Validation loss = 0.28294849395751953
Validation loss = 0.2717086970806122
Validation loss = 0.279106467962265
Validation loss = 0.27654796838760376
Validation loss = 0.27776041626930237
Validation loss = 0.27487021684646606
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3138297498226166
Validation loss = 0.2933594286441803
Validation loss = 0.28213217854499817
Validation loss = 0.27098706364631653
Validation loss = 0.26950812339782715
Validation loss = 0.27355578541755676
Validation loss = 0.2649689018726349
Validation loss = 0.27454134821891785
Validation loss = 0.2627120018005371
Validation loss = 0.26824405789375305
Validation loss = 0.2692916989326477
Validation loss = 0.2718370854854584
Validation loss = 0.27618345618247986
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.31961777806282043
Validation loss = 0.3028010427951813
Validation loss = 0.28836145997047424
Validation loss = 0.28989407420158386
Validation loss = 0.28696659207344055
Validation loss = 0.2824009954929352
Validation loss = 0.2857690751552582
Validation loss = 0.27697253227233887
Validation loss = 0.2826903164386749
Validation loss = 0.28347378969192505
Validation loss = 0.2750575542449951
Validation loss = 0.2774676978588104
Validation loss = 0.2708713412284851
Validation loss = 0.2767534554004669
Validation loss = 0.27728161215782166
Validation loss = 0.3285176455974579
Validation loss = 0.26735785603523254
Validation loss = 0.26516827940940857
Validation loss = 0.27092164754867554
Validation loss = 0.2625795304775238
Validation loss = 0.271440714597702
Validation loss = 0.29090866446495056
Validation loss = 0.2715812623500824
Validation loss = 0.2624310553073883
Validation loss = 0.25838637351989746
Validation loss = 0.2622867822647095
Validation loss = 0.27012839913368225
Validation loss = 0.2801840305328369
Validation loss = 0.2711072266101837
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.32717373967170715
Validation loss = 0.30687952041625977
Validation loss = 0.29561594128608704
Validation loss = 0.2969202995300293
Validation loss = 0.2818007171154022
Validation loss = 0.29138249158859253
Validation loss = 0.29033613204956055
Validation loss = 0.28923681378364563
Validation loss = 0.2925552427768707
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.29667070508003235
Validation loss = 0.28290918469429016
Validation loss = 0.28872835636138916
Validation loss = 0.28412994742393494
Validation loss = 0.2857999801635742
Validation loss = 0.2856287658214569
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.27e+03 |
| Iteration     | 5         |
| MaximumReturn | -740      |
| MinimumReturn | -2.28e+03 |
| TotalSamples  | 28000     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.27717071771621704
Validation loss = 0.2310154289007187
Validation loss = 0.22897091507911682
Validation loss = 0.22782358527183533
Validation loss = 0.2247522622346878
Validation loss = 0.22340597212314606
Validation loss = 0.21972250938415527
Validation loss = 0.225316122174263
Validation loss = 0.22021572291851044
Validation loss = 0.21448948979377747
Validation loss = 0.21924404799938202
Validation loss = 0.2213984578847885
Validation loss = 0.21540775895118713
Validation loss = 0.22666919231414795
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2551058828830719
Validation loss = 0.23037195205688477
Validation loss = 0.22016695141792297
Validation loss = 0.21698451042175293
Validation loss = 0.21324647963047028
Validation loss = 0.21415741741657257
Validation loss = 0.22041556239128113
Validation loss = 0.21429359912872314
Validation loss = 0.21229569613933563
Validation loss = 0.21250750124454498
Validation loss = 0.2062148004770279
Validation loss = 0.20720192790031433
Validation loss = 0.20919258892536163
Validation loss = 0.2125840187072754
Validation loss = 0.21318422257900238
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2588668763637543
Validation loss = 0.22576968371868134
Validation loss = 0.21225793659687042
Validation loss = 0.21078015863895416
Validation loss = 0.21164974570274353
Validation loss = 0.2092435508966446
Validation loss = 0.21099548041820526
Validation loss = 0.2110770046710968
Validation loss = 0.2258305549621582
Validation loss = 0.20740513503551483
Validation loss = 0.20909807085990906
Validation loss = 0.20698745548725128
Validation loss = 0.20700040459632874
Validation loss = 0.2123592644929886
Validation loss = 0.20279596745967865
Validation loss = 0.19890917837619781
Validation loss = 0.20293080806732178
Validation loss = 0.20590291917324066
Validation loss = 0.20586612820625305
Validation loss = 0.2172452211380005
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2633083164691925
Validation loss = 0.24971601366996765
Validation loss = 0.2300877571105957
Validation loss = 0.23065218329429626
Validation loss = 0.22675558924674988
Validation loss = 0.2227352112531662
Validation loss = 0.22889244556427002
Validation loss = 0.22172757983207703
Validation loss = 0.22739280760288239
Validation loss = 0.22358429431915283
Validation loss = 0.22574248909950256
Validation loss = 0.22960153222084045
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.270779013633728
Validation loss = 0.24115486443042755
Validation loss = 0.23155638575553894
Validation loss = 0.2245277315378189
Validation loss = 0.2270575761795044
Validation loss = 0.2221529185771942
Validation loss = 0.2204834520816803
Validation loss = 0.22250381112098694
Validation loss = 0.2236889749765396
Validation loss = 0.22305205464363098
Validation loss = 0.2201012223958969
Validation loss = 0.21622808277606964
Validation loss = 0.21483267843723297
Validation loss = 0.22718265652656555
Validation loss = 0.24217097461223602
Validation loss = 0.20694676041603088
Validation loss = 0.20855997502803802
Validation loss = 0.20944353938102722
Validation loss = 0.2138443887233734
Validation loss = 0.20931589603424072
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -697     |
| Iteration     | 6        |
| MaximumReturn | -382     |
| MinimumReturn | -965     |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2349996566772461
Validation loss = 0.1982702612876892
Validation loss = 0.18595734238624573
Validation loss = 0.18372824788093567
Validation loss = 0.1835113912820816
Validation loss = 0.18603253364562988
Validation loss = 0.18018439412117004
Validation loss = 0.18257082998752594
Validation loss = 0.1828153282403946
Validation loss = 0.18902301788330078
Validation loss = 0.18037696182727814
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.208014577627182
Validation loss = 0.18921330571174622
Validation loss = 0.1883566677570343
Validation loss = 0.1782434582710266
Validation loss = 0.17969277501106262
Validation loss = 0.1818012297153473
Validation loss = 0.19421541690826416
Validation loss = 0.1764691174030304
Validation loss = 0.17415088415145874
Validation loss = 0.16949762403964996
Validation loss = 0.1713445633649826
Validation loss = 0.17287340760231018
Validation loss = 0.18041139841079712
Validation loss = 0.1750423014163971
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.21111775934696198
Validation loss = 0.19030603766441345
Validation loss = 0.17386332154273987
Validation loss = 0.1701781451702118
Validation loss = 0.17380763590335846
Validation loss = 0.1743578314781189
Validation loss = 0.17074894905090332
Validation loss = 0.17303511500358582
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.22749167680740356
Validation loss = 0.21201637387275696
Validation loss = 0.19952994585037231
Validation loss = 0.19209599494934082
Validation loss = 0.18780404329299927
Validation loss = 0.19058294594287872
Validation loss = 0.19204412400722504
Validation loss = 0.1928112953901291
Validation loss = 0.18811042606830597
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.21948587894439697
Validation loss = 0.1943928748369217
Validation loss = 0.18461960554122925
Validation loss = 0.18190661072731018
Validation loss = 0.17986652255058289
Validation loss = 0.1843603551387787
Validation loss = 0.18799304962158203
Validation loss = 0.17821967601776123
Validation loss = 0.1763736456632614
Validation loss = 0.17278894782066345
Validation loss = 0.17306402325630188
Validation loss = 0.1919935643672943
Validation loss = 0.18258921802043915
Validation loss = 0.1719214916229248
Validation loss = 0.1679714322090149
Validation loss = 0.17088595032691956
Validation loss = 0.17403671145439148
Validation loss = 0.18844497203826904
Validation loss = 0.17184093594551086
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -472     |
| Iteration     | 7        |
| MaximumReturn | 47       |
| MinimumReturn | -784     |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.20832575857639313
Validation loss = 0.1911119669675827
Validation loss = 0.18345636129379272
Validation loss = 0.1791772097349167
Validation loss = 0.1782599836587906
Validation loss = 0.17795555293560028
Validation loss = 0.18311232328414917
Validation loss = 0.17893260717391968
Validation loss = 0.1934104710817337
Validation loss = 0.18081161379814148
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2006988525390625
Validation loss = 0.1788998395204544
Validation loss = 0.17251376807689667
Validation loss = 0.17259059846401215
Validation loss = 0.16995678842067719
Validation loss = 0.17792248725891113
Validation loss = 0.1747591495513916
Validation loss = 0.16822543740272522
Validation loss = 0.17059139907360077
Validation loss = 0.16617202758789062
Validation loss = 0.17252571880817413
Validation loss = 0.18624241650104523
Validation loss = 0.17009559273719788
Validation loss = 0.16230560839176178
Validation loss = 0.16216783225536346
Validation loss = 0.16007372736930847
Validation loss = 0.17178690433502197
Validation loss = 0.16758140921592712
Validation loss = 0.1776859313249588
Validation loss = 0.15791676938533783
Validation loss = 0.16128282248973846
Validation loss = 0.15917819738388062
Validation loss = 0.15954963862895966
Validation loss = 0.15830478072166443
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.19369326531887054
Validation loss = 0.193718820810318
Validation loss = 0.17085279524326324
Validation loss = 0.17039461433887482
Validation loss = 0.17051947116851807
Validation loss = 0.16735811531543732
Validation loss = 0.17768412828445435
Validation loss = 0.16509407758712769
Validation loss = 0.16784699261188507
Validation loss = 0.16787667572498322
Validation loss = 0.17882034182548523
Validation loss = 0.16385699808597565
Validation loss = 0.1629202663898468
Validation loss = 0.1644834727048874
Validation loss = 0.16112762689590454
Validation loss = 0.2144472301006317
Validation loss = 0.16078396141529083
Validation loss = 0.15718168020248413
Validation loss = 0.15855476260185242
Validation loss = 0.15802332758903503
Validation loss = 0.16656488180160522
Validation loss = 0.1641143560409546
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.21669618785381317
Validation loss = 0.1994205266237259
Validation loss = 0.19150815904140472
Validation loss = 0.18909446895122528
Validation loss = 0.18787114322185516
Validation loss = 0.19400137662887573
Validation loss = 0.18492650985717773
Validation loss = 0.18493586778640747
Validation loss = 0.17990435659885406
Validation loss = 0.20751428604125977
Validation loss = 0.1943967640399933
Validation loss = 0.18071697652339935
Validation loss = 0.18064908683300018
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.20017151534557343
Validation loss = 0.17501391470432281
Validation loss = 0.16930073499679565
Validation loss = 0.1648550182580948
Validation loss = 0.17020170390605927
Validation loss = 0.17153090238571167
Validation loss = 0.1695721596479416
Validation loss = 0.1819853037595749
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -388     |
| Iteration     | 8        |
| MaximumReturn | 47.8     |
| MinimumReturn | -791     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1856454610824585
Validation loss = 0.16213172674179077
Validation loss = 0.15815547108650208
Validation loss = 0.15529172122478485
Validation loss = 0.15804234147071838
Validation loss = 0.15937374532222748
Validation loss = 0.174367755651474
Validation loss = 0.15456220507621765
Validation loss = 0.15367980301380157
Validation loss = 0.15669968724250793
Validation loss = 0.1726919263601303
Validation loss = 0.1659075915813446
Validation loss = 0.15045234560966492
Validation loss = 0.1491573452949524
Validation loss = 0.15191863477230072
Validation loss = 0.1506168246269226
Validation loss = 0.15975068509578705
Validation loss = 0.1563437432050705
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1818121373653412
Validation loss = 0.15161237120628357
Validation loss = 0.14374478161334991
Validation loss = 0.14264854788780212
Validation loss = 0.13972899317741394
Validation loss = 0.14170989394187927
Validation loss = 0.13733835518360138
Validation loss = 0.14238731563091278
Validation loss = 0.15063397586345673
Validation loss = 0.1455034464597702
Validation loss = 0.1403382271528244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1781633198261261
Validation loss = 0.14959435164928436
Validation loss = 0.14892587065696716
Validation loss = 0.14614206552505493
Validation loss = 0.1465139091014862
Validation loss = 0.15329930186271667
Validation loss = 0.16513437032699585
Validation loss = 0.1424076110124588
Validation loss = 0.13983169198036194
Validation loss = 0.13986313343048096
Validation loss = 0.14955683052539825
Validation loss = 0.1561356484889984
Validation loss = 0.14476576447486877
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1825098693370819
Validation loss = 0.17320242524147034
Validation loss = 0.1610080897808075
Validation loss = 0.15987460315227509
Validation loss = 0.16356304287910461
Validation loss = 0.1687779575586319
Validation loss = 0.17282505333423615
Validation loss = 0.16335827112197876
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1838885247707367
Validation loss = 0.1596134454011917
Validation loss = 0.14886413514614105
Validation loss = 0.1552751362323761
Validation loss = 0.14937946200370789
Validation loss = 0.14991909265518188
Validation loss = 0.1548660695552826
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -983      |
| Iteration     | 9         |
| MaximumReturn | -692      |
| MinimumReturn | -1.56e+03 |
| TotalSamples  | 44000     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16010533273220062
Validation loss = 0.14293180406093597
Validation loss = 0.13636042177677155
Validation loss = 0.14057694375514984
Validation loss = 0.1477813571691513
Validation loss = 0.13504137098789215
Validation loss = 0.1390789896249771
Validation loss = 0.13953277468681335
Validation loss = 0.14196649193763733
Validation loss = 0.13829515874385834
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15429586172103882
Validation loss = 0.14499691128730774
Validation loss = 0.12775127589702606
Validation loss = 0.12802083790302277
Validation loss = 0.1325816959142685
Validation loss = 0.12673647701740265
Validation loss = 0.1286529153585434
Validation loss = 0.12912651896476746
Validation loss = 0.13015717267990112
Validation loss = 0.12796129286289215
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1474219262599945
Validation loss = 0.13644415140151978
Validation loss = 0.13144183158874512
Validation loss = 0.12927643954753876
Validation loss = 0.13201850652694702
Validation loss = 0.12888988852500916
Validation loss = 0.12800776958465576
Validation loss = 0.13324792683124542
Validation loss = 0.149439737200737
Validation loss = 0.12724679708480835
Validation loss = 0.1253640502691269
Validation loss = 0.12536104023456573
Validation loss = 0.12923294305801392
Validation loss = 0.1311972737312317
Validation loss = 0.13473443686962128
Validation loss = 0.12322937697172165
Validation loss = 0.12273523211479187
Validation loss = 0.12474376708269119
Validation loss = 0.12329859286546707
Validation loss = 0.14523530006408691
Validation loss = 0.12256011366844177
Validation loss = 0.12004570662975311
Validation loss = 0.12201929837465286
Validation loss = 0.1346496194601059
Validation loss = 0.13356469571590424
Validation loss = 0.11952748894691467
Validation loss = 0.12167231738567352
Validation loss = 0.12286791205406189
Validation loss = 0.14788682758808136
Validation loss = 0.12055235356092453
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.16417253017425537
Validation loss = 0.1491648107767105
Validation loss = 0.14775694906711578
Validation loss = 0.14958555996418
Validation loss = 0.14525645971298218
Validation loss = 0.14248652756214142
Validation loss = 0.14285947382450104
Validation loss = 0.1569053828716278
Validation loss = 0.14580483734607697
Validation loss = 0.14381363987922668
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1627347320318222
Validation loss = 0.13892829418182373
Validation loss = 0.14031241834163666
Validation loss = 0.13540440797805786
Validation loss = 0.13807089626789093
Validation loss = 0.13800284266471863
Validation loss = 0.1320519596338272
Validation loss = 0.13771139085292816
Validation loss = 0.13204823434352875
Validation loss = 0.13549478352069855
Validation loss = 0.13315728306770325
Validation loss = 0.14208287000656128
Validation loss = 0.14029596745967865
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -709      |
| Iteration     | 10        |
| MaximumReturn | -298      |
| MinimumReturn | -1.09e+03 |
| TotalSamples  | 48000     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.148167684674263
Validation loss = 0.1312454491853714
Validation loss = 0.12406285852193832
Validation loss = 0.12365707755088806
Validation loss = 0.12812292575836182
Validation loss = 0.12376906722784042
Validation loss = 0.12247985601425171
Validation loss = 0.12310070544481277
Validation loss = 0.14623145759105682
Validation loss = 0.12550820410251617
Validation loss = 0.1165003702044487
Validation loss = 0.1182120069861412
Validation loss = 0.12288588285446167
Validation loss = 0.13587714731693268
Validation loss = 0.11766258627176285
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13857817649841309
Validation loss = 0.11661454290151596
Validation loss = 0.11117871850728989
Validation loss = 0.11424144357442856
Validation loss = 0.1182081401348114
Validation loss = 0.11657372117042542
Validation loss = 0.11360496282577515
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12249094247817993
Validation loss = 0.10978758335113525
Validation loss = 0.11175044625997543
Validation loss = 0.1100822389125824
Validation loss = 0.11134248971939087
Validation loss = 0.1098940372467041
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15455548465251923
Validation loss = 0.13256163895130157
Validation loss = 0.12558330595493317
Validation loss = 0.1247180700302124
Validation loss = 0.13010470569133759
Validation loss = 0.12807905673980713
Validation loss = 0.12991559505462646
Validation loss = 0.12410229444503784
Validation loss = 0.124867282807827
Validation loss = 0.1292009800672531
Validation loss = 0.13026262819766998
Validation loss = 0.12087363004684448
Validation loss = 0.1189165711402893
Validation loss = 0.12116227298974991
Validation loss = 0.12025948613882065
Validation loss = 0.13766688108444214
Validation loss = 0.12062301486730576
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1432815045118332
Validation loss = 0.12320808321237564
Validation loss = 0.11641184240579605
Validation loss = 0.11971020698547363
Validation loss = 0.11771180480718613
Validation loss = 0.11939237266778946
Validation loss = 0.1210799291729927
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -766      |
| Iteration     | 11        |
| MaximumReturn | -29.3     |
| MinimumReturn | -1.68e+03 |
| TotalSamples  | 52000     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11733505129814148
Validation loss = 0.11096811294555664
Validation loss = 0.11151003837585449
Validation loss = 0.10739235579967499
Validation loss = 0.11313891410827637
Validation loss = 0.11268513649702072
Validation loss = 0.11683250963687897
Validation loss = 0.11578147113323212
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11296527832746506
Validation loss = 0.10350914299488068
Validation loss = 0.10289125144481659
Validation loss = 0.10424216091632843
Validation loss = 0.10318398475646973
Validation loss = 0.10823690891265869
Validation loss = 0.10448654741048813
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12786298990249634
Validation loss = 0.103586845099926
Validation loss = 0.10071354359388351
Validation loss = 0.10182984173297882
Validation loss = 0.1000581830739975
Validation loss = 0.11598245054483414
Validation loss = 0.10179972648620605
Validation loss = 0.09810678660869598
Validation loss = 0.10173407942056656
Validation loss = 0.11306902021169662
Validation loss = 0.09794355928897858
Validation loss = 0.09537026286125183
Validation loss = 0.0974854901432991
Validation loss = 0.10406170785427094
Validation loss = 0.09969384223222733
Validation loss = 0.11313224583864212
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11214787513017654
Validation loss = 0.1094655841588974
Validation loss = 0.10892830789089203
Validation loss = 0.10818055272102356
Validation loss = 0.12080954760313034
Validation loss = 0.10914438217878342
Validation loss = 0.10824549943208694
Validation loss = 0.11112981289625168
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13291822373867035
Validation loss = 0.11349049210548401
Validation loss = 0.11025727540254593
Validation loss = 0.11055406928062439
Validation loss = 0.11201682686805725
Validation loss = 0.10907638072967529
Validation loss = 0.10774658620357513
Validation loss = 0.10623837262392044
Validation loss = 0.11347503960132599
Validation loss = 0.11081445962190628
Validation loss = 0.10190118849277496
Validation loss = 0.10201079398393631
Validation loss = 0.10346778482198715
Validation loss = 0.12022406607866287
Validation loss = 0.10625641793012619
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -580     |
| Iteration     | 12       |
| MaximumReturn | -433     |
| MinimumReturn | -863     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1157107874751091
Validation loss = 0.11112343519926071
Validation loss = 0.10385913401842117
Validation loss = 0.10077754408121109
Validation loss = 0.10317728668451309
Validation loss = 0.10282133519649506
Validation loss = 0.1108698770403862
Validation loss = 0.09989053755998611
Validation loss = 0.099725142121315
Validation loss = 0.10155599564313889
Validation loss = 0.10461024940013885
Validation loss = 0.10193329304456711
Validation loss = 0.09751620143651962
Validation loss = 0.09888914972543716
Validation loss = 0.10119890421628952
Validation loss = 0.1075352355837822
Validation loss = 0.10244619846343994
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1224280372262001
Validation loss = 0.09946434944868088
Validation loss = 0.0926605835556984
Validation loss = 0.09356937557458878
Validation loss = 0.10306140035390854
Validation loss = 0.10436975210905075
Validation loss = 0.09200123697519302
Validation loss = 0.0924074798822403
Validation loss = 0.10119497030973434
Validation loss = 0.1036311611533165
Validation loss = 0.0933547168970108
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10611691325902939
Validation loss = 0.09344764053821564
Validation loss = 0.08894143253564835
Validation loss = 0.0900874137878418
Validation loss = 0.09413198381662369
Validation loss = 0.09971600025892258
Validation loss = 0.09837937355041504
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11893279105424881
Validation loss = 0.11092518270015717
Validation loss = 0.10074111074209213
Validation loss = 0.10183615982532501
Validation loss = 0.10498730093240738
Validation loss = 0.10190349817276001
Validation loss = 0.10435951501131058
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10501211881637573
Validation loss = 0.09858109056949615
Validation loss = 0.09744249284267426
Validation loss = 0.10723008960485458
Validation loss = 0.09717734903097153
Validation loss = 0.09661664813756943
Validation loss = 0.1066247746348381
Validation loss = 0.0977986603975296
Validation loss = 0.09767097979784012
Validation loss = 0.09875974804162979
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 133      |
| Iteration     | 13       |
| MaximumReturn | 384      |
| MinimumReturn | -197     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10397636145353317
Validation loss = 0.09253208339214325
Validation loss = 0.0908748134970665
Validation loss = 0.0921093299984932
Validation loss = 0.11660809814929962
Validation loss = 0.08842263370752335
Validation loss = 0.08753250539302826
Validation loss = 0.09036675095558167
Validation loss = 0.09684637188911438
Validation loss = 0.10351208597421646
Validation loss = 0.0870356559753418
Validation loss = 0.08598533272743225
Validation loss = 0.08991167694330215
Validation loss = 0.09446132928133011
Validation loss = 0.10094157606363297
Validation loss = 0.08501973748207092
Validation loss = 0.08728189766407013
Validation loss = 0.09563092887401581
Validation loss = 0.08775252848863602
Validation loss = 0.08569807559251785
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08958600461483002
Validation loss = 0.08850421011447906
Validation loss = 0.08542473614215851
Validation loss = 0.08789409697055817
Validation loss = 0.09358321130275726
Validation loss = 0.0837489515542984
Validation loss = 0.08454117178916931
Validation loss = 0.0952557697892189
Validation loss = 0.09236057847738266
Validation loss = 0.08602084964513779
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0932907909154892
Validation loss = 0.08387441188097
Validation loss = 0.08595933020114899
Validation loss = 0.0841466411948204
Validation loss = 0.0961436852812767
Validation loss = 0.08469881862401962
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10672961920499802
Validation loss = 0.09795615822076797
Validation loss = 0.09316686540842056
Validation loss = 0.09475398063659668
Validation loss = 0.09481227397918701
Validation loss = 0.10389435291290283
Validation loss = 0.10025288164615631
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10849013179540634
Validation loss = 0.09421984851360321
Validation loss = 0.08820665627717972
Validation loss = 0.0876033678650856
Validation loss = 0.09034589678049088
Validation loss = 0.10420434921979904
Validation loss = 0.08848977088928223
Validation loss = 0.0873437225818634
Validation loss = 0.08916138112545013
Validation loss = 0.09485752880573273
Validation loss = 0.08652301132678986
Validation loss = 0.08662140369415283
Validation loss = 0.09192047268152237
Validation loss = 0.09647670388221741
Validation loss = 0.0856897160410881
Validation loss = 0.08521246165037155
Validation loss = 0.08835416287183762
Validation loss = 0.09806089848279953
Validation loss = 0.08602652698755264
Validation loss = 0.08330116420984268
Validation loss = 0.08559899777173996
Validation loss = 0.11092453449964523
Validation loss = 0.09133637696504593
Validation loss = 0.08353672921657562
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 725      |
| Iteration     | 14       |
| MaximumReturn | 878      |
| MinimumReturn | 596      |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0959073007106781
Validation loss = 0.08949866890907288
Validation loss = 0.08312041312456131
Validation loss = 0.0823189914226532
Validation loss = 0.0869191363453865
Validation loss = 0.0864848643541336
Validation loss = 0.08244207501411438
Validation loss = 0.08379405736923218
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09080217778682709
Validation loss = 0.07953020930290222
Validation loss = 0.08074295520782471
Validation loss = 0.08475257456302643
Validation loss = 0.0844232439994812
Validation loss = 0.08222916722297668
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08803834021091461
Validation loss = 0.07844895124435425
Validation loss = 0.08083444088697433
Validation loss = 0.08094827830791473
Validation loss = 0.07705442607402802
Validation loss = 0.08059363812208176
Validation loss = 0.08149753510951996
Validation loss = 0.08401836454868317
Validation loss = 0.07960072159767151
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10206395387649536
Validation loss = 0.08777695894241333
Validation loss = 0.08934977650642395
Validation loss = 0.09212897717952728
Validation loss = 0.09244969487190247
Validation loss = 0.0868615061044693
Validation loss = 0.08618001639842987
Validation loss = 0.10909142345190048
Validation loss = 0.08564233779907227
Validation loss = 0.08595479279756546
Validation loss = 0.0931473821401596
Validation loss = 0.09223029762506485
Validation loss = 0.08328940719366074
Validation loss = 0.08216370642185211
Validation loss = 0.08712933957576752
Validation loss = 0.08721788227558136
Validation loss = 0.08338828384876251
Validation loss = 0.08420102298259735
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09171409904956818
Validation loss = 0.08563590049743652
Validation loss = 0.07950584590435028
Validation loss = 0.08020755648612976
Validation loss = 0.08307579904794693
Validation loss = 0.08210282027721405
Validation loss = 0.07987897098064423
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 585      |
| Iteration     | 15       |
| MaximumReturn | 1.04e+03 |
| MinimumReturn | -640     |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10128273069858551
Validation loss = 0.07679783552885056
Validation loss = 0.07552339881658554
Validation loss = 0.07518195360898972
Validation loss = 0.07839130610227585
Validation loss = 0.07895759493112564
Validation loss = 0.07595978677272797
Validation loss = 0.0883154422044754
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08526130020618439
Validation loss = 0.08023816347122192
Validation loss = 0.0728505402803421
Validation loss = 0.07540582865476608
Validation loss = 0.0759255439043045
Validation loss = 0.08783234655857086
Validation loss = 0.07588961720466614
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08442471921443939
Validation loss = 0.07465025037527084
Validation loss = 0.0778866559267044
Validation loss = 0.07656750828027725
Validation loss = 0.07316994667053223
Validation loss = 0.07515755295753479
Validation loss = 0.07975012063980103
Validation loss = 0.07566612213850021
Validation loss = 0.07494481652975082
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08867423236370087
Validation loss = 0.07954574376344681
Validation loss = 0.07627396285533905
Validation loss = 0.07970375567674637
Validation loss = 0.07921930402517319
Validation loss = 0.07741380482912064
Validation loss = 0.08025159686803818
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08172468841075897
Validation loss = 0.07509967684745789
Validation loss = 0.07519041746854782
Validation loss = 0.07930748909711838
Validation loss = 0.07627511024475098
Validation loss = 0.07878997176885605
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.52e+03 |
| Iteration     | 16       |
| MaximumReturn | 1.92e+03 |
| MinimumReturn | 1.04e+03 |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08906297385692596
Validation loss = 0.07264277338981628
Validation loss = 0.0757579356431961
Validation loss = 0.07927095144987106
Validation loss = 0.07537989318370819
Validation loss = 0.07263414561748505
Validation loss = 0.07519536465406418
Validation loss = 0.08146388828754425
Validation loss = 0.07653682678937912
Validation loss = 0.07205282151699066
Validation loss = 0.07127634435892105
Validation loss = 0.07248448580503464
Validation loss = 0.07476162910461426
Validation loss = 0.0707407295703888
Validation loss = 0.0710260197520256
Validation loss = 0.07369651645421982
Validation loss = 0.0708608329296112
Validation loss = 0.06948279589414597
Validation loss = 0.08349701762199402
Validation loss = 0.07037920504808426
Validation loss = 0.06795582175254822
Validation loss = 0.06989163905382156
Validation loss = 0.07373657077550888
Validation loss = 0.07155069708824158
Validation loss = 0.06995400786399841
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07961490005254745
Validation loss = 0.07170440256595612
Validation loss = 0.07137461006641388
Validation loss = 0.07904455065727234
Validation loss = 0.07462767511606216
Validation loss = 0.07220109552145004
Validation loss = 0.07833311706781387
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08164768666028976
Validation loss = 0.07653561234474182
Validation loss = 0.0687277764081955
Validation loss = 0.06902197003364563
Validation loss = 0.07781171798706055
Validation loss = 0.0709841325879097
Validation loss = 0.06981845200061798
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09075676649808884
Validation loss = 0.07782261818647385
Validation loss = 0.07412081956863403
Validation loss = 0.07470201700925827
Validation loss = 0.07831433415412903
Validation loss = 0.07503340393304825
Validation loss = 0.07509861141443253
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07960274815559387
Validation loss = 0.07540126144886017
Validation loss = 0.07735250145196915
Validation loss = 0.07307615131139755
Validation loss = 0.07475948333740234
Validation loss = 0.07212004065513611
Validation loss = 0.07530324161052704
Validation loss = 0.0747174322605133
Validation loss = 0.07132203876972198
Validation loss = 0.07486764341592789
Validation loss = 0.0827251747250557
Validation loss = 0.07182785868644714
Validation loss = 0.08051776885986328
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.13e+03 |
| Iteration     | 17       |
| MaximumReturn | 1.6e+03  |
| MinimumReturn | -420     |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09750810265541077
Validation loss = 0.06804152578115463
Validation loss = 0.06668435782194138
Validation loss = 0.0676494762301445
Validation loss = 0.06808364391326904
Validation loss = 0.07174509763717651
Validation loss = 0.07224184274673462
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08182483911514282
Validation loss = 0.07338688522577286
Validation loss = 0.06800185143947601
Validation loss = 0.07206764817237854
Validation loss = 0.07163435220718384
Validation loss = 0.06734242290258408
Validation loss = 0.07025314122438431
Validation loss = 0.07054489850997925
Validation loss = 0.06562616676092148
Validation loss = 0.06727105379104614
Validation loss = 0.07331982254981995
Validation loss = 0.07781674712896347
Validation loss = 0.06563200056552887
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08526060730218887
Validation loss = 0.06996969133615494
Validation loss = 0.06967892497777939
Validation loss = 0.07248189300298691
Validation loss = 0.06937116384506226
Validation loss = 0.06928732991218567
Validation loss = 0.06876084953546524
Validation loss = 0.06955033540725708
Validation loss = 0.06857937574386597
Validation loss = 0.07183736562728882
Validation loss = 0.06746545433998108
Validation loss = 0.06701246649026871
Validation loss = 0.07773711532354355
Validation loss = 0.07540959119796753
Validation loss = 0.06505411863327026
Validation loss = 0.06798390299081802
Validation loss = 0.06897825747728348
Validation loss = 0.07224401831626892
Validation loss = 0.06722453981637955
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09135526418685913
Validation loss = 0.07292015850543976
Validation loss = 0.07225017249584198
Validation loss = 0.07212579995393753
Validation loss = 0.07256807386875153
Validation loss = 0.07391760498285294
Validation loss = 0.08240935206413269
Validation loss = 0.07105010002851486
Validation loss = 0.06724763661623001
Validation loss = 0.07012683153152466
Validation loss = 0.08490794897079468
Validation loss = 0.06865852326154709
Validation loss = 0.06799665093421936
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08381963521242142
Validation loss = 0.07034727931022644
Validation loss = 0.0707264393568039
Validation loss = 0.06910379976034164
Validation loss = 0.08100695163011551
Validation loss = 0.06780745834112167
Validation loss = 0.06764000654220581
Validation loss = 0.07335980981588364
Validation loss = 0.06782200187444687
Validation loss = 0.06768139451742172
Validation loss = 0.07696409523487091
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.64e+03 |
| Iteration     | 18       |
| MaximumReturn | 2.08e+03 |
| MinimumReturn | 1.24e+03 |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07781627029180527
Validation loss = 0.06979618221521378
Validation loss = 0.06493305414915085
Validation loss = 0.06912969052791595
Validation loss = 0.06718329340219498
Validation loss = 0.06568551063537598
Validation loss = 0.06491903960704803
Validation loss = 0.06559502333402634
Validation loss = 0.06505248695611954
Validation loss = 0.06464938074350357
Validation loss = 0.06906483322381973
Validation loss = 0.06691472232341766
Validation loss = 0.06292995065450668
Validation loss = 0.06517146527767181
Validation loss = 0.0657174214720726
Validation loss = 0.0640651136636734
Validation loss = 0.07502741366624832
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07129379361867905
Validation loss = 0.0711357593536377
Validation loss = 0.06549419462680817
Validation loss = 0.07336310297250748
Validation loss = 0.06825848668813705
Validation loss = 0.0671544224023819
Validation loss = 0.06737812608480453
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07349123805761337
Validation loss = 0.06506302207708359
Validation loss = 0.06714282929897308
Validation loss = 0.06667319685220718
Validation loss = 0.06742719560861588
Validation loss = 0.06606842577457428
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0809556096792221
Validation loss = 0.0673685073852539
Validation loss = 0.06745914369821548
Validation loss = 0.06955640017986298
Validation loss = 0.0684802308678627
Validation loss = 0.07402722537517548
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07520552724599838
Validation loss = 0.0654071569442749
Validation loss = 0.06592731177806854
Validation loss = 0.06692856550216675
Validation loss = 0.06874135136604309
Validation loss = 0.06785964220762253
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.62e+03 |
| Iteration     | 19       |
| MaximumReturn | 2.13e+03 |
| MinimumReturn | 118      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08490254729986191
Validation loss = 0.0637141540646553
Validation loss = 0.06047709286212921
Validation loss = 0.06569270044565201
Validation loss = 0.07092392444610596
Validation loss = 0.06538869440555573
Validation loss = 0.06039319559931755
Validation loss = 0.061461202800273895
Validation loss = 0.06275681406259537
Validation loss = 0.06233786419034004
Validation loss = 0.06267256289720535
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08555663377046585
Validation loss = 0.06437426805496216
Validation loss = 0.06303607672452927
Validation loss = 0.06721483170986176
Validation loss = 0.07067792862653732
Validation loss = 0.0677846297621727
Validation loss = 0.06110949069261551
Validation loss = 0.06513462215662003
Validation loss = 0.0677599161863327
Validation loss = 0.060032181441783905
Validation loss = 0.0629829689860344
Validation loss = 0.07032456248998642
Validation loss = 0.05970880389213562
Validation loss = 0.06102767959237099
Validation loss = 0.06045963615179062
Validation loss = 0.06814875453710556
Validation loss = 0.06044744700193405
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07214362174272537
Validation loss = 0.06509970873594284
Validation loss = 0.0621403232216835
Validation loss = 0.06980966031551361
Validation loss = 0.06295791268348694
Validation loss = 0.07248393446207047
Validation loss = 0.0637824609875679
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07782135903835297
Validation loss = 0.06352581083774567
Validation loss = 0.0628172755241394
Validation loss = 0.07586769014596939
Validation loss = 0.0650811493396759
Validation loss = 0.061699215322732925
Validation loss = 0.07042647153139114
Validation loss = 0.06366270780563354
Validation loss = 0.06393441557884216
Validation loss = 0.0653686672449112
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0717802345752716
Validation loss = 0.06381088495254517
Validation loss = 0.0684962198138237
Validation loss = 0.06623135507106781
Validation loss = 0.06317373365163803
Validation loss = 0.0660080760717392
Validation loss = 0.06904736906290054
Validation loss = 0.06088017299771309
Validation loss = 0.0647745206952095
Validation loss = 0.067369244992733
Validation loss = 0.07912974059581757
Validation loss = 0.06247440353035927
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.95e+03 |
| Iteration     | 20       |
| MaximumReturn | 2.33e+03 |
| MinimumReturn | 1.71e+03 |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07123728096485138
Validation loss = 0.06079151853919029
Validation loss = 0.06005699932575226
Validation loss = 0.06248849257826805
Validation loss = 0.06290388852357864
Validation loss = 0.058717940002679825
Validation loss = 0.063257597386837
Validation loss = 0.0621170774102211
Validation loss = 0.06211477518081665
Validation loss = 0.06226777657866478
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06336858868598938
Validation loss = 0.0594392791390419
Validation loss = 0.06099317967891693
Validation loss = 0.062214549630880356
Validation loss = 0.06143450736999512
Validation loss = 0.061232492327690125
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06823697686195374
Validation loss = 0.06203658878803253
Validation loss = 0.05998289957642555
Validation loss = 0.06841516494750977
Validation loss = 0.060323894023895264
Validation loss = 0.05967148393392563
Validation loss = 0.06140049919486046
Validation loss = 0.06453090161085129
Validation loss = 0.06163839250802994
Validation loss = 0.05880117788910866
Validation loss = 0.06172129139304161
Validation loss = 0.057898763567209244
Validation loss = 0.0803353562951088
Validation loss = 0.05878633260726929
Validation loss = 0.058325789868831635
Validation loss = 0.06473568826913834
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07735684514045715
Validation loss = 0.06169711425900459
Validation loss = 0.0607026070356369
Validation loss = 0.06091220676898956
Validation loss = 0.06915325671434402
Validation loss = 0.06097211688756943
Validation loss = 0.060609087347984314
Validation loss = 0.06782793253660202
Validation loss = 0.06019184738397598
Validation loss = 0.059807583689689636
Validation loss = 0.06859073787927628
Validation loss = 0.0633903220295906
Validation loss = 0.0593155138194561
Validation loss = 0.06292510032653809
Validation loss = 0.06233416497707367
Validation loss = 0.05704263225197792
Validation loss = 0.059755172580480576
Validation loss = 0.06569187343120575
Validation loss = 0.05963551998138428
Validation loss = 0.057712849229574203
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06583912670612335
Validation loss = 0.061150193214416504
Validation loss = 0.06307823956012726
Validation loss = 0.06668364256620407
Validation loss = 0.06087477505207062
Validation loss = 0.05990093573927879
Validation loss = 0.06707731634378433
Validation loss = 0.06211656704545021
Validation loss = 0.060551658272743225
Validation loss = 0.07161948084831238
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.61e+03 |
| Iteration     | 21       |
| MaximumReturn | 2.16e+03 |
| MinimumReturn | 448      |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07780110090970993
Validation loss = 0.05853977054357529
Validation loss = 0.05757155641913414
Validation loss = 0.06023845076560974
Validation loss = 0.06671123206615448
Validation loss = 0.05940347537398338
Validation loss = 0.058130744844675064
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06995683908462524
Validation loss = 0.05880986154079437
Validation loss = 0.060455258935689926
Validation loss = 0.05957672372460365
Validation loss = 0.0574272982776165
Validation loss = 0.06583203375339508
Validation loss = 0.057100482285022736
Validation loss = 0.05623069405555725
Validation loss = 0.06457121670246124
Validation loss = 0.058108262717723846
Validation loss = 0.0558280423283577
Validation loss = 0.05768871307373047
Validation loss = 0.06904472410678864
Validation loss = 0.053516339510679245
Validation loss = 0.05579636991024017
Validation loss = 0.05680158734321594
Validation loss = 0.06562021374702454
Validation loss = 0.05435418710112572
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07115001976490021
Validation loss = 0.05728762224316597
Validation loss = 0.06254202872514725
Validation loss = 0.07072120159864426
Validation loss = 0.056565526872873306
Validation loss = 0.05704155191779137
Validation loss = 0.05715787038207054
Validation loss = 0.06990288943052292
Validation loss = 0.05505617707967758
Validation loss = 0.05494830757379532
Validation loss = 0.07026924937963486
Validation loss = 0.055501364171504974
Validation loss = 0.05665989965200424
Validation loss = 0.05636544153094292
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06899671256542206
Validation loss = 0.05901703983545303
Validation loss = 0.057079531252384186
Validation loss = 0.06497360020875931
Validation loss = 0.0566360168159008
Validation loss = 0.05583285167813301
Validation loss = 0.06296741217374802
Validation loss = 0.05696416273713112
Validation loss = 0.056225113570690155
Validation loss = 0.07559634745121002
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07494685053825378
Validation loss = 0.06054571270942688
Validation loss = 0.05763000622391701
Validation loss = 0.06368052214384079
Validation loss = 0.05847881734371185
Validation loss = 0.05916045606136322
Validation loss = 0.06366134434938431
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.6e+03  |
| Iteration     | 22       |
| MaximumReturn | 2.01e+03 |
| MinimumReturn | 493      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06529433280229568
Validation loss = 0.05632258579134941
Validation loss = 0.05837992951273918
Validation loss = 0.059822291135787964
Validation loss = 0.05701109766960144
Validation loss = 0.05484404042363167
Validation loss = 0.05805593729019165
Validation loss = 0.05845026671886444
Validation loss = 0.05397691950201988
Validation loss = 0.05747903883457184
Validation loss = 0.05536701902747154
Validation loss = 0.05482962727546692
Validation loss = 0.05822848156094551
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06260064989328384
Validation loss = 0.053441811352968216
Validation loss = 0.05316128209233284
Validation loss = 0.05288706347346306
Validation loss = 0.05784867703914642
Validation loss = 0.052077192813158035
Validation loss = 0.05257954075932503
Validation loss = 0.05865773186087608
Validation loss = 0.051582757383584976
Validation loss = 0.061443258076906204
Validation loss = 0.05389544367790222
Validation loss = 0.05243156477808952
Validation loss = 0.0551198311150074
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07540541142225266
Validation loss = 0.05563070997595787
Validation loss = 0.05253734067082405
Validation loss = 0.05772456154227257
Validation loss = 0.05702068284153938
Validation loss = 0.05230586603283882
Validation loss = 0.053402047604322433
Validation loss = 0.05526605248451233
Validation loss = 0.056822750717401505
Validation loss = 0.05152497813105583
Validation loss = 0.05496961995959282
Validation loss = 0.0639270469546318
Validation loss = 0.054521579295396805
Validation loss = 0.05688277259469032
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07840614765882492
Validation loss = 0.054705068469047546
Validation loss = 0.05476488545536995
Validation loss = 0.05727843940258026
Validation loss = 0.05504992604255676
Validation loss = 0.05376748368144035
Validation loss = 0.05744307115674019
Validation loss = 0.056262195110321045
Validation loss = 0.053433310240507126
Validation loss = 0.07149284332990646
Validation loss = 0.05274975299835205
Validation loss = 0.05206115171313286
Validation loss = 0.054437290877103806
Validation loss = 0.060955483466386795
Validation loss = 0.05230302736163139
Validation loss = 0.05516635254025459
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06496350467205048
Validation loss = 0.05590735748410225
Validation loss = 0.05531080439686775
Validation loss = 0.056307002902030945
Validation loss = 0.05515829101204872
Validation loss = 0.05673680827021599
Validation loss = 0.05741547420620918
Validation loss = 0.055841654539108276
Validation loss = 0.05662817135453224
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.2e+03  |
| Iteration     | 23       |
| MaximumReturn | 2.42e+03 |
| MinimumReturn | 2.03e+03 |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06587310880422592
Validation loss = 0.0541602186858654
Validation loss = 0.05544918403029442
Validation loss = 0.05707721412181854
Validation loss = 0.05409294739365578
Validation loss = 0.05345877632498741
Validation loss = 0.05676831305027008
Validation loss = 0.058788809925317764
Validation loss = 0.05590318515896797
Validation loss = 0.05357949808239937
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06639950722455978
Validation loss = 0.050140246748924255
Validation loss = 0.05055449903011322
Validation loss = 0.0653136670589447
Validation loss = 0.05142540484666824
Validation loss = 0.051160357892513275
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05829919874668121
Validation loss = 0.050584401935338974
Validation loss = 0.05381793528795242
Validation loss = 0.05325140431523323
Validation loss = 0.053402770310640335
Validation loss = 0.058211296796798706
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06844393908977509
Validation loss = 0.05263708159327507
Validation loss = 0.05247051641345024
Validation loss = 0.0676790252327919
Validation loss = 0.0518125519156456
Validation loss = 0.05171080678701401
Validation loss = 0.055670928210020065
Validation loss = 0.05191192775964737
Validation loss = 0.050798989832401276
Validation loss = 0.07216396182775497
Validation loss = 0.05048506334424019
Validation loss = 0.05021454393863678
Validation loss = 0.05736169219017029
Validation loss = 0.05212641879916191
Validation loss = 0.049689535051584244
Validation loss = 0.051757194101810455
Validation loss = 0.057952214032411575
Validation loss = 0.04964520409703255
Validation loss = 0.05655393749475479
Validation loss = 0.0554717481136322
Validation loss = 0.05147487670183182
Validation loss = 0.05188801884651184
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0748138427734375
Validation loss = 0.05409718677401543
Validation loss = 0.05268174782395363
Validation loss = 0.05798137187957764
Validation loss = 0.05301779881119728
Validation loss = 0.06642938405275345
Validation loss = 0.05215359106659889
Validation loss = 0.05145161226391792
Validation loss = 0.05975263938307762
Validation loss = 0.05204514414072037
Validation loss = 0.05232102423906326
Validation loss = 0.053768374025821686
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.07e+03 |
| Iteration     | 24       |
| MaximumReturn | 2.4e+03  |
| MinimumReturn | 1.85e+03 |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05327945947647095
Validation loss = 0.05239590257406235
Validation loss = 0.05657431110739708
Validation loss = 0.05287212133407593
Validation loss = 0.050268787890672684
Validation loss = 0.05582823231816292
Validation loss = 0.05336715653538704
Validation loss = 0.056632332503795624
Validation loss = 0.0534866601228714
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06188511103391647
Validation loss = 0.05029326677322388
Validation loss = 0.05070897564291954
Validation loss = 0.05742596089839935
Validation loss = 0.05041990429162979
Validation loss = 0.05075383931398392
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05886969342827797
Validation loss = 0.049734968692064285
Validation loss = 0.05091685429215431
Validation loss = 0.05633837357163429
Validation loss = 0.048208560794591904
Validation loss = 0.05903581529855728
Validation loss = 0.04963307082653046
Validation loss = 0.04827504977583885
Validation loss = 0.06573916971683502
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0658913254737854
Validation loss = 0.0510004498064518
Validation loss = 0.04912855103611946
Validation loss = 0.05161934718489647
Validation loss = 0.05011334270238876
Validation loss = 0.04902358725667
Validation loss = 0.05562055855989456
Validation loss = 0.05079828202724457
Validation loss = 0.04938153177499771
Validation loss = 0.05161551758646965
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06387496739625931
Validation loss = 0.051230233162641525
Validation loss = 0.05038217082619667
Validation loss = 0.05506132170557976
Validation loss = 0.053511638194322586
Validation loss = 0.050184253603219986
Validation loss = 0.05255254730582237
Validation loss = 0.0542680062353611
Validation loss = 0.053294770419597626
Validation loss = 0.0506267286837101
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.28e+03 |
| Iteration     | 25       |
| MaximumReturn | 2.74e+03 |
| MinimumReturn | 1.79e+03 |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05355021357536316
Validation loss = 0.051793359220027924
Validation loss = 0.05102456361055374
Validation loss = 0.05684177950024605
Validation loss = 0.051876895129680634
Validation loss = 0.05064937472343445
Validation loss = 0.055068206042051315
Validation loss = 0.05263827368617058
Validation loss = 0.049788039177656174
Validation loss = 0.05136401206254959
Validation loss = 0.052965499460697174
Validation loss = 0.050420697778463364
Validation loss = 0.06066913530230522
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06809832155704498
Validation loss = 0.05019776150584221
Validation loss = 0.0494011789560318
Validation loss = 0.055481404066085815
Validation loss = 0.0492515042424202
Validation loss = 0.04954860731959343
Validation loss = 0.056389786303043365
Validation loss = 0.05145946145057678
Validation loss = 0.05186978355050087
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05850975960493088
Validation loss = 0.04914851114153862
Validation loss = 0.058926377445459366
Validation loss = 0.04887264594435692
Validation loss = 0.050252825021743774
Validation loss = 0.05309641733765602
Validation loss = 0.04809324070811272
Validation loss = 0.04869937151670456
Validation loss = 0.05357031524181366
Validation loss = 0.04730148985981941
Validation loss = 0.04781500995159149
Validation loss = 0.06534338742494583
Validation loss = 0.051186610013246536
Validation loss = 0.04791560396552086
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06339286267757416
Validation loss = 0.048904433846473694
Validation loss = 0.04800422862172127
Validation loss = 0.0509752482175827
Validation loss = 0.05602585896849632
Validation loss = 0.047323182225227356
Validation loss = 0.04971674084663391
Validation loss = 0.05080043524503708
Validation loss = 0.04932031035423279
Validation loss = 0.04697272926568985
Validation loss = 0.05877610668540001
Validation loss = 0.04734000936150551
Validation loss = 0.04753684997558594
Validation loss = 0.059714846312999725
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06612540036439896
Validation loss = 0.04929399490356445
Validation loss = 0.05819835141301155
Validation loss = 0.0496600903570652
Validation loss = 0.04925234243273735
Validation loss = 0.06577807664871216
Validation loss = 0.04985591396689415
Validation loss = 0.04970736801624298
Validation loss = 0.054398346692323685
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.27e+03 |
| Iteration     | 26       |
| MaximumReturn | 2.63e+03 |
| MinimumReturn | 1.8e+03  |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06523001194000244
Validation loss = 0.04944107308983803
Validation loss = 0.049494002014398575
Validation loss = 0.05598173663020134
Validation loss = 0.05193871632218361
Validation loss = 0.048964597284793854
Validation loss = 0.05282306298613548
Validation loss = 0.05033627524971962
Validation loss = 0.0475284568965435
Validation loss = 0.055393461138010025
Validation loss = 0.05028257519006729
Validation loss = 0.048583678901195526
Validation loss = 0.05376875400543213
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.058504633605480194
Validation loss = 0.046990182250738144
Validation loss = 0.04766761511564255
Validation loss = 0.05223753675818443
Validation loss = 0.04809492081403732
Validation loss = 0.047238148748874664
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05435745045542717
Validation loss = 0.04746237024664879
Validation loss = 0.05424492433667183
Validation loss = 0.048019733279943466
Validation loss = 0.04855196550488472
Validation loss = 0.058990489691495895
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06056458503007889
Validation loss = 0.04682588204741478
Validation loss = 0.04731130599975586
Validation loss = 0.05961984395980835
Validation loss = 0.04670102149248123
Validation loss = 0.05105053633451462
Validation loss = 0.049362439662218094
Validation loss = 0.04774589091539383
Validation loss = 0.049764569848775864
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05997481569647789
Validation loss = 0.04995337128639221
Validation loss = 0.05082071200013161
Validation loss = 0.05343019217252731
Validation loss = 0.04754079133272171
Validation loss = 0.05152672529220581
Validation loss = 0.04918982461094856
Validation loss = 0.04789281636476517
Validation loss = 0.05170769244432449
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 1.25e+03  |
| Iteration     | 27        |
| MaximumReturn | 2.37e+03  |
| MinimumReturn | -2.13e+03 |
| TotalSamples  | 116000    |
-----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0598159059882164
Validation loss = 0.05515812337398529
Validation loss = 0.04820172116160393
Validation loss = 0.049072351306676865
Validation loss = 0.053999338299036026
Validation loss = 0.05059126764535904
Validation loss = 0.06421814858913422
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06721353530883789
Validation loss = 0.050599902868270874
Validation loss = 0.04855328053236008
Validation loss = 0.05429782718420029
Validation loss = 0.050270047038793564
Validation loss = 0.04996496066451073
Validation loss = 0.05170260742306709
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0583866611123085
Validation loss = 0.0470675528049469
Validation loss = 0.06071478873491287
Validation loss = 0.04825173318386078
Validation loss = 0.04633569344878197
Validation loss = 0.051695194095373154
Validation loss = 0.049862075597047806
Validation loss = 0.04779209941625595
Validation loss = 0.04964036867022514
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06547252088785172
Validation loss = 0.04880812019109726
Validation loss = 0.04532991349697113
Validation loss = 0.05871812626719475
Validation loss = 0.04627515375614166
Validation loss = 0.04832163080573082
Validation loss = 0.05797826871275902
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06163831055164337
Validation loss = 0.04735306650400162
Validation loss = 0.04721318557858467
Validation loss = 0.05715727433562279
Validation loss = 0.047867484390735626
Validation loss = 0.04897981509566307
Validation loss = 0.05811365693807602
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.03e+03 |
| Iteration     | 28       |
| MaximumReturn | 2.32e+03 |
| MinimumReturn | 1.67e+03 |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06916658580303192
Validation loss = 0.04902174323797226
Validation loss = 0.049141775816679
Validation loss = 0.05395307019352913
Validation loss = 0.052870433777570724
Validation loss = 0.048309531062841415
Validation loss = 0.05256050452589989
Validation loss = 0.04937882348895073
Validation loss = 0.048766303807497025
Validation loss = 0.05190176144242287
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05526399612426758
Validation loss = 0.049756404012441635
Validation loss = 0.046862855553627014
Validation loss = 0.052153948694467545
Validation loss = 0.04655497148633003
Validation loss = 0.048091821372509
Validation loss = 0.046455658972263336
Validation loss = 0.05280366912484169
Validation loss = 0.045422352850437164
Validation loss = 0.05191315338015556
Validation loss = 0.04679148271679878
Validation loss = 0.04634396731853485
Validation loss = 0.05764823406934738
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0624767504632473
Validation loss = 0.04782630503177643
Validation loss = 0.05159950256347656
Validation loss = 0.04805034399032593
Validation loss = 0.04704898223280907
Validation loss = 0.05289421230554581
Validation loss = 0.047272369265556335
Validation loss = 0.05084605515003204
Validation loss = 0.056824665516614914
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0581151507794857
Validation loss = 0.0461762472987175
Validation loss = 0.052667416632175446
Validation loss = 0.046842802315950394
Validation loss = 0.060165878385305405
Validation loss = 0.04587418958544731
Validation loss = 0.05626165121793747
Validation loss = 0.0480499230325222
Validation loss = 0.04567752033472061
Validation loss = 0.04630798101425171
Validation loss = 0.05342761427164078
Validation loss = 0.04506298154592514
Validation loss = 0.048349495977163315
Validation loss = 0.047878868877887726
Validation loss = 0.04541880264878273
Validation loss = 0.05186752602458
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06199509650468826
Validation loss = 0.05007913336157799
Validation loss = 0.04802040755748749
Validation loss = 0.05182524770498276
Validation loss = 0.04794567450881004
Validation loss = 0.0466778539121151
Validation loss = 0.05419982597231865
Validation loss = 0.04687679931521416
Validation loss = 0.054385386407375336
Validation loss = 0.0478472039103508
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | 1.3e+03   |
| Iteration     | 29        |
| MaximumReturn | 2.36e+03  |
| MinimumReturn | -2.34e+03 |
| TotalSamples  | 124000    |
-----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.053193725645542145
Validation loss = 0.046542126685380936
Validation loss = 0.04903464391827583
Validation loss = 0.047562722116708755
Validation loss = 0.049543194472789764
Validation loss = 0.046045880764722824
Validation loss = 0.046324536204338074
Validation loss = 0.053917091339826584
Validation loss = 0.04502950236201286
Validation loss = 0.04501738026738167
Validation loss = 0.050678420811891556
Validation loss = 0.04998467490077019
Validation loss = 0.049964357167482376
Validation loss = 0.0459018312394619
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06658631563186646
Validation loss = 0.04538340121507645
Validation loss = 0.04645664617419243
Validation loss = 0.046112511307001114
Validation loss = 0.048190049827098846
Validation loss = 0.04407624155282974
Validation loss = 0.05072789266705513
Validation loss = 0.044429391622543335
Validation loss = 0.04304517060518265
Validation loss = 0.051829058676958084
Validation loss = 0.04470404237508774
Validation loss = 0.04400210082530975
Validation loss = 0.04649464040994644
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06647118926048279
Validation loss = 0.04749436676502228
Validation loss = 0.048013683408498764
Validation loss = 0.05358521640300751
Validation loss = 0.047347959131002426
Validation loss = 0.04939296841621399
Validation loss = 0.05008392035961151
Validation loss = 0.04615040868520737
Validation loss = 0.046486109495162964
Validation loss = 0.053740888833999634
Validation loss = 0.04758749529719353
Validation loss = 0.04689536988735199
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.060008883476257324
Validation loss = 0.044333260506391525
Validation loss = 0.04583771899342537
Validation loss = 0.046543680131435394
Validation loss = 0.04539688676595688
Validation loss = 0.05054270103573799
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05904597416520119
Validation loss = 0.04652244225144386
Validation loss = 0.04880236089229584
Validation loss = 0.04849237948656082
Validation loss = 0.04592645913362503
Validation loss = 0.056082457304000854
Validation loss = 0.04487261176109314
Validation loss = 0.05810876563191414
Validation loss = 0.04620159789919853
Validation loss = 0.04637334495782852
Validation loss = 0.05762398615479469
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.37e+03 |
| Iteration     | 30       |
| MaximumReturn | 2.66e+03 |
| MinimumReturn | 178      |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05815161392092705
Validation loss = 0.046706534922122955
Validation loss = 0.04673679918050766
Validation loss = 0.0480373315513134
Validation loss = 0.05890285223722458
Validation loss = 0.04711122065782547
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.056651268154382706
Validation loss = 0.045077938586473465
Validation loss = 0.046411655843257904
Validation loss = 0.04678306728601456
Validation loss = 0.045525968074798584
Validation loss = 0.0438138023018837
Validation loss = 0.0558459609746933
Validation loss = 0.043804652988910675
Validation loss = 0.04395981878042221
Validation loss = 0.046409256756305695
Validation loss = 0.04333432763814926
Validation loss = 0.054865360260009766
Validation loss = 0.04363904893398285
Validation loss = 0.043167922645807266
Validation loss = 0.05028514564037323
Validation loss = 0.04425331577658653
Validation loss = 0.04274787753820419
Validation loss = 0.047891341149806976
Validation loss = 0.046571291983127594
Validation loss = 0.04226665943861008
Validation loss = 0.05593235045671463
Validation loss = 0.04345519840717316
Validation loss = 0.045613985508680344
Validation loss = 0.05302262306213379
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05599284544587135
Validation loss = 0.047499749809503555
Validation loss = 0.04652993008494377
Validation loss = 0.06685729324817657
Validation loss = 0.045750781893730164
Validation loss = 0.04842419922351837
Validation loss = 0.049036599695682526
Validation loss = 0.04692788049578667
Validation loss = 0.06453389674425125
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05329900234937668
Validation loss = 0.047120265662670135
Validation loss = 0.04570920765399933
Validation loss = 0.05436891317367554
Validation loss = 0.04492940381169319
Validation loss = 0.04527779668569565
Validation loss = 0.06620452553033829
Validation loss = 0.044409386813640594
Validation loss = 0.0442705973982811
Validation loss = 0.05208263546228409
Validation loss = 0.04368492215871811
Validation loss = 0.04888323321938515
Validation loss = 0.04589690640568733
Validation loss = 0.04349333047866821
Validation loss = 0.05334899574518204
Validation loss = 0.043867327272892
Validation loss = 0.04567970335483551
Validation loss = 0.046993035823106766
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05687981843948364
Validation loss = 0.04643302783370018
Validation loss = 0.04734506458044052
Validation loss = 0.049550916999578476
Validation loss = 0.0464569590985775
Validation loss = 0.045510873198509216
Validation loss = 0.05520413815975189
Validation loss = 0.04426349699497223
Validation loss = 0.04580217972397804
Validation loss = 0.04784605652093887
Validation loss = 0.04470612108707428
Validation loss = 0.04972602427005768
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.76e+03 |
| Iteration     | 31       |
| MaximumReturn | 2.44e+03 |
| MinimumReturn | 231      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04989177733659744
Validation loss = 0.04699612781405449
Validation loss = 0.04791324585676193
Validation loss = 0.0463353656232357
Validation loss = 0.05567101761698723
Validation loss = 0.046252310276031494
Validation loss = 0.046733200550079346
Validation loss = 0.05641074851155281
Validation loss = 0.04522794112563133
Validation loss = 0.045882999897003174
Validation loss = 0.04669550433754921
Validation loss = 0.04432599991559982
Validation loss = 0.05787584185600281
Validation loss = 0.04573379084467888
Validation loss = 0.045773785561323166
Validation loss = 0.044979944825172424
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05104775354266167
Validation loss = 0.042006559669971466
Validation loss = 0.04354187101125717
Validation loss = 0.04574412852525711
Validation loss = 0.04132730886340141
Validation loss = 0.046354468911886215
Validation loss = 0.04127955809235573
Validation loss = 0.04569672420620918
Validation loss = 0.041406285017728806
Validation loss = 0.04694914445281029
Validation loss = 0.04494022578001022
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06289408355951309
Validation loss = 0.04779253900051117
Validation loss = 0.052201755344867706
Validation loss = 0.04742785543203354
Validation loss = 0.048230670392513275
Validation loss = 0.053660400211811066
Validation loss = 0.0475417897105217
Validation loss = 0.0469757616519928
Validation loss = 0.05119824782013893
Validation loss = 0.04641792178153992
Validation loss = 0.058403778821229935
Validation loss = 0.04777141660451889
Validation loss = 0.05008786916732788
Validation loss = 0.04769686982035637
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05168509483337402
Validation loss = 0.043050795793533325
Validation loss = 0.04655785486102104
Validation loss = 0.04643642157316208
Validation loss = 0.04417914152145386
Validation loss = 0.04279549419879913
Validation loss = 0.04822259396314621
Validation loss = 0.04315454512834549
Validation loss = 0.04362700134515762
Validation loss = 0.04683581367135048
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05348283052444458
Validation loss = 0.04407238960266113
Validation loss = 0.05061127245426178
Validation loss = 0.043801289051771164
Validation loss = 0.0511370450258255
Validation loss = 0.044346991926431656
Validation loss = 0.043321214616298676
Validation loss = 0.04777701944112778
Validation loss = 0.042805351316928864
Validation loss = 0.04593215882778168
Validation loss = 0.04783725365996361
Validation loss = 0.043104369193315506
Validation loss = 0.051734842360019684
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 2.33e+03 |
| Iteration     | 32       |
| MaximumReturn | 2.63e+03 |
| MinimumReturn | 2.02e+03 |
| TotalSamples  | 136000   |
----------------------------
