Logging to experiments/hopper/hopperA01/Mon-31-Oct-2022-11-00-29-AM-CDT_hopper_trpo_iteration_20_seed2531
Print configuration .....
{'env_name': 'hopper', 'random_seeds': [1234, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/hopper_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5785059928894043
Validation loss = 0.29769861698150635
Validation loss = 0.291576623916626
Validation loss = 0.2819058895111084
Validation loss = 0.2875601351261139
Validation loss = 0.2960647642612457
Validation loss = 0.3083367943763733
Validation loss = 0.3052298426628113
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.73105788230896
Validation loss = 0.30096757411956787
Validation loss = 0.2887251377105713
Validation loss = 0.28154200315475464
Validation loss = 0.28749868273735046
Validation loss = 0.2950500249862671
Validation loss = 0.3215317130088806
Validation loss = 0.31008821725845337
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.6676373481750488
Validation loss = 0.30400556325912476
Validation loss = 0.28420546650886536
Validation loss = 0.28172415494918823
Validation loss = 0.2944948375225067
Validation loss = 0.31412273645401
Validation loss = 0.3128893971443176
Validation loss = 0.3050212264060974
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.7205032110214233
Validation loss = 0.3021289110183716
Validation loss = 0.2832612991333008
Validation loss = 0.28982800245285034
Validation loss = 0.2869030237197876
Validation loss = 0.3161696195602417
Validation loss = 0.30348271131515503
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.7827897071838379
Validation loss = 0.2985198199748993
Validation loss = 0.28688105940818787
Validation loss = 0.2853730022907257
Validation loss = 0.3013060390949249
Validation loss = 0.3037019371986389
Validation loss = 0.3070700168609619
Validation loss = 0.3291351795196533
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.35e+03 |
| Iteration     | 0         |
| MaximumReturn | -1.02e+03 |
| MinimumReturn | -1.6e+03  |
| TotalSamples  | 8000      |
-----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.42575788497924805
Validation loss = 0.4037507176399231
Validation loss = 0.4236726462841034
Validation loss = 0.43531516194343567
Validation loss = 0.4819192886352539
Validation loss = 0.48792576789855957
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.4151567816734314
Validation loss = 0.4322044849395752
Validation loss = 0.4360524117946625
Validation loss = 0.4551648497581482
Validation loss = 0.47059375047683716
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.42141199111938477
Validation loss = 0.4125903248786926
Validation loss = 0.4184698164463043
Validation loss = 0.42950624227523804
Validation loss = 0.48660826683044434
Validation loss = 0.47414630651474
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.4128362536430359
Validation loss = 0.4304813742637634
Validation loss = 0.44800665974617004
Validation loss = 0.47160717844963074
Validation loss = 0.5097441673278809
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.41171199083328247
Validation loss = 0.4104260802268982
Validation loss = 0.4155658781528473
Validation loss = 0.45377349853515625
Validation loss = 0.440719872713089
Validation loss = 0.5329362154006958
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.02e+03 |
| Iteration     | 1         |
| MaximumReturn | -1.16e+03 |
| MinimumReturn | -2.32e+03 |
| TotalSamples  | 12000     |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3314869701862335
Validation loss = 0.37469765543937683
Validation loss = 0.38630327582359314
Validation loss = 0.3744378983974457
Validation loss = 0.36014318466186523
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3384194076061249
Validation loss = 0.38843634724617004
Validation loss = 0.3880087435245514
Validation loss = 0.3825744390487671
Validation loss = 0.3860851526260376
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.3480820953845978
Validation loss = 0.36042988300323486
Validation loss = 0.38401636481285095
Validation loss = 0.3751177489757538
Validation loss = 0.3751643896102905
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.38845524191856384
Validation loss = 0.37612712383270264
Validation loss = 0.3906780481338501
Validation loss = 0.3800002336502075
Validation loss = 0.38874971866607666
Validation loss = 0.38782060146331787
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3439832627773285
Validation loss = 0.3759438693523407
Validation loss = 0.3938356637954712
Validation loss = 0.39424049854278564
Validation loss = 0.3829987943172455
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.62e+03 |
| Iteration     | 2         |
| MaximumReturn | -91.4     |
| MinimumReturn | -2.85e+03 |
| TotalSamples  | 16000     |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.3618618845939636
Validation loss = 0.3571194112300873
Validation loss = 0.32495778799057007
Validation loss = 0.3186632990837097
Validation loss = 0.3327447175979614
Validation loss = 0.3175128102302551
Validation loss = 0.3107181787490845
Validation loss = 0.3201811909675598
Validation loss = 0.32930636405944824
Validation loss = 0.31910574436187744
Validation loss = 0.31475481390953064
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.3643125891685486
Validation loss = 0.369568407535553
Validation loss = 0.36403942108154297
Validation loss = 0.359441339969635
Validation loss = 0.3447158932685852
Validation loss = 0.35244232416152954
Validation loss = 0.33896511793136597
Validation loss = 0.3451286554336548
Validation loss = 0.34042805433273315
Validation loss = 0.3390662670135498
Validation loss = 0.3323960304260254
Validation loss = 0.3394772410392761
Validation loss = 0.3377978801727295
Validation loss = 0.3325475752353668
Validation loss = 0.3205219507217407
Validation loss = 0.3268469572067261
Validation loss = 0.33105403184890747
Validation loss = 0.31704866886138916
Validation loss = 0.32380038499832153
Validation loss = 0.3322576880455017
Validation loss = 0.3231889605522156
Validation loss = 0.3152080774307251
Validation loss = 0.3157597780227661
Validation loss = 0.31635427474975586
Validation loss = 0.31966739892959595
Validation loss = 0.31914234161376953
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.36889222264289856
Validation loss = 0.3451892137527466
Validation loss = 0.3364478647708893
Validation loss = 0.331005722284317
Validation loss = 0.3441099524497986
Validation loss = 0.33812564611434937
Validation loss = 0.3227943778038025
Validation loss = 0.3216434717178345
Validation loss = 0.3135003447532654
Validation loss = 0.33979350328445435
Validation loss = 0.3066178262233734
Validation loss = 0.3101438879966736
Validation loss = 0.32486164569854736
Validation loss = 0.31148505210876465
Validation loss = 0.31775349378585815
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3740631639957428
Validation loss = 0.353354811668396
Validation loss = 0.350140780210495
Validation loss = 0.34711581468582153
Validation loss = 0.34189772605895996
Validation loss = 0.33557361364364624
Validation loss = 0.3428567051887512
Validation loss = 0.33066755533218384
Validation loss = 0.3286499083042145
Validation loss = 0.3326660394668579
Validation loss = 0.3220869302749634
Validation loss = 0.3353974223136902
Validation loss = 0.3260451555252075
Validation loss = 0.34622830152511597
Validation loss = 0.33083075284957886
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.3701251149177551
Validation loss = 0.36144018173217773
Validation loss = 0.3424539566040039
Validation loss = 0.33046960830688477
Validation loss = 0.32987815141677856
Validation loss = 0.3299805819988251
Validation loss = 0.33129096031188965
Validation loss = 0.3178984522819519
Validation loss = 0.3367540240287781
Validation loss = 0.3250858783721924
Validation loss = 0.3278873860836029
Validation loss = 0.3208218216896057
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.34e+03 |
| Iteration     | 3         |
| MaximumReturn | -178      |
| MinimumReturn | -2.36e+03 |
| TotalSamples  | 20000     |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.32073134183883667
Validation loss = 0.2937861979007721
Validation loss = 0.3023112714290619
Validation loss = 0.3082960247993469
Validation loss = 0.29267245531082153
Validation loss = 0.3052186071872711
Validation loss = 0.2983071506023407
Validation loss = 0.30484718084335327
Validation loss = 0.30163484811782837
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.35404080152511597
Validation loss = 0.31122058629989624
Validation loss = 0.311053603887558
Validation loss = 0.2986451983451843
Validation loss = 0.296259343624115
Validation loss = 0.30058664083480835
Validation loss = 0.2998817563056946
Validation loss = 0.29637610912323
Validation loss = 0.287308931350708
Validation loss = 0.2993784546852112
Validation loss = 0.3064073920249939
Validation loss = 0.29767757654190063
Validation loss = 0.29315394163131714
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.344597190618515
Validation loss = 0.3056481182575226
Validation loss = 0.2892294228076935
Validation loss = 0.29366299510002136
Validation loss = 0.29099875688552856
Validation loss = 0.28728586435317993
Validation loss = 0.2824579179286957
Validation loss = 0.279075562953949
Validation loss = 0.28557872772216797
Validation loss = 0.2830638587474823
Validation loss = 0.2837567925453186
Validation loss = 0.2823977470397949
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.3252861797809601
Validation loss = 0.308159738779068
Validation loss = 0.30968785285949707
Validation loss = 0.30523425340652466
Validation loss = 0.2958309054374695
Validation loss = 0.29400089383125305
Validation loss = 0.29523393511772156
Validation loss = 0.30052539706230164
Validation loss = 0.2989026606082916
Validation loss = 0.30323705077171326
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.33124658465385437
Validation loss = 0.31524282693862915
Validation loss = 0.3187267780303955
Validation loss = 0.308066189289093
Validation loss = 0.3045411705970764
Validation loss = 0.3002608120441437
Validation loss = 0.3055734634399414
Validation loss = 0.3154892325401306
Validation loss = 0.30959850549697876
Validation loss = 0.30207255482673645
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.87e+03 |
| Iteration     | 4         |
| MaximumReturn | -1.58e+03 |
| MinimumReturn | -2.22e+03 |
| TotalSamples  | 24000     |
-----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.25523480772972107
Validation loss = 0.24751168489456177
Validation loss = 0.2509762644767761
Validation loss = 0.2397870421409607
Validation loss = 0.23507951200008392
Validation loss = 0.2353929728269577
Validation loss = 0.24682174623012543
Validation loss = 0.23658376932144165
Validation loss = 0.2448962777853012
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2598169147968292
Validation loss = 0.22752369940280914
Validation loss = 0.22810842096805573
Validation loss = 0.21749098598957062
Validation loss = 0.2119661569595337
Validation loss = 0.21192879974842072
Validation loss = 0.2135763019323349
Validation loss = 0.22421176731586456
Validation loss = 0.24729831516742706
Validation loss = 0.22461993992328644
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.26824164390563965
Validation loss = 0.23368452489376068
Validation loss = 0.2222006916999817
Validation loss = 0.21553672850131989
Validation loss = 0.22099298238754272
Validation loss = 0.21666862070560455
Validation loss = 0.21573621034622192
Validation loss = 0.21542920172214508
Validation loss = 0.22435958683490753
Validation loss = 0.2225094884634018
Validation loss = 0.22390611469745636
Validation loss = 0.21488171815872192
Validation loss = 0.21793381869792938
Validation loss = 0.22253476083278656
Validation loss = 0.22489698231220245
Validation loss = 0.2103668451309204
Validation loss = 0.21048255264759064
Validation loss = 0.2166004180908203
Validation loss = 0.22969920933246613
Validation loss = 0.22549112141132355
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2548920512199402
Validation loss = 0.24330872297286987
Validation loss = 0.23751968145370483
Validation loss = 0.22578053176403046
Validation loss = 0.22243748605251312
Validation loss = 0.22728438675403595
Validation loss = 0.22237388789653778
Validation loss = 0.22213749587535858
Validation loss = 0.22679752111434937
Validation loss = 0.21470774710178375
Validation loss = 0.2154928594827652
Validation loss = 0.21734501421451569
Validation loss = 0.237374946475029
Validation loss = 0.214593768119812
Validation loss = 0.21307222545146942
Validation loss = 0.21275927126407623
Validation loss = 0.22927135229110718
Validation loss = 0.21899829804897308
Validation loss = 0.20997397601604462
Validation loss = 0.2249377816915512
Validation loss = 0.20441430807113647
Validation loss = 0.20660872757434845
Validation loss = 0.20628118515014648
Validation loss = 0.20518441498279572
Validation loss = 0.21298809349536896
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.27048400044441223
Validation loss = 0.2554697096347809
Validation loss = 0.23733913898468018
Validation loss = 0.23996900022029877
Validation loss = 0.2366260141134262
Validation loss = 0.23743951320648193
Validation loss = 0.2335302233695984
Validation loss = 0.23974542319774628
Validation loss = 0.2409520149230957
Validation loss = 0.2366190403699875
Validation loss = 0.22613917291164398
Validation loss = 0.22961652278900146
Validation loss = 0.22679664194583893
Validation loss = 0.2348315566778183
Validation loss = 0.2293098419904709
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.3e+03  |
| Iteration     | 5         |
| MaximumReturn | -57.5     |
| MinimumReturn | -2.53e+03 |
| TotalSamples  | 28000     |
-----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2499590665102005
Validation loss = 0.22877733409404755
Validation loss = 0.22804126143455505
Validation loss = 0.21472500264644623
Validation loss = 0.21339412033557892
Validation loss = 0.21720942854881287
Validation loss = 0.21462145447731018
Validation loss = 0.214347705245018
Validation loss = 0.2202720195055008
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.22392164170742035
Validation loss = 0.22066693007946014
Validation loss = 0.20038127899169922
Validation loss = 0.19330255687236786
Validation loss = 0.19943027198314667
Validation loss = 0.20913445949554443
Validation loss = 0.20164404809474945
Validation loss = 0.2004365622997284
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.22895248234272003
Validation loss = 0.20286895334720612
Validation loss = 0.19471274316310883
Validation loss = 0.19853894412517548
Validation loss = 0.20012150704860687
Validation loss = 0.2046973705291748
Validation loss = 0.20634625852108002
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.23909321427345276
Validation loss = 0.20524828135967255
Validation loss = 0.18950875103473663
Validation loss = 0.1908147782087326
Validation loss = 0.18876519799232483
Validation loss = 0.19137096405029297
Validation loss = 0.1926267445087433
Validation loss = 0.18617534637451172
Validation loss = 0.18965156376361847
Validation loss = 0.2058192640542984
Validation loss = 0.20125821232795715
Validation loss = 0.18954983353614807
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.21709798276424408
Validation loss = 0.2236473560333252
Validation loss = 0.20761261880397797
Validation loss = 0.21081635355949402
Validation loss = 0.21124252676963806
Validation loss = 0.21093104779720306
Validation loss = 0.2103838175535202
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.38e+03 |
| Iteration     | 6         |
| MaximumReturn | -634      |
| MinimumReturn | -2.41e+03 |
| TotalSamples  | 32000     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.2084229290485382
Validation loss = 0.18017132580280304
Validation loss = 0.18557877838611603
Validation loss = 0.17870822548866272
Validation loss = 0.19180235266685486
Validation loss = 0.1777951866388321
Validation loss = 0.18344023823738098
Validation loss = 0.17277473211288452
Validation loss = 0.1777718961238861
Validation loss = 0.17354361712932587
Validation loss = 0.1786361038684845
Validation loss = 0.17858144640922546
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.19453787803649902
Validation loss = 0.17979766428470612
Validation loss = 0.17417097091674805
Validation loss = 0.17042703926563263
Validation loss = 0.1668875366449356
Validation loss = 0.16274121403694153
Validation loss = 0.17567911744117737
Validation loss = 0.17992475628852844
Validation loss = 0.17531082034111023
Validation loss = 0.16336414217948914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.18983349204063416
Validation loss = 0.18273881077766418
Validation loss = 0.16772910952568054
Validation loss = 0.16360634565353394
Validation loss = 0.17051124572753906
Validation loss = 0.15946078300476074
Validation loss = 0.1644696295261383
Validation loss = 0.16658231616020203
Validation loss = 0.15986894071102142
Validation loss = 0.16248920559883118
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2015455812215805
Validation loss = 0.1768871247768402
Validation loss = 0.17342081665992737
Validation loss = 0.16997820138931274
Validation loss = 0.17368778586387634
Validation loss = 0.16186636686325073
Validation loss = 0.16870129108428955
Validation loss = 0.17504674196243286
Validation loss = 0.17651313543319702
Validation loss = 0.16368815302848816
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.21356646716594696
Validation loss = 0.19515159726142883
Validation loss = 0.17958447337150574
Validation loss = 0.17153413593769073
Validation loss = 0.17239165306091309
Validation loss = 0.17757174372673035
Validation loss = 0.1705365777015686
Validation loss = 0.17098084092140198
Validation loss = 0.17541784048080444
Validation loss = 0.17829111218452454
Validation loss = 0.16836503148078918
Validation loss = 0.162266805768013
Validation loss = 0.1629389524459839
Validation loss = 0.1626666933298111
Validation loss = 0.16754460334777832
Validation loss = 0.1828136444091797
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.22e+03 |
| Iteration     | 7         |
| MaximumReturn | 61.7      |
| MinimumReturn | -2.39e+03 |
| TotalSamples  | 36000     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.18230202794075012
Validation loss = 0.16461043059825897
Validation loss = 0.16530215740203857
Validation loss = 0.15756237506866455
Validation loss = 0.15815496444702148
Validation loss = 0.15788348019123077
Validation loss = 0.15358209609985352
Validation loss = 0.1477203071117401
Validation loss = 0.1471744328737259
Validation loss = 0.1539960503578186
Validation loss = 0.14958827197551727
Validation loss = 0.15131358802318573
Validation loss = 0.1551283895969391
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16293399035930634
Validation loss = 0.17054063081741333
Validation loss = 0.16015540063381195
Validation loss = 0.15523220598697662
Validation loss = 0.16327005624771118
Validation loss = 0.17852358520030975
Validation loss = 0.17599867284297943
Validation loss = 0.15913206338882446
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.17064088582992554
Validation loss = 0.16115954518318176
Validation loss = 0.14893551170825958
Validation loss = 0.14631859958171844
Validation loss = 0.15198098123073578
Validation loss = 0.1549992859363556
Validation loss = 0.1505928337574005
Validation loss = 0.15610085427761078
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17366540431976318
Validation loss = 0.16572338342666626
Validation loss = 0.15530747175216675
Validation loss = 0.153564915060997
Validation loss = 0.15659165382385254
Validation loss = 0.15464310348033905
Validation loss = 0.15669028460979462
Validation loss = 0.15471726655960083
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17900285124778748
Validation loss = 0.16210344433784485
Validation loss = 0.15160267055034637
Validation loss = 0.15324190258979797
Validation loss = 0.15088438987731934
Validation loss = 0.15094715356826782
Validation loss = 0.1474665403366089
Validation loss = 0.15305930376052856
Validation loss = 0.18209098279476166
Validation loss = 0.15789921581745148
Validation loss = 0.15168225765228271
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.43e+03 |
| Iteration     | 8         |
| MaximumReturn | -900      |
| MinimumReturn | -2.39e+03 |
| TotalSamples  | 40000     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16746336221694946
Validation loss = 0.1448131501674652
Validation loss = 0.13202513754367828
Validation loss = 0.1320684254169464
Validation loss = 0.12927016615867615
Validation loss = 0.13693830370903015
Validation loss = 0.14299353957176208
Validation loss = 0.14387226104736328
Validation loss = 0.13052067160606384
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1615104377269745
Validation loss = 0.15965576469898224
Validation loss = 0.14795395731925964
Validation loss = 0.14940088987350464
Validation loss = 0.15151116251945496
Validation loss = 0.14979788661003113
Validation loss = 0.1524014174938202
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1560346931219101
Validation loss = 0.13836336135864258
Validation loss = 0.14014430344104767
Validation loss = 0.13805219531059265
Validation loss = 0.13896577060222626
Validation loss = 0.13909384608268738
Validation loss = 0.14153936505317688
Validation loss = 0.1436489373445511
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17158566415309906
Validation loss = 0.15209802985191345
Validation loss = 0.14508198201656342
Validation loss = 0.14227525889873505
Validation loss = 0.14706256985664368
Validation loss = 0.14727239310741425
Validation loss = 0.15134432911872864
Validation loss = 0.14831969141960144
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1491016447544098
Validation loss = 0.14193004369735718
Validation loss = 0.13692490756511688
Validation loss = 0.12854129076004028
Validation loss = 0.12806175649166107
Validation loss = 0.13325129449367523
Validation loss = 0.13414303958415985
Validation loss = 0.1292533576488495
Validation loss = 0.13194039463996887
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.61e+03 |
| Iteration     | 9         |
| MaximumReturn | -2.28e+03 |
| MinimumReturn | -2.78e+03 |
| TotalSamples  | 44000     |
-----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.17832498252391815
Validation loss = 0.1299387812614441
Validation loss = 0.12930744886398315
Validation loss = 0.13537730276584625
Validation loss = 0.1327497363090515
Validation loss = 0.1388615071773529
Validation loss = 0.13632164895534515
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1653081774711609
Validation loss = 0.15609069168567657
Validation loss = 0.14538981020450592
Validation loss = 0.1364443153142929
Validation loss = 0.1516803354024887
Validation loss = 0.15376652777194977
Validation loss = 0.14693917334079742
Validation loss = 0.1350240260362625
Validation loss = 0.13516345620155334
Validation loss = 0.13265615701675415
Validation loss = 0.13470615446567535
Validation loss = 0.14348797500133514
Validation loss = 0.14888368546962738
Validation loss = 0.1384153813123703
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16604843735694885
Validation loss = 0.1495721936225891
Validation loss = 0.14225970208644867
Validation loss = 0.13534551858901978
Validation loss = 0.1370360404253006
Validation loss = 0.15214397013187408
Validation loss = 0.137673482298851
Validation loss = 0.1399391144514084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1444838047027588
Validation loss = 0.1454097479581833
Validation loss = 0.14213012158870697
Validation loss = 0.13846144080162048
Validation loss = 0.15589497983455658
Validation loss = 0.14645488560199738
Validation loss = 0.14322315156459808
Validation loss = 0.15073107182979584
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1437695473432541
Validation loss = 0.14577992260456085
Validation loss = 0.123580701649189
Validation loss = 0.12103064358234406
Validation loss = 0.12097778916358948
Validation loss = 0.1271679401397705
Validation loss = 0.14361077547073364
Validation loss = 0.13330024480819702
Validation loss = 0.12114627659320831
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.64e+03 |
| Iteration     | 10        |
| MaximumReturn | -574      |
| MinimumReturn | -2.71e+03 |
| TotalSamples  | 48000     |
-----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14089040458202362
Validation loss = 0.120411217212677
Validation loss = 0.1201147511601448
Validation loss = 0.12134188413619995
Validation loss = 0.12161803245544434
Validation loss = 0.141635462641716
Validation loss = 0.11758669465780258
Validation loss = 0.11800260096788406
Validation loss = 0.11762716621160507
Validation loss = 0.12284162640571594
Validation loss = 0.1366468220949173
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15441404283046722
Validation loss = 0.13055427372455597
Validation loss = 0.1296529620885849
Validation loss = 0.12178244441747665
Validation loss = 0.12803010642528534
Validation loss = 0.13434170186519623
Validation loss = 0.1262335330247879
Validation loss = 0.12475106865167618
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16128665208816528
Validation loss = 0.13307088613510132
Validation loss = 0.13327045738697052
Validation loss = 0.12437451630830765
Validation loss = 0.13290004432201385
Validation loss = 0.12950557470321655
Validation loss = 0.1408432275056839
Validation loss = 0.13827383518218994
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.151222363114357
Validation loss = 0.13318373262882233
Validation loss = 0.12970668077468872
Validation loss = 0.12833531200885773
Validation loss = 0.13493692874908447
Validation loss = 0.12988311052322388
Validation loss = 0.13736863434314728
Validation loss = 0.13048061728477478
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13855062425136566
Validation loss = 0.12402617931365967
Validation loss = 0.12009944766759872
Validation loss = 0.11735103279352188
Validation loss = 0.11653558164834976
Validation loss = 0.11596405506134033
Validation loss = 0.13438332080841064
Validation loss = 0.12183935195207596
Validation loss = 0.11456970125436783
Validation loss = 0.11681779474020004
Validation loss = 0.12016098946332932
Validation loss = 0.12182033807039261
Validation loss = 0.11775609105825424
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.4e+03  |
| Iteration     | 11        |
| MaximumReturn | -193      |
| MinimumReturn | -2.73e+03 |
| TotalSamples  | 52000     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.14158591628074646
Validation loss = 0.137123703956604
Validation loss = 0.12435252964496613
Validation loss = 0.12457168847322464
Validation loss = 0.1310945302248001
Validation loss = 0.1321161389350891
Validation loss = 0.12074914574623108
Validation loss = 0.11838214844465256
Validation loss = 0.11950789391994476
Validation loss = 0.12792855501174927
Validation loss = 0.13195055723190308
Validation loss = 0.11758359521627426
Validation loss = 0.11428176611661911
Validation loss = 0.13999606668949127
Validation loss = 0.12221015989780426
Validation loss = 0.11676772683858871
Validation loss = 0.1162848025560379
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1469348967075348
Validation loss = 0.13618144392967224
Validation loss = 0.13219764828681946
Validation loss = 0.13104183971881866
Validation loss = 0.12802723050117493
Validation loss = 0.12415429204702377
Validation loss = 0.14072181284427643
Validation loss = 0.12576429545879364
Validation loss = 0.13265462219715118
Validation loss = 0.12490148842334747
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15009571611881256
Validation loss = 0.13255952298641205
Validation loss = 0.12444977462291718
Validation loss = 0.13324420154094696
Validation loss = 0.12754009664058685
Validation loss = 0.13385358452796936
Validation loss = 0.14266762137413025
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1596914380788803
Validation loss = 0.14463351666927338
Validation loss = 0.13020506501197815
Validation loss = 0.12841583788394928
Validation loss = 0.13258624076843262
Validation loss = 0.15037180483341217
Validation loss = 0.13616445660591125
Validation loss = 0.13148528337478638
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14598453044891357
Validation loss = 0.115958571434021
Validation loss = 0.11387213319540024
Validation loss = 0.11252588778734207
Validation loss = 0.11544375866651535
Validation loss = 0.12084678560495377
Validation loss = 0.12371280789375305
Validation loss = 0.11435487866401672
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.64e+03 |
| Iteration     | 12        |
| MaximumReturn | -596      |
| MinimumReturn | -2.71e+03 |
| TotalSamples  | 56000     |
-----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.17745499312877655
Validation loss = 0.13602331280708313
Validation loss = 0.12221679836511612
Validation loss = 0.11882267147302628
Validation loss = 0.12288306653499603
Validation loss = 0.11926780641078949
Validation loss = 0.12387480586767197
Validation loss = 0.1270657330751419
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14801867306232452
Validation loss = 0.13105431199073792
Validation loss = 0.1262567639350891
Validation loss = 0.13798901438713074
Validation loss = 0.13224662840366364
Validation loss = 0.14898641407489777
Validation loss = 0.13082219660282135
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16069035232067108
Validation loss = 0.133481964468956
Validation loss = 0.13479992747306824
Validation loss = 0.1381392925977707
Validation loss = 0.1308451145887375
Validation loss = 0.1331963837146759
Validation loss = 0.14369550347328186
Validation loss = 0.12803630530834198
Validation loss = 0.12942925095558167
Validation loss = 0.1391071081161499
Validation loss = 0.1480981558561325
Validation loss = 0.13464686274528503
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.15914908051490784
Validation loss = 0.14376381039619446
Validation loss = 0.1400572508573532
Validation loss = 0.13411687314510345
Validation loss = 0.14307452738285065
Validation loss = 0.14241692423820496
Validation loss = 0.13480030000209808
Validation loss = 0.16328532993793488
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14021776616573334
Validation loss = 0.1222996935248375
Validation loss = 0.11576028168201447
Validation loss = 0.11311600357294083
Validation loss = 0.11894845217466354
Validation loss = 0.139250248670578
Validation loss = 0.11274903267621994
Validation loss = 0.1103183776140213
Validation loss = 0.11332619190216064
Validation loss = 0.11939654499292374
Validation loss = 0.12447550147771835
Validation loss = 0.11224913597106934
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.8e+03  |
| Iteration     | 13        |
| MaximumReturn | -585      |
| MinimumReturn | -2.48e+03 |
| TotalSamples  | 60000     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.13629817962646484
Validation loss = 0.10918496549129486
Validation loss = 0.10850151628255844
Validation loss = 0.10545613616704941
Validation loss = 0.11456355452537537
Validation loss = 0.11663413792848587
Validation loss = 0.10530246794223785
Validation loss = 0.10804366320371628
Validation loss = 0.1163528636097908
Validation loss = 0.10496482253074646
Validation loss = 0.1067926213145256
Validation loss = 0.11255344748497009
Validation loss = 0.11021864414215088
Validation loss = 0.10932997614145279
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14908984303474426
Validation loss = 0.12395206838846207
Validation loss = 0.12403420358896255
Validation loss = 0.12766537070274353
Validation loss = 0.1259775012731552
Validation loss = 0.13256648182868958
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14496265351772308
Validation loss = 0.1245243027806282
Validation loss = 0.11692806333303452
Validation loss = 0.12463829666376114
Validation loss = 0.12942741811275482
Validation loss = 0.12659916281700134
Validation loss = 0.11860896646976471
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.17300152778625488
Validation loss = 0.13187816739082336
Validation loss = 0.13492456078529358
Validation loss = 0.1436154544353485
Validation loss = 0.13026513159275055
Validation loss = 0.1405654102563858
Validation loss = 0.1381920427083969
Validation loss = 0.1437401920557022
Validation loss = 0.13534659147262573
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1191570907831192
Validation loss = 0.11783130466938019
Validation loss = 0.11012481153011322
Validation loss = 0.11002559959888458
Validation loss = 0.11206755042076111
Validation loss = 0.12030357867479324
Validation loss = 0.1363603174686432
Validation loss = 0.11252995580434799
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.14e+03 |
| Iteration     | 14        |
| MaximumReturn | -177      |
| MinimumReturn | -2.3e+03  |
| TotalSamples  | 64000     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12547145783901215
Validation loss = 0.10500490665435791
Validation loss = 0.10371167957782745
Validation loss = 0.10779136419296265
Validation loss = 0.10468856245279312
Validation loss = 0.10391871631145477
Validation loss = 0.10493042320013046
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13883398473262787
Validation loss = 0.12379899621009827
Validation loss = 0.12701311707496643
Validation loss = 0.13513046503067017
Validation loss = 0.13191913068294525
Validation loss = 0.1354496330022812
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12641924619674683
Validation loss = 0.12483292818069458
Validation loss = 0.12274755537509918
Validation loss = 0.12160661816596985
Validation loss = 0.13015085458755493
Validation loss = 0.14005640149116516
Validation loss = 0.13123194873332977
Validation loss = 0.12671861052513123
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1327163726091385
Validation loss = 0.1363990306854248
Validation loss = 0.13726091384887695
Validation loss = 0.12981396913528442
Validation loss = 0.13078540563583374
Validation loss = 0.13868160545825958
Validation loss = 0.14110513031482697
Validation loss = 0.14454539120197296
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11638178676366806
Validation loss = 0.11149857938289642
Validation loss = 0.11039300262928009
Validation loss = 0.11962050199508667
Validation loss = 0.11254027485847473
Validation loss = 0.11303570121526718
Validation loss = 0.1137004941701889
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.08e+03 |
| Iteration     | 15        |
| MaximumReturn | -662      |
| MinimumReturn | -1.44e+03 |
| TotalSamples  | 68000     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12330209463834763
Validation loss = 0.09947516024112701
Validation loss = 0.09761019051074982
Validation loss = 0.10027164965867996
Validation loss = 0.10800424218177795
Validation loss = 0.09781663864850998
Validation loss = 0.10345738381147385
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13684365153312683
Validation loss = 0.12987667322158813
Validation loss = 0.10817751288414001
Validation loss = 0.12487732619047165
Validation loss = 0.13493862748146057
Validation loss = 0.141916885972023
Validation loss = 0.1486656218767166
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1356242150068283
Validation loss = 0.1279180496931076
Validation loss = 0.12341659516096115
Validation loss = 0.12333467602729797
Validation loss = 0.1470087319612503
Validation loss = 0.1267879456281662
Validation loss = 0.1275082379579544
Validation loss = 0.13586029410362244
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1317065805196762
Validation loss = 0.137414813041687
Validation loss = 0.13609547913074493
Validation loss = 0.13456100225448608
Validation loss = 0.1503864824771881
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1400643289089203
Validation loss = 0.10575718432664871
Validation loss = 0.10302138328552246
Validation loss = 0.10503308475017548
Validation loss = 0.10393223911523819
Validation loss = 0.11043831706047058
Validation loss = 0.11198539286851883
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.68e+03 |
| Iteration     | 16        |
| MaximumReturn | -122      |
| MinimumReturn | -2.51e+03 |
| TotalSamples  | 72000     |
-----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12226840853691101
Validation loss = 0.10183510184288025
Validation loss = 0.10072353482246399
Validation loss = 0.11305650323629379
Validation loss = 0.10987179726362228
Validation loss = 0.10142025351524353
Validation loss = 0.10760501027107239
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13138023018836975
Validation loss = 0.13876399397850037
Validation loss = 0.15182001888751984
Validation loss = 0.14186862111091614
Validation loss = 0.15385374426841736
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.16249921917915344
Validation loss = 0.1325567662715912
Validation loss = 0.15318647027015686
Validation loss = 0.1572270393371582
Validation loss = 0.14581634104251862
Validation loss = 0.14244860410690308
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12320578098297119
Validation loss = 0.14641208946704865
Validation loss = 0.1340140849351883
Validation loss = 0.13996393978595734
Validation loss = 0.15740667283535004
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.11685130000114441
Validation loss = 0.11978641152381897
Validation loss = 0.11465958505868912
Validation loss = 0.1211039200425148
Validation loss = 0.11320912837982178
Validation loss = 0.10597104579210281
Validation loss = 0.11766200512647629
Validation loss = 0.10925444215536118
Validation loss = 0.11344289779663086
Validation loss = 0.11308372020721436
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -281     |
| Iteration     | 17       |
| MaximumReturn | 773      |
| MinimumReturn | -1.7e+03 |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11461120843887329
Validation loss = 0.09394434094429016
Validation loss = 0.10522308945655823
Validation loss = 0.09816979616880417
Validation loss = 0.09678301215171814
Validation loss = 0.09704796224832535
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1524401754140854
Validation loss = 0.13690347969532013
Validation loss = 0.13916447758674622
Validation loss = 0.13620665669441223
Validation loss = 0.14324350655078888
Validation loss = 0.14310447871685028
Validation loss = 0.16167590022087097
Validation loss = 0.16042208671569824
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13401736319065094
Validation loss = 0.13861706852912903
Validation loss = 0.14151377975940704
Validation loss = 0.13012734055519104
Validation loss = 0.15199121832847595
Validation loss = 0.12912936508655548
Validation loss = 0.14386244118213654
Validation loss = 0.15327441692352295
Validation loss = 0.1232684776186943
Validation loss = 0.15715062618255615
Validation loss = 0.15612271428108215
Validation loss = 0.14864712953567505
Validation loss = 0.17911681532859802
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1274963766336441
Validation loss = 0.12720131874084473
Validation loss = 0.1370510756969452
Validation loss = 0.1272667944431305
Validation loss = 0.14814046025276184
Validation loss = 0.13209916651248932
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13057714700698853
Validation loss = 0.11953029781579971
Validation loss = 0.11686728894710541
Validation loss = 0.11105997860431671
Validation loss = 0.13479436933994293
Validation loss = 0.1366746425628662
Validation loss = 0.11932472139596939
Validation loss = 0.11963377147912979
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.52e+03 |
| Iteration     | 18        |
| MaximumReturn | 292       |
| MinimumReturn | -2.65e+03 |
| TotalSamples  | 80000     |
-----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10768637806177139
Validation loss = 0.1023765578866005
Validation loss = 0.09886088222265244
Validation loss = 0.09645445644855499
Validation loss = 0.1050972118973732
Validation loss = 0.10723564773797989
Validation loss = 0.09712124615907669
Validation loss = 0.0989246815443039
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.16579757630825043
Validation loss = 0.17000603675842285
Validation loss = 0.17894849181175232
Validation loss = 0.15121088922023773
Validation loss = 0.1739392727613449
Validation loss = 0.14781689643859863
Validation loss = 0.1629553586244583
Validation loss = 0.19835098087787628
Validation loss = 0.18532207608222961
Validation loss = 0.23501905798912048
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13978570699691772
Validation loss = 0.15821397304534912
Validation loss = 0.16179023683071136
Validation loss = 0.1648542284965515
Validation loss = 0.1898145228624344
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1689825803041458
Validation loss = 0.1131189838051796
Validation loss = 0.14096471667289734
Validation loss = 0.133927583694458
Validation loss = 0.12682029604911804
Validation loss = 0.14279915392398834
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1256868690252304
Validation loss = 0.1258302479982376
Validation loss = 0.12277252972126007
Validation loss = 0.1389307975769043
Validation loss = 0.12468834221363068
Validation loss = 0.1408616155385971
Validation loss = 0.12472561746835709
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -940      |
| Iteration     | 19        |
| MaximumReturn | 775       |
| MinimumReturn | -2.37e+03 |
| TotalSamples  | 84000     |
-----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09722830355167389
Validation loss = 0.10317578911781311
Validation loss = 0.09991801530122757
Validation loss = 0.10229175537824631
Validation loss = 0.11243337392807007
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13936693966388702
Validation loss = 0.11263544112443924
Validation loss = 0.13691820204257965
Validation loss = 0.1444905549287796
Validation loss = 0.11922542005777359
Validation loss = 0.14068825542926788
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11423615366220474
Validation loss = 0.1121450811624527
Validation loss = 0.11355336755514145
Validation loss = 0.13455894589424133
Validation loss = 0.14300906658172607
Validation loss = 0.13298775255680084
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11579897254705429
Validation loss = 0.117332324385643
Validation loss = 0.13560475409030914
Validation loss = 0.12594980001449585
Validation loss = 0.11999167501926422
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13288620114326477
Validation loss = 0.09951487928628922
Validation loss = 0.1093536987900734
Validation loss = 0.10617318749427795
Validation loss = 0.1086394414305687
Validation loss = 0.136556938290596
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.06e+03 |
| Iteration     | 20        |
| MaximumReturn | -107      |
| MinimumReturn | -1.83e+03 |
| TotalSamples  | 88000     |
-----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10595860332250595
Validation loss = 0.10847737640142441
Validation loss = 0.11100713908672333
Validation loss = 0.10999444127082825
Validation loss = 0.11453729122877121
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.13062797486782074
Validation loss = 0.15102419257164001
Validation loss = 0.14036522805690765
Validation loss = 0.13663695752620697
Validation loss = 0.1203947365283966
Validation loss = 0.14651602506637573
Validation loss = 0.13419869542121887
Validation loss = 0.13242225348949432
Validation loss = 0.12819184362888336
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15328456461429596
Validation loss = 0.1183331236243248
Validation loss = 0.11505108326673508
Validation loss = 0.1110462173819542
Validation loss = 0.14800196886062622
Validation loss = 0.12083368748426437
Validation loss = 0.12756557762622833
Validation loss = 0.10363888740539551
Validation loss = 0.12863324582576752
Validation loss = 0.14188013970851898
Validation loss = 0.1196359172463417
Validation loss = 0.132581889629364
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1195894181728363
Validation loss = 0.11817185580730438
Validation loss = 0.1320289522409439
Validation loss = 0.13053475320339203
Validation loss = 0.15410929918289185
Validation loss = 0.11899808049201965
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.15210658311843872
Validation loss = 0.10120022296905518
Validation loss = 0.09750407189130783
Validation loss = 0.1169717088341713
Validation loss = 0.10447664558887482
Validation loss = 0.09953060746192932
Validation loss = 0.11638272553682327
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -941     |
| Iteration     | 21       |
| MaximumReturn | -254     |
| MinimumReturn | -2.6e+03 |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11369259655475616
Validation loss = 0.10848522931337357
Validation loss = 0.10339096188545227
Validation loss = 0.09718312323093414
Validation loss = 0.1032179445028305
Validation loss = 0.08774654567241669
Validation loss = 0.10362405329942703
Validation loss = 0.10370607674121857
Validation loss = 0.10578431189060211
Validation loss = 0.12314576655626297
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1121244728565216
Validation loss = 0.10845234990119934
Validation loss = 0.1308211386203766
Validation loss = 0.11236894875764847
Validation loss = 0.09579410403966904
Validation loss = 0.11727918684482574
Validation loss = 0.11502081155776978
Validation loss = 0.13260765373706818
Validation loss = 0.10580430924892426
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11429052799940109
Validation loss = 0.1403665393590927
Validation loss = 0.12286631017923355
Validation loss = 0.10865513980388641
Validation loss = 0.1058066114783287
Validation loss = 0.11762038618326187
Validation loss = 0.1269790679216385
Validation loss = 0.10714983940124512
Validation loss = 0.13481128215789795
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11409995704889297
Validation loss = 0.1332031637430191
Validation loss = 0.15058958530426025
Validation loss = 0.10980142652988434
Validation loss = 0.11699134111404419
Validation loss = 0.12116999924182892
Validation loss = 0.12899114191532135
Validation loss = 0.16264446079730988
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10839187353849411
Validation loss = 0.08495396375656128
Validation loss = 0.09344194084405899
Validation loss = 0.09472576528787613
Validation loss = 0.092433400452137
Validation loss = 0.09041858464479446
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -917      |
| Iteration     | 22        |
| MaximumReturn | 103       |
| MinimumReturn | -2.23e+03 |
| TotalSamples  | 96000     |
-----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.102938711643219
Validation loss = 0.09904098510742188
Validation loss = 0.09693767875432968
Validation loss = 0.09507232904434204
Validation loss = 0.09780613332986832
Validation loss = 0.09214585274457932
Validation loss = 0.09592089802026749
Validation loss = 0.09782472997903824
Validation loss = 0.09121683984994888
Validation loss = 0.09103093296289444
Validation loss = 0.09276323765516281
Validation loss = 0.10433479398488998
Validation loss = 0.10258835554122925
Validation loss = 0.11501243710517883
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1132664903998375
Validation loss = 0.09650707244873047
Validation loss = 0.09780975431203842
Validation loss = 0.12034740298986435
Validation loss = 0.09632913023233414
Validation loss = 0.10470361262559891
Validation loss = 0.12567469477653503
Validation loss = 0.10650525242090225
Validation loss = 0.11292839050292969
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15149837732315063
Validation loss = 0.11264338344335556
Validation loss = 0.12715308368206024
Validation loss = 0.15597723424434662
Validation loss = 0.13436077535152435
Validation loss = 0.15692982077598572
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12763215601444244
Validation loss = 0.11662846803665161
Validation loss = 0.10299384593963623
Validation loss = 0.13375024497509003
Validation loss = 0.13781748712062836
Validation loss = 0.14565354585647583
Validation loss = 0.11405827850103378
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09546041488647461
Validation loss = 0.08515641838312149
Validation loss = 0.08397813886404037
Validation loss = 0.09012705087661743
Validation loss = 0.09215302020311356
Validation loss = 0.08680335432291031
Validation loss = 0.10719731450080872
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -951      |
| Iteration     | 23        |
| MaximumReturn | 624       |
| MinimumReturn | -2.19e+03 |
| TotalSamples  | 100000    |
-----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11604806780815125
Validation loss = 0.09330925345420837
Validation loss = 0.08743961155414581
Validation loss = 0.09607162326574326
Validation loss = 0.0949302390217781
Validation loss = 0.08866847306489944
Validation loss = 0.08611675351858139
Validation loss = 0.09736792743206024
Validation loss = 0.09360058605670929
Validation loss = 0.09233082085847855
Validation loss = 0.10054122656583786
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12124820053577423
Validation loss = 0.11374923586845398
Validation loss = 0.11450161039829254
Validation loss = 0.1035018041729927
Validation loss = 0.11793451011180878
Validation loss = 0.12407402694225311
Validation loss = 0.1074482649564743
Validation loss = 0.1286529302597046
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12797272205352783
Validation loss = 0.11984224617481232
Validation loss = 0.10989169031381607
Validation loss = 0.15087570250034332
Validation loss = 0.14027242362499237
Validation loss = 0.13598647713661194
Validation loss = 0.12044171988964081
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1401311159133911
Validation loss = 0.1324736773967743
Validation loss = 0.1690186858177185
Validation loss = 0.14174412190914154
Validation loss = 0.14658492803573608
Validation loss = 0.15749064087867737
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.09351186454296112
Validation loss = 0.08253400772809982
Validation loss = 0.08491113781929016
Validation loss = 0.08774202316999435
Validation loss = 0.09168495237827301
Validation loss = 0.08260723203420639
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -403     |
| Iteration     | 24       |
| MaximumReturn | -120     |
| MinimumReturn | -717     |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.10072207450866699
Validation loss = 0.08048968762159348
Validation loss = 0.08167640119791031
Validation loss = 0.08834799379110336
Validation loss = 0.0927296131849289
Validation loss = 0.092253677546978
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11179179698228836
Validation loss = 0.12861813604831696
Validation loss = 0.11783226579427719
Validation loss = 0.1012280285358429
Validation loss = 0.10434167087078094
Validation loss = 0.09520070254802704
Validation loss = 0.10709837824106216
Validation loss = 0.12295302003622055
Validation loss = 0.13715870678424835
Validation loss = 0.10629356652498245
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12023495137691498
Validation loss = 0.123172327876091
Validation loss = 0.12748342752456665
Validation loss = 0.14114798605442047
Validation loss = 0.12803837656974792
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12364399433135986
Validation loss = 0.11856018751859665
Validation loss = 0.11640693247318268
Validation loss = 0.10300812125205994
Validation loss = 0.10283977538347244
Validation loss = 0.10797207057476044
Validation loss = 0.10800054669380188
Validation loss = 0.1082075908780098
Validation loss = 0.11732649058103561
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08059792220592499
Validation loss = 0.07743973284959793
Validation loss = 0.07843738049268723
Validation loss = 0.08849714696407318
Validation loss = 0.07702677696943283
Validation loss = 0.07738650590181351
Validation loss = 0.08099356293678284
Validation loss = 0.07605646550655365
Validation loss = 0.07536023110151291
Validation loss = 0.08218050003051758
Validation loss = 0.08269886672496796
Validation loss = 0.07656916975975037
Validation loss = 0.08177225291728973
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.53e+03 |
| Iteration     | 25        |
| MaximumReturn | 463       |
| MinimumReturn | -2.38e+03 |
| TotalSamples  | 108000    |
-----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09942026436328888
Validation loss = 0.0845215916633606
Validation loss = 0.0845685601234436
Validation loss = 0.08496365696191788
Validation loss = 0.08282848447561264
Validation loss = 0.08347959071397781
Validation loss = 0.08765621483325958
Validation loss = 0.08968828618526459
Validation loss = 0.11404584348201752
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09599413722753525
Validation loss = 0.12503741681575775
Validation loss = 0.1126871407032013
Validation loss = 0.0929223820567131
Validation loss = 0.1455615758895874
Validation loss = 0.10068107396364212
Validation loss = 0.10317684710025787
Validation loss = 0.13077805936336517
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12277141958475113
Validation loss = 0.11290621012449265
Validation loss = 0.1314782053232193
Validation loss = 0.11604966223239899
Validation loss = 0.09233420342206955
Validation loss = 0.09255269169807434
Validation loss = 0.11436935514211655
Validation loss = 0.10706659406423569
Validation loss = 0.08857805281877518
Validation loss = 0.09044352918863297
Validation loss = 0.09866292029619217
Validation loss = 0.10042206197977066
Validation loss = 0.11808176338672638
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0994202271103859
Validation loss = 0.10851670056581497
Validation loss = 0.10912274569272995
Validation loss = 0.15583007037639618
Validation loss = 0.10827800631523132
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07795862853527069
Validation loss = 0.07784056663513184
Validation loss = 0.0791621059179306
Validation loss = 0.07851482927799225
Validation loss = 0.06831958144903183
Validation loss = 0.07155229151248932
Validation loss = 0.07160361856222153
Validation loss = 0.08394894748926163
Validation loss = 0.07553621381521225
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -853      |
| Iteration     | 26        |
| MaximumReturn | -45.3     |
| MinimumReturn | -2.27e+03 |
| TotalSamples  | 112000    |
-----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08079947531223297
Validation loss = 0.09194225817918777
Validation loss = 0.09601583331823349
Validation loss = 0.09176486730575562
Validation loss = 0.0851493701338768
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10312467068433762
Validation loss = 0.11491851508617401
Validation loss = 0.1234402060508728
Validation loss = 0.13721629977226257
Validation loss = 0.15094879269599915
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09815017879009247
Validation loss = 0.102053202688694
Validation loss = 0.12794482707977295
Validation loss = 0.11452709883451462
Validation loss = 0.09292870759963989
Validation loss = 0.09614229947328568
Validation loss = 0.1440574675798416
Validation loss = 0.0843873843550682
Validation loss = 0.09894119203090668
Validation loss = 0.0920388326048851
Validation loss = 0.09666857868432999
Validation loss = 0.08546697348356247
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08987701684236526
Validation loss = 0.15630999207496643
Validation loss = 0.10316133499145508
Validation loss = 0.16389887034893036
Validation loss = 0.12193613499403
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06948889791965485
Validation loss = 0.07153300195932388
Validation loss = 0.0957169383764267
Validation loss = 0.07449901849031448
Validation loss = 0.07324295490980148
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.55e+03 |
| Iteration     | 27        |
| MaximumReturn | -761      |
| MinimumReturn | -2.58e+03 |
| TotalSamples  | 116000    |
-----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.11066965758800507
Validation loss = 0.09065086394548416
Validation loss = 0.07617738097906113
Validation loss = 0.08741860091686249
Validation loss = 0.09743015468120575
Validation loss = 0.09913977980613708
Validation loss = 0.11648041754961014
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18175917863845825
Validation loss = 0.14408889412879944
Validation loss = 0.1103903204202652
Validation loss = 0.1484653353691101
Validation loss = 0.13549193739891052
Validation loss = 0.13028858602046967
Validation loss = 0.18873637914657593
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10036115348339081
Validation loss = 0.09887204319238663
Validation loss = 0.10363616049289703
Validation loss = 0.08578607439994812
Validation loss = 0.09037981927394867
Validation loss = 0.12573347985744476
Validation loss = 0.11038383841514587
Validation loss = 0.10001619905233383
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08562134951353073
Validation loss = 0.09390085190534592
Validation loss = 0.09577686339616776
Validation loss = 0.08583123981952667
Validation loss = 0.0903724730014801
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07692276686429977
Validation loss = 0.0667535811662674
Validation loss = 0.0652993693947792
Validation loss = 0.07558520138263702
Validation loss = 0.06447968631982803
Validation loss = 0.06770090758800507
Validation loss = 0.07061560451984406
Validation loss = 0.06191486492753029
Validation loss = 0.06761978566646576
Validation loss = 0.06994357705116272
Validation loss = 0.06487783789634705
Validation loss = 0.06360514461994171
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -592      |
| Iteration     | 28        |
| MaximumReturn | -83.8     |
| MinimumReturn | -1.07e+03 |
| TotalSamples  | 120000    |
-----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09448894113302231
Validation loss = 0.08783353120088577
Validation loss = 0.08308219909667969
Validation loss = 0.09274493902921677
Validation loss = 0.07410947978496552
Validation loss = 0.08358526974916458
Validation loss = 0.07994534820318222
Validation loss = 0.11040844023227692
Validation loss = 0.08195831626653671
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10251195728778839
Validation loss = 0.11779194325208664
Validation loss = 0.14790982007980347
Validation loss = 0.15275435149669647
Validation loss = 0.1769077181816101
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09736695885658264
Validation loss = 0.09155885875225067
Validation loss = 0.09187132865190506
Validation loss = 0.11737324297428131
Validation loss = 0.11306121945381165
Validation loss = 0.09467123448848724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.11722347885370255
Validation loss = 0.09872052073478699
Validation loss = 0.1069299727678299
Validation loss = 0.09052348136901855
Validation loss = 0.07599007338285446
Validation loss = 0.09162376075983047
Validation loss = 0.09565503150224686
Validation loss = 0.08972600102424622
Validation loss = 0.11967878043651581
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07188299298286438
Validation loss = 0.06331709772348404
Validation loss = 0.0614030621945858
Validation loss = 0.05941154062747955
Validation loss = 0.06258837133646011
Validation loss = 0.06640610098838806
Validation loss = 0.06058163940906525
Validation loss = 0.06095537170767784
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -277     |
| Iteration     | 29       |
| MaximumReturn | 458      |
| MinimumReturn | -834     |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08637495338916779
Validation loss = 0.08493853360414505
Validation loss = 0.09324721246957779
Validation loss = 0.0964173823595047
Validation loss = 0.09180353581905365
Validation loss = 0.12585127353668213
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10901938378810883
Validation loss = 0.11694297939538956
Validation loss = 0.14485622942447662
Validation loss = 0.13438180088996887
Validation loss = 0.15487729012966156
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13555069267749786
Validation loss = 0.11773916333913803
Validation loss = 0.09889670461416245
Validation loss = 0.0961538627743721
Validation loss = 0.08714674413204193
Validation loss = 0.12385925650596619
Validation loss = 0.1372966319322586
Validation loss = 0.0981769934296608
Validation loss = 0.09375648945569992
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10867814719676971
Validation loss = 0.0912264883518219
Validation loss = 0.08158037066459656
Validation loss = 0.08790428191423416
Validation loss = 0.09948573261499405
Validation loss = 0.09218663722276688
Validation loss = 0.10103658586740494
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06955240666866302
Validation loss = 0.057955481112003326
Validation loss = 0.06096525117754936
Validation loss = 0.06727125495672226
Validation loss = 0.05952147766947746
Validation loss = 0.06361866742372513
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 34.9     |
| Iteration     | 30       |
| MaximumReturn | 591      |
| MinimumReturn | -368     |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0830661803483963
Validation loss = 0.08522842079401016
Validation loss = 0.07815451920032501
Validation loss = 0.07871215045452118
Validation loss = 0.07384580373764038
Validation loss = 0.09605295956134796
Validation loss = 0.12157365679740906
Validation loss = 0.08889941871166229
Validation loss = 0.08321180194616318
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15298029780387878
Validation loss = 0.15199552476406097
Validation loss = 0.14237898588180542
Validation loss = 0.11998718976974487
Validation loss = 0.2055424600839615
Validation loss = 0.17755191028118134
Validation loss = 0.15259391069412231
Validation loss = 0.1502537578344345
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1219683513045311
Validation loss = 0.09163469076156616
Validation loss = 0.11358717828989029
Validation loss = 0.10824060440063477
Validation loss = 0.08950457721948624
Validation loss = 0.09619385004043579
Validation loss = 0.10829350352287292
Validation loss = 0.0932038277387619
Validation loss = 0.07549580931663513
Validation loss = 0.10714320093393326
Validation loss = 0.09581633657217026
Validation loss = 0.07177460938692093
Validation loss = 0.08276452124118805
Validation loss = 0.09239959716796875
Validation loss = 0.08361759781837463
Validation loss = 0.07214760780334473
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10171297192573547
Validation loss = 0.07852429151535034
Validation loss = 0.09492121636867523
Validation loss = 0.09746687114238739
Validation loss = 0.08962584286928177
Validation loss = 0.0942525565624237
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06245355308055878
Validation loss = 0.05645843222737312
Validation loss = 0.05694316327571869
Validation loss = 0.060750316828489304
Validation loss = 0.05924788862466812
Validation loss = 0.059992507100105286
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -411     |
| Iteration     | 31       |
| MaximumReturn | 350      |
| MinimumReturn | -976     |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08414651453495026
Validation loss = 0.07478882372379303
Validation loss = 0.07230842113494873
Validation loss = 0.09397770464420319
Validation loss = 0.08015214651823044
Validation loss = 0.14391224086284637
Validation loss = 0.10363058745861053
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.18040697276592255
Validation loss = 0.12749803066253662
Validation loss = 0.11180109530687332
Validation loss = 0.14488515257835388
Validation loss = 0.11062508821487427
Validation loss = 0.12992554903030396
Validation loss = 0.18043065071105957
Validation loss = 0.18824084103107452
Validation loss = 0.21204248070716858
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11372552067041397
Validation loss = 0.09201038628816605
Validation loss = 0.09127261489629745
Validation loss = 0.07382940500974655
Validation loss = 0.07489871978759766
Validation loss = 0.13900254666805267
Validation loss = 0.09512566030025482
Validation loss = 0.10611382126808167
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.08748600631952286
Validation loss = 0.08821290731430054
Validation loss = 0.11020118743181229
Validation loss = 0.09359300136566162
Validation loss = 0.08657028526067734
Validation loss = 0.08227483928203583
Validation loss = 0.07535412162542343
Validation loss = 0.08612770587205887
Validation loss = 0.07527176290750504
Validation loss = 0.108404740691185
Validation loss = 0.07142437249422073
Validation loss = 0.08048693090677261
Validation loss = 0.08743583410978317
Validation loss = 0.07381083071231842
Validation loss = 0.08770033717155457
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06390420347452164
Validation loss = 0.0543181486427784
Validation loss = 0.05703939124941826
Validation loss = 0.06474269181489944
Validation loss = 0.05660119280219078
Validation loss = 0.057518456131219864
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -32       |
| Iteration     | 32        |
| MaximumReturn | 958       |
| MinimumReturn | -1.82e+03 |
| TotalSamples  | 136000    |
-----------------------------
