Logging to experiments/hopper/hopper/Sun-23-Oct-2022-10-30-55-AM-CDT_hopper_trpo_iteration_20_seed2231
Print configuration .....
{'env_name': 'hopper', 'random_seeds': [1234, 2431, 2531, 2231], 'save_variables': False, 'model_save_dir': '/tmp/hopper_models/', 'restore_variables': False, 'start_onpol_iter': 0, 'onpol_iters': 33, 'num_path_random': 6, 'num_path_onpol': 6, 'env_horizon': 1000, 'max_train_data': 200000, 'max_val_data': 100000, 'discard_ratio': 0.0, 'dynamics': {'pre_training': {'mode': 'intrinsic_reward', 'itr': 0, 'policy_itr': 20}, 'model': 'nn', 'ensemble': True, 'ensemble_model_count': 5, 'enable_particle_ensemble': True, 'particles': 5, 'obs_var': 1.0, 'intrinsic_reward_coeff': 1.0, 'ita': 1.0, 'mode': 'random', 'val': True, 'n_layers': 4, 'hidden_size': 1000, 'activation': 'relu', 'batch_size': 1000, 'learning_rate': 0.001, 'reg_coeff': 0.0, 'epochs': 200, 'kfac_params': {'learning_rate': 0.1, 'damping': 0.001, 'momentum': 0.9, 'kl_clip': 0.0001, 'cov_ema_decay': 0.99}}, 'policy': {'network_shape': [64, 64], 'init_logstd': 0.0, 'activation': 'tanh', 'reinitialize_every_itr': False}, 'trpo': {'horizon': 1000, 'gamma': 0.99, 'step_size': 0.01, 'iterations': 20, 'batch_size': 50000, 'gae': 0.95, 'visualization': False, 'visualize_iterations': [0]}, 'algo': 'trpo'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.655036449432373
Validation loss = 0.24939580261707306
Validation loss = 0.2374618649482727
Validation loss = 0.23066651821136475
Validation loss = 0.2296181321144104
Validation loss = 0.23286010324954987
Validation loss = 0.23727953433990479
Validation loss = 0.23202121257781982
Validation loss = 0.24819664657115936
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.5073890089988708
Validation loss = 0.24700182676315308
Validation loss = 0.23012036085128784
Validation loss = 0.22960157692432404
Validation loss = 0.21270477771759033
Validation loss = 0.22503475844860077
Validation loss = 0.2425639033317566
Validation loss = 0.22358620166778564
Validation loss = 0.24310746788978577
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.47748368978500366
Validation loss = 0.24335169792175293
Validation loss = 0.23776623606681824
Validation loss = 0.22372815012931824
Validation loss = 0.23431171476840973
Validation loss = 0.21766361594200134
Validation loss = 0.2284032255411148
Validation loss = 0.24907028675079346
Validation loss = 0.24368318915367126
Validation loss = 0.2629683315753937
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.48030659556388855
Validation loss = 0.25289419293403625
Validation loss = 0.23546963930130005
Validation loss = 0.22281214594841003
Validation loss = 0.21570271253585815
Validation loss = 0.2182711660861969
Validation loss = 0.22290131449699402
Validation loss = 0.22605273127555847
Validation loss = 0.24322080612182617
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.6474594473838806
Validation loss = 0.25641027092933655
Validation loss = 0.2341185212135315
Validation loss = 0.22737860679626465
Validation loss = 0.22274480760097504
Validation loss = 0.21789026260375977
Validation loss = 0.2211083024740219
Validation loss = 0.23952478170394897
Validation loss = 0.22980183362960815
Validation loss = 0.245868518948555
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.99e+03 |
| Iteration     | 0         |
| MaximumReturn | -1.68e+03 |
| MinimumReturn | -2.3e+03  |
| TotalSamples  | 8000      |
-----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.27498263120651245
Validation loss = 0.23156996071338654
Validation loss = 0.2347421497106552
Validation loss = 0.2375100702047348
Validation loss = 0.23200896382331848
Validation loss = 0.24391794204711914
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2648361623287201
Validation loss = 0.23525160551071167
Validation loss = 0.21923133730888367
Validation loss = 0.22201260924339294
Validation loss = 0.22149018943309784
Validation loss = 0.23922792077064514
Validation loss = 0.2433527112007141
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2744912803173065
Validation loss = 0.2325362265110016
Validation loss = 0.23379045724868774
Validation loss = 0.23987102508544922
Validation loss = 0.2471502274274826
Validation loss = 0.2538129389286041
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.26351210474967957
Validation loss = 0.22491487860679626
Validation loss = 0.22290842235088348
Validation loss = 0.22212937474250793
Validation loss = 0.23531559109687805
Validation loss = 0.22479544579982758
Validation loss = 0.2350868135690689
Validation loss = 0.23158493638038635
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.2758884131908417
Validation loss = 0.22972723841667175
Validation loss = 0.22953695058822632
Validation loss = 0.24348074197769165
Validation loss = 0.2314787358045578
Validation loss = 0.23254846036434174
Validation loss = 0.2528966963291168
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.1e+03  |
| Iteration     | 1         |
| MaximumReturn | -560      |
| MinimumReturn | -1.53e+03 |
| TotalSamples  | 12000     |
-----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.28556689620018005
Validation loss = 0.244755819439888
Validation loss = 0.22352714836597443
Validation loss = 0.22697675228118896
Validation loss = 0.22258883714675903
Validation loss = 0.24213182926177979
Validation loss = 0.23208020627498627
Validation loss = 0.24512511491775513
Validation loss = 0.22831599414348602
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.29736030101776123
Validation loss = 0.2303374856710434
Validation loss = 0.21853189170360565
Validation loss = 0.2161768674850464
Validation loss = 0.21234755218029022
Validation loss = 0.21227823197841644
Validation loss = 0.21820907294750214
Validation loss = 0.22647227346897125
Validation loss = 0.21925391256809235
Validation loss = 0.21492046117782593
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2933259904384613
Validation loss = 0.23173560202121735
Validation loss = 0.21874773502349854
Validation loss = 0.22777831554412842
Validation loss = 0.22386051714420319
Validation loss = 0.22611747682094574
Validation loss = 0.2276492714881897
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.2982977330684662
Validation loss = 0.22533585131168365
Validation loss = 0.21710270643234253
Validation loss = 0.21800144016742706
Validation loss = 0.22063572704792023
Validation loss = 0.2185249775648117
Validation loss = 0.22357989847660065
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.29668501019477844
Validation loss = 0.24047382175922394
Validation loss = 0.22424067556858063
Validation loss = 0.21865582466125488
Validation loss = 0.2336006611585617
Validation loss = 0.22645503282546997
Validation loss = 0.21730929613113403
Validation loss = 0.21938736736774445
Validation loss = 0.22446918487548828
Validation loss = 0.22850863635540009
Validation loss = 0.22024887800216675
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.05e+03 |
| Iteration     | 2         |
| MaximumReturn | 194       |
| MinimumReturn | -1.94e+03 |
| TotalSamples  | 16000     |
-----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.23320770263671875
Validation loss = 0.20865918695926666
Validation loss = 0.19065406918525696
Validation loss = 0.17755155265331268
Validation loss = 0.1763356775045395
Validation loss = 0.188174769282341
Validation loss = 0.18049387633800507
Validation loss = 0.17649613320827484
Validation loss = 0.18213027715682983
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.2350527048110962
Validation loss = 0.1930948793888092
Validation loss = 0.1896333545446396
Validation loss = 0.19132016599178314
Validation loss = 0.17717847228050232
Validation loss = 0.17784783244132996
Validation loss = 0.1752813756465912
Validation loss = 0.17869433760643005
Validation loss = 0.1784045547246933
Validation loss = 0.17772012948989868
Validation loss = 0.1747690588235855
Validation loss = 0.1717831939458847
Validation loss = 0.17418375611305237
Validation loss = 0.17724986374378204
Validation loss = 0.1842777580022812
Validation loss = 0.17907747626304626
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.2360704243183136
Validation loss = 0.1956537961959839
Validation loss = 0.18307292461395264
Validation loss = 0.18046613037586212
Validation loss = 0.19019180536270142
Validation loss = 0.18046045303344727
Validation loss = 0.18349428474903107
Validation loss = 0.17794014513492584
Validation loss = 0.17716377973556519
Validation loss = 0.18362711369991302
Validation loss = 0.1767488718032837
Validation loss = 0.17507192492485046
Validation loss = 0.17680487036705017
Validation loss = 0.17151087522506714
Validation loss = 0.1729358434677124
Validation loss = 0.17405256628990173
Validation loss = 0.1773032546043396
Validation loss = 0.16901405155658722
Validation loss = 0.17046158015727997
Validation loss = 0.1738196611404419
Validation loss = 0.1722351312637329
Validation loss = 0.1722511351108551
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.22919031977653503
Validation loss = 0.19645792245864868
Validation loss = 0.18932320177555084
Validation loss = 0.18048493564128876
Validation loss = 0.18342962861061096
Validation loss = 0.178554967045784
Validation loss = 0.1770642250776291
Validation loss = 0.178849458694458
Validation loss = 0.1805446296930313
Validation loss = 0.17710313200950623
Validation loss = 0.1780499815940857
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.22110304236412048
Validation loss = 0.19382375478744507
Validation loss = 0.18088103830814362
Validation loss = 0.18594631552696228
Validation loss = 0.18372894823551178
Validation loss = 0.1798284500837326
Validation loss = 0.1755536049604416
Validation loss = 0.17597316205501556
Validation loss = 0.1804400235414505
Validation loss = 0.17439818382263184
Validation loss = 0.17736253142356873
Validation loss = 0.17970389127731323
Validation loss = 0.1748657524585724
Validation loss = 0.17326614260673523
Validation loss = 0.17421205341815948
Validation loss = 0.17194879055023193
Validation loss = 0.17289593815803528
Validation loss = 0.17164346575737
Validation loss = 0.17496654391288757
Validation loss = 0.1886599361896515
Validation loss = 0.18071085214614868
Validation loss = 0.1705152690410614
Validation loss = 0.1684681475162506
Validation loss = 0.17056263983249664
Validation loss = 0.1759452223777771
Validation loss = 0.17374956607818604
Validation loss = 0.16984184086322784
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -519      |
| Iteration     | 3         |
| MaximumReturn | 155       |
| MinimumReturn | -1.38e+03 |
| TotalSamples  | 20000     |
-----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.18949702382087708
Validation loss = 0.1725432425737381
Validation loss = 0.16791465878486633
Validation loss = 0.1611795425415039
Validation loss = 0.1588020622730255
Validation loss = 0.15698520839214325
Validation loss = 0.1564803123474121
Validation loss = 0.1563618928194046
Validation loss = 0.16049067676067352
Validation loss = 0.15384109318256378
Validation loss = 0.15876781940460205
Validation loss = 0.1643969714641571
Validation loss = 0.15430662035942078
Validation loss = 0.15236172080039978
Validation loss = 0.15305688977241516
Validation loss = 0.15300820767879486
Validation loss = 0.15082378685474396
Validation loss = 0.15261057019233704
Validation loss = 0.1498074233531952
Validation loss = 0.15240655839443207
Validation loss = 0.1572449803352356
Validation loss = 0.15410390496253967
Validation loss = 0.15160879492759705
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.20303264260292053
Validation loss = 0.16761580109596252
Validation loss = 0.15975812077522278
Validation loss = 0.1528908908367157
Validation loss = 0.16049981117248535
Validation loss = 0.15298305451869965
Validation loss = 0.14724093675613403
Validation loss = 0.148772194981575
Validation loss = 0.1552039384841919
Validation loss = 0.15083345770835876
Validation loss = 0.15470221638679504
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.20641691982746124
Validation loss = 0.16841307282447815
Validation loss = 0.16459481418132782
Validation loss = 0.15745104849338531
Validation loss = 0.1563708782196045
Validation loss = 0.15438416600227356
Validation loss = 0.15186187624931335
Validation loss = 0.1548605114221573
Validation loss = 0.1490563601255417
Validation loss = 0.15180906653404236
Validation loss = 0.15770603716373444
Validation loss = 0.1619199961423874
Validation loss = 0.1500428169965744
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1922357976436615
Validation loss = 0.17550869286060333
Validation loss = 0.1591857373714447
Validation loss = 0.15792445838451385
Validation loss = 0.15165546536445618
Validation loss = 0.15254732966423035
Validation loss = 0.15535348653793335
Validation loss = 0.15534254908561707
Validation loss = 0.15419861674308777
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.19040897488594055
Validation loss = 0.16589970886707306
Validation loss = 0.16314642131328583
Validation loss = 0.1562623530626297
Validation loss = 0.1514524221420288
Validation loss = 0.15358766913414001
Validation loss = 0.15376299619674683
Validation loss = 0.15109610557556152
Validation loss = 0.150241419672966
Validation loss = 0.1519765853881836
Validation loss = 0.1481860727071762
Validation loss = 0.1507083773612976
Validation loss = 0.15000241994857788
Validation loss = 0.15359720587730408
Validation loss = 0.1492517739534378
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -315     |
| Iteration     | 4        |
| MaximumReturn | 702      |
| MinimumReturn | -925     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.16644488275051117
Validation loss = 0.14649124443531036
Validation loss = 0.14041878283023834
Validation loss = 0.14141198992729187
Validation loss = 0.14044733345508575
Validation loss = 0.1369512379169464
Validation loss = 0.135430708527565
Validation loss = 0.13618777692317963
Validation loss = 0.13938899338245392
Validation loss = 0.14831416308879852
Validation loss = 0.14434032142162323
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.17741036415100098
Validation loss = 0.1498934030532837
Validation loss = 0.14398570358753204
Validation loss = 0.1334967315196991
Validation loss = 0.13596810400485992
Validation loss = 0.13335178792476654
Validation loss = 0.13420404493808746
Validation loss = 0.13172049820423126
Validation loss = 0.13326309621334076
Validation loss = 0.14142382144927979
Validation loss = 0.13661004602909088
Validation loss = 0.1342315822839737
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.15731240808963776
Validation loss = 0.1452377289533615
Validation loss = 0.14001722633838654
Validation loss = 0.1378074735403061
Validation loss = 0.13499154150485992
Validation loss = 0.1375368982553482
Validation loss = 0.13585501909255981
Validation loss = 0.14508579671382904
Validation loss = 0.135673388838768
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.18243885040283203
Validation loss = 0.14818075299263
Validation loss = 0.14356617629528046
Validation loss = 0.13986879587173462
Validation loss = 0.14670105278491974
Validation loss = 0.13998587429523468
Validation loss = 0.13971775770187378
Validation loss = 0.13886477053165436
Validation loss = 0.14157713949680328
Validation loss = 0.1430032104253769
Validation loss = 0.14223499596118927
Validation loss = 0.14943288266658783
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.17604871094226837
Validation loss = 0.15420497953891754
Validation loss = 0.1364113986492157
Validation loss = 0.13808006048202515
Validation loss = 0.13425882160663605
Validation loss = 0.13474982976913452
Validation loss = 0.1405157893896103
Validation loss = 0.13719867169857025
Validation loss = 0.1349666267633438
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -347     |
| Iteration     | 5        |
| MaximumReturn | 94.4     |
| MinimumReturn | -788     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.15043629705905914
Validation loss = 0.12744584679603577
Validation loss = 0.12725362181663513
Validation loss = 0.12419803440570831
Validation loss = 0.12254423648118973
Validation loss = 0.12005800753831863
Validation loss = 0.12320643663406372
Validation loss = 0.12624110281467438
Validation loss = 0.1351303607225418
Validation loss = 0.12307226657867432
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.14043277502059937
Validation loss = 0.12948790192604065
Validation loss = 0.12161336839199066
Validation loss = 0.12196028232574463
Validation loss = 0.12296164035797119
Validation loss = 0.1192822977900505
Validation loss = 0.12131644785404205
Validation loss = 0.12287572771310806
Validation loss = 0.13100869953632355
Validation loss = 0.12117886543273926
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.14622971415519714
Validation loss = 0.13010136783123016
Validation loss = 0.12659534811973572
Validation loss = 0.12356690317392349
Validation loss = 0.12231399863958359
Validation loss = 0.12346228212118149
Validation loss = 0.12934815883636475
Validation loss = 0.12336175888776779
Validation loss = 0.1268470287322998
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.14860548079013824
Validation loss = 0.13256435096263885
Validation loss = 0.12854911386966705
Validation loss = 0.12556102871894836
Validation loss = 0.1283893585205078
Validation loss = 0.12335214763879776
Validation loss = 0.12216495722532272
Validation loss = 0.12297412008047104
Validation loss = 0.12840095162391663
Validation loss = 0.12789086997509003
Validation loss = 0.12273818999528885
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.14529912173748016
Validation loss = 0.1338515430688858
Validation loss = 0.12855133414268494
Validation loss = 0.12524214386940002
Validation loss = 0.12425116449594498
Validation loss = 0.12019983679056168
Validation loss = 0.12139381468296051
Validation loss = 0.12275081872940063
Validation loss = 0.1258157640695572
Validation loss = 0.13693515956401825
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -419      |
| Iteration     | 6         |
| MaximumReturn | 303       |
| MinimumReturn | -1.16e+03 |
| TotalSamples  | 32000     |
-----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.13845930993556976
Validation loss = 0.12494828552007675
Validation loss = 0.11483865976333618
Validation loss = 0.11197743564844131
Validation loss = 0.11476868391036987
Validation loss = 0.12189963459968567
Validation loss = 0.11674781143665314
Validation loss = 0.11204186826944351
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1310151070356369
Validation loss = 0.12104921787977219
Validation loss = 0.11450783908367157
Validation loss = 0.11158156394958496
Validation loss = 0.11110830307006836
Validation loss = 0.11201587319374084
Validation loss = 0.1171569675207138
Validation loss = 0.11701495945453644
Validation loss = 0.12072031199932098
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.1348504275083542
Validation loss = 0.12198276817798615
Validation loss = 0.11432388424873352
Validation loss = 0.11718485504388809
Validation loss = 0.11470799893140793
Validation loss = 0.11476919054985046
Validation loss = 0.11417067050933838
Validation loss = 0.12102069705724716
Validation loss = 0.1254379153251648
Validation loss = 0.10940875113010406
Validation loss = 0.10975392907857895
Validation loss = 0.11016049981117249
Validation loss = 0.10996633768081665
Validation loss = 0.11386378109455109
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12879982590675354
Validation loss = 0.1219530776143074
Validation loss = 0.11676016449928284
Validation loss = 0.1158236712217331
Validation loss = 0.11465375125408173
Validation loss = 0.11370642483234406
Validation loss = 0.11589931696653366
Validation loss = 0.1161995530128479
Validation loss = 0.12501242756843567
Validation loss = 0.11632994562387466
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.13592998683452606
Validation loss = 0.11806733161211014
Validation loss = 0.11894109100103378
Validation loss = 0.11411373317241669
Validation loss = 0.11602261662483215
Validation loss = 0.11126445233821869
Validation loss = 0.11315631866455078
Validation loss = 0.119326151907444
Validation loss = 0.11746855080127716
Validation loss = 0.1163046658039093
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -1.86e+03 |
| Iteration     | 7         |
| MaximumReturn | -1.06e+03 |
| MinimumReturn | -2.66e+03 |
| TotalSamples  | 36000     |
-----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.12716394662857056
Validation loss = 0.11322370171546936
Validation loss = 0.10651135444641113
Validation loss = 0.10380420088768005
Validation loss = 0.10884647816419601
Validation loss = 0.10747286677360535
Validation loss = 0.10929957777261734
Validation loss = 0.09995537996292114
Validation loss = 0.09934057295322418
Validation loss = 0.09957940131425858
Validation loss = 0.11034056544303894
Validation loss = 0.11254363507032394
Validation loss = 0.10072559118270874
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12778347730636597
Validation loss = 0.11011989414691925
Validation loss = 0.10400303453207016
Validation loss = 0.10303918272256851
Validation loss = 0.10061517357826233
Validation loss = 0.114109568297863
Validation loss = 0.10578678548336029
Validation loss = 0.09919029474258423
Validation loss = 0.10029762238264084
Validation loss = 0.10244831442832947
Validation loss = 0.11265677958726883
Validation loss = 0.10016441345214844
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13607056438922882
Validation loss = 0.11613760888576508
Validation loss = 0.1031217873096466
Validation loss = 0.10042863339185715
Validation loss = 0.10018306970596313
Validation loss = 0.10013271868228912
Validation loss = 0.10334808379411697
Validation loss = 0.10719206929206848
Validation loss = 0.1064167246222496
Validation loss = 0.10230794548988342
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1201932281255722
Validation loss = 0.1110198125243187
Validation loss = 0.10654289275407791
Validation loss = 0.10308337211608887
Validation loss = 0.1035246029496193
Validation loss = 0.12600868940353394
Validation loss = 0.10567031055688858
Validation loss = 0.10106313973665237
Validation loss = 0.09997722506523132
Validation loss = 0.10118228942155838
Validation loss = 0.10134950280189514
Validation loss = 0.11225199699401855
Validation loss = 0.10307138413190842
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12798455357551575
Validation loss = 0.1117723360657692
Validation loss = 0.10338570922613144
Validation loss = 0.10342822968959808
Validation loss = 0.11118459701538086
Validation loss = 0.11968003213405609
Validation loss = 0.10333847254514694
Validation loss = 0.10168453305959702
Validation loss = 0.10061483085155487
Validation loss = 0.1013428270816803
Validation loss = 0.10412014275789261
Validation loss = 0.10096833854913712
Validation loss = 0.110030896961689
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -700      |
| Iteration     | 8         |
| MaximumReturn | -21.4     |
| MinimumReturn | -1.22e+03 |
| TotalSamples  | 40000     |
-----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1142675131559372
Validation loss = 0.09943295270204544
Validation loss = 0.09554430842399597
Validation loss = 0.09517388045787811
Validation loss = 0.09392721951007843
Validation loss = 0.0992390513420105
Validation loss = 0.10194214433431625
Validation loss = 0.09324862062931061
Validation loss = 0.09716680645942688
Validation loss = 0.09202305972576141
Validation loss = 0.09689120203256607
Validation loss = 0.09596012532711029
Validation loss = 0.10506373643875122
Validation loss = 0.09148011356592178
Validation loss = 0.08889684826135635
Validation loss = 0.08923384547233582
Validation loss = 0.0894661471247673
Validation loss = 0.1016639694571495
Validation loss = 0.09691877663135529
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.11079086363315582
Validation loss = 0.1027863398194313
Validation loss = 0.09415686130523682
Validation loss = 0.09231969714164734
Validation loss = 0.09362570196390152
Validation loss = 0.09214214980602264
Validation loss = 0.09576671570539474
Validation loss = 0.10320492088794708
Validation loss = 0.09202749282121658
Validation loss = 0.0899582952260971
Validation loss = 0.09057491272687912
Validation loss = 0.09152580052614212
Validation loss = 0.10511044412851334
Validation loss = 0.1007225513458252
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.11144556105136871
Validation loss = 0.0964893251657486
Validation loss = 0.09529658406972885
Validation loss = 0.09597945213317871
Validation loss = 0.09673266112804413
Validation loss = 0.0954122543334961
Validation loss = 0.09352917969226837
Validation loss = 0.09174863994121552
Validation loss = 0.09555571526288986
Validation loss = 0.09133418649435043
Validation loss = 0.08968742191791534
Validation loss = 0.10777902603149414
Validation loss = 0.09203154593706131
Validation loss = 0.08763200789690018
Validation loss = 0.08898598700761795
Validation loss = 0.0906420424580574
Validation loss = 0.09477551281452179
Validation loss = 0.0911809578537941
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10642091184854507
Validation loss = 0.10050299018621445
Validation loss = 0.09600989520549774
Validation loss = 0.09235028922557831
Validation loss = 0.09646151214838028
Validation loss = 0.09823054820299149
Validation loss = 0.0944138914346695
Validation loss = 0.09064215421676636
Validation loss = 0.08992749452590942
Validation loss = 0.0942419245839119
Validation loss = 0.09884396195411682
Validation loss = 0.09104388952255249
Validation loss = 0.09128443151712418
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.117581807076931
Validation loss = 0.10724566131830215
Validation loss = 0.09439438581466675
Validation loss = 0.09318892657756805
Validation loss = 0.09473969042301178
Validation loss = 0.09178154170513153
Validation loss = 0.09848976880311966
Validation loss = 0.09793675690889359
Validation loss = 0.09191452711820602
Validation loss = 0.09086566418409348
Validation loss = 0.09297803789377213
Validation loss = 0.0968412458896637
Validation loss = 0.09494862705469131
Validation loss = 0.09199581295251846
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -89.4    |
| Iteration     | 9        |
| MaximumReturn | 858      |
| MinimumReturn | -758     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09811469912528992
Validation loss = 0.09158780425786972
Validation loss = 0.08632063120603561
Validation loss = 0.08451834321022034
Validation loss = 0.08665290474891663
Validation loss = 0.08628063648939133
Validation loss = 0.09005729109048843
Validation loss = 0.09126785397529602
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09850713610649109
Validation loss = 0.09040138125419617
Validation loss = 0.08593405038118362
Validation loss = 0.09014208614826202
Validation loss = 0.09112375229597092
Validation loss = 0.08764979243278503
Validation loss = 0.0878034308552742
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.10487211495637894
Validation loss = 0.09796153753995895
Validation loss = 0.08921054750680923
Validation loss = 0.0850721001625061
Validation loss = 0.08460429310798645
Validation loss = 0.08617157489061356
Validation loss = 0.09206462651491165
Validation loss = 0.09046220034360886
Validation loss = 0.08819807320833206
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09861666709184647
Validation loss = 0.0964781790971756
Validation loss = 0.08814568817615509
Validation loss = 0.08484280109405518
Validation loss = 0.08761531859636307
Validation loss = 0.09119708091020584
Validation loss = 0.09535732865333557
Validation loss = 0.09309899061918259
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10100636631250381
Validation loss = 0.09217426925897598
Validation loss = 0.08705617487430573
Validation loss = 0.08515126258134842
Validation loss = 0.08403061330318451
Validation loss = 0.0876477062702179
Validation loss = 0.09840758144855499
Validation loss = 0.08692213892936707
Validation loss = 0.08274403214454651
Validation loss = 0.08317342400550842
Validation loss = 0.08303523063659668
Validation loss = 0.09941191971302032
Validation loss = 0.08436203747987747
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -338     |
| Iteration     | 10       |
| MaximumReturn | 47.5     |
| MinimumReturn | -687     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09458646923303604
Validation loss = 0.08617227524518967
Validation loss = 0.08154543489217758
Validation loss = 0.07744920998811722
Validation loss = 0.0751831904053688
Validation loss = 0.07936544716358185
Validation loss = 0.07738389819860458
Validation loss = 0.09076476097106934
Validation loss = 0.07621251791715622
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.09686776995658875
Validation loss = 0.08520758152008057
Validation loss = 0.07874133437871933
Validation loss = 0.07967666536569595
Validation loss = 0.0809815526008606
Validation loss = 0.07725030183792114
Validation loss = 0.07676433771848679
Validation loss = 0.08547374606132507
Validation loss = 0.0808512419462204
Validation loss = 0.07550174742937088
Validation loss = 0.0742756798863411
Validation loss = 0.0746513307094574
Validation loss = 0.0880279541015625
Validation loss = 0.07971305400133133
Validation loss = 0.07425352931022644
Validation loss = 0.07296138256788254
Validation loss = 0.07444816827774048
Validation loss = 0.07867991924285889
Validation loss = 0.07842757552862167
Validation loss = 0.07249505072832108
Validation loss = 0.07251188158988953
Validation loss = 0.07268176227807999
Validation loss = 0.08818545937538147
Validation loss = 0.08498118072748184
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.0972091481089592
Validation loss = 0.08048886805772781
Validation loss = 0.07790391892194748
Validation loss = 0.07629857212305069
Validation loss = 0.07749638706445694
Validation loss = 0.0837855115532875
Validation loss = 0.08037423342466354
Validation loss = 0.07591646164655685
Validation loss = 0.07538317143917084
Validation loss = 0.07718256860971451
Validation loss = 0.09197694063186646
Validation loss = 0.07768189162015915
Validation loss = 0.07475630193948746
Validation loss = 0.07316914945840836
Validation loss = 0.07752159982919693
Validation loss = 0.0854533314704895
Validation loss = 0.07712868601083755
Validation loss = 0.07546620070934296
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10012060403823853
Validation loss = 0.08045937120914459
Validation loss = 0.0782269611954689
Validation loss = 0.0786895602941513
Validation loss = 0.07872810959815979
Validation loss = 0.07820909470319748
Validation loss = 0.07703524082899094
Validation loss = 0.08751791715621948
Validation loss = 0.07776587456464767
Validation loss = 0.07567863911390305
Validation loss = 0.07963519543409348
Validation loss = 0.0777919590473175
Validation loss = 0.08195450901985168
Validation loss = 0.07825005799531937
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08786650747060776
Validation loss = 0.08072817325592041
Validation loss = 0.08010037988424301
Validation loss = 0.07891436666250229
Validation loss = 0.07626532763242722
Validation loss = 0.07417871803045273
Validation loss = 0.0793985053896904
Validation loss = 0.08624937385320663
Validation loss = 0.08010639250278473
Validation loss = 0.07654579728841782
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -917      |
| Iteration     | 11        |
| MaximumReturn | 522       |
| MinimumReturn | -2.84e+03 |
| TotalSamples  | 52000     |
-----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0837254598736763
Validation loss = 0.07379422336816788
Validation loss = 0.07214266806840897
Validation loss = 0.07172541320323944
Validation loss = 0.07681859284639359
Validation loss = 0.08829139918088913
Validation loss = 0.0784100741147995
Validation loss = 0.0716380625963211
Validation loss = 0.07090042531490326
Validation loss = 0.0723748654127121
Validation loss = 0.07078047096729279
Validation loss = 0.07449046522378922
Validation loss = 0.0804722011089325
Validation loss = 0.0700874999165535
Validation loss = 0.06858132034540176
Validation loss = 0.07077919691801071
Validation loss = 0.07415687292814255
Validation loss = 0.08057192713022232
Validation loss = 0.07472222298383713
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0848441943526268
Validation loss = 0.07059416174888611
Validation loss = 0.06923628598451614
Validation loss = 0.06798117607831955
Validation loss = 0.06855829060077667
Validation loss = 0.06770022213459015
Validation loss = 0.07366224378347397
Validation loss = 0.066573865711689
Validation loss = 0.06615837663412094
Validation loss = 0.07235211879014969
Validation loss = 0.07025124877691269
Validation loss = 0.068116694688797
Validation loss = 0.08532925695180893
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.08372785151004791
Validation loss = 0.07536397129297256
Validation loss = 0.07215727120637894
Validation loss = 0.06939324736595154
Validation loss = 0.07133720815181732
Validation loss = 0.07365921139717102
Validation loss = 0.07434367388486862
Validation loss = 0.06847012042999268
Validation loss = 0.06933711469173431
Validation loss = 0.06991253793239594
Validation loss = 0.08366461843252182
Validation loss = 0.07405921071767807
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09108993411064148
Validation loss = 0.07933211326599121
Validation loss = 0.0703120157122612
Validation loss = 0.0709860697388649
Validation loss = 0.07203887403011322
Validation loss = 0.07466503232717514
Validation loss = 0.07828232645988464
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08766211569309235
Validation loss = 0.07912953197956085
Validation loss = 0.06986194849014282
Validation loss = 0.07036733627319336
Validation loss = 0.06942766159772873
Validation loss = 0.07787664234638214
Validation loss = 0.076944999396801
Validation loss = 0.07084661722183228
Validation loss = 0.0691823661327362
Validation loss = 0.06827307492494583
Validation loss = 0.07064685225486755
Validation loss = 0.07725454121828079
Validation loss = 0.07361123710870743
Validation loss = 0.07351035624742508
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -33.4    |
| Iteration     | 12       |
| MaximumReturn | 809      |
| MinimumReturn | -960     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07479391992092133
Validation loss = 0.07081783562898636
Validation loss = 0.06879574805498123
Validation loss = 0.06830143928527832
Validation loss = 0.07602167874574661
Validation loss = 0.06981489062309265
Validation loss = 0.06471478193998337
Validation loss = 0.06500449031591415
Validation loss = 0.06853621453046799
Validation loss = 0.07468030601739883
Validation loss = 0.0700405165553093
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07903116941452026
Validation loss = 0.07085440307855606
Validation loss = 0.0652034804224968
Validation loss = 0.06267370283603668
Validation loss = 0.06477487832307816
Validation loss = 0.06865183264017105
Validation loss = 0.06774090975522995
Validation loss = 0.06235813722014427
Validation loss = 0.06264602392911911
Validation loss = 0.06374485045671463
Validation loss = 0.06752780824899673
Validation loss = 0.06897033751010895
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07471482455730438
Validation loss = 0.06919150799512863
Validation loss = 0.06640355288982391
Validation loss = 0.0661010667681694
Validation loss = 0.06912792474031448
Validation loss = 0.07214228063821793
Validation loss = 0.07054821401834488
Validation loss = 0.0632999911904335
Validation loss = 0.06306738406419754
Validation loss = 0.06592297554016113
Validation loss = 0.07311482727527618
Validation loss = 0.068276047706604
Validation loss = 0.06399055570363998
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0770166888833046
Validation loss = 0.07181127369403839
Validation loss = 0.0666995570063591
Validation loss = 0.06932802498340607
Validation loss = 0.07066687941551208
Validation loss = 0.07994069159030914
Validation loss = 0.07028324156999588
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07560034096240997
Validation loss = 0.06729706376791
Validation loss = 0.06477722525596619
Validation loss = 0.06546895205974579
Validation loss = 0.06778781116008759
Validation loss = 0.07446738332509995
Validation loss = 0.06517089158296585
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -638      |
| Iteration     | 13        |
| MaximumReturn | 529       |
| MinimumReturn | -2.95e+03 |
| TotalSamples  | 60000     |
-----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06783922016620636
Validation loss = 0.06483883410692215
Validation loss = 0.06302978843450546
Validation loss = 0.06466905772686005
Validation loss = 0.06711515039205551
Validation loss = 0.0696335956454277
Validation loss = 0.06062209978699684
Validation loss = 0.06101972237229347
Validation loss = 0.06223205104470253
Validation loss = 0.07449419796466827
Validation loss = 0.06572457402944565
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07108890265226364
Validation loss = 0.06182004511356354
Validation loss = 0.06101280823349953
Validation loss = 0.05856898054480553
Validation loss = 0.06175810843706131
Validation loss = 0.07154308259487152
Validation loss = 0.0683358758687973
Validation loss = 0.05829278379678726
Validation loss = 0.05793434754014015
Validation loss = 0.06726657599210739
Validation loss = 0.05908536538481712
Validation loss = 0.06077553331851959
Validation loss = 0.06158604472875595
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06586580723524094
Validation loss = 0.0688297376036644
Validation loss = 0.06305649131536484
Validation loss = 0.06017196178436279
Validation loss = 0.06002906709909439
Validation loss = 0.06050557643175125
Validation loss = 0.070218525826931
Validation loss = 0.061736710369586945
Validation loss = 0.059381868690252304
Validation loss = 0.05988170579075813
Validation loss = 0.06234319135546684
Validation loss = 0.06064389646053314
Validation loss = 0.06519704312086105
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06848371773958206
Validation loss = 0.06440594792366028
Validation loss = 0.0655476301908493
Validation loss = 0.06446848064661026
Validation loss = 0.07683248817920685
Validation loss = 0.0692506805062294
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06910327076911926
Validation loss = 0.06198006495833397
Validation loss = 0.061693038791418076
Validation loss = 0.06272110342979431
Validation loss = 0.0674717128276825
Validation loss = 0.062256380915641785
Validation loss = 0.059522029012441635
Validation loss = 0.059149835258722305
Validation loss = 0.08041688054800034
Validation loss = 0.06231982633471489
Validation loss = 0.05994376167654991
Validation loss = 0.05913186073303223
Validation loss = 0.06336488574743271
Validation loss = 0.06616424024105072
Validation loss = 0.06032884865999222
Validation loss = 0.05817931517958641
Validation loss = 0.05954396352171898
Validation loss = 0.0672718808054924
Validation loss = 0.060566581785678864
Validation loss = 0.05848998203873634
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.23e+03 |
| Iteration     | 14        |
| MaximumReturn | -1.56e+03 |
| MinimumReturn | -2.72e+03 |
| TotalSamples  | 64000     |
-----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06754037737846375
Validation loss = 0.06314493715763092
Validation loss = 0.06126691773533821
Validation loss = 0.062113769352436066
Validation loss = 0.061595626175403595
Validation loss = 0.07454171031713486
Validation loss = 0.06252611428499222
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07049990445375443
Validation loss = 0.06168971210718155
Validation loss = 0.05771365016698837
Validation loss = 0.0569969117641449
Validation loss = 0.058062903583049774
Validation loss = 0.0626988410949707
Validation loss = 0.06133141741156578
Validation loss = 0.06090801581740379
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06626270711421967
Validation loss = 0.06363718211650848
Validation loss = 0.05799008905887604
Validation loss = 0.05842917412519455
Validation loss = 0.05811210721731186
Validation loss = 0.05805519223213196
Validation loss = 0.07478740066289902
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06776691973209381
Validation loss = 0.06338411569595337
Validation loss = 0.06276488304138184
Validation loss = 0.06154186278581619
Validation loss = 0.06682119518518448
Validation loss = 0.07171066105365753
Validation loss = 0.06510841101408005
Validation loss = 0.060550421476364136
Validation loss = 0.0625746101140976
Validation loss = 0.0650397539138794
Validation loss = 0.06593222916126251
Validation loss = 0.06895215064287186
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06723834574222565
Validation loss = 0.06072189286351204
Validation loss = 0.05843254178762436
Validation loss = 0.057998813688755035
Validation loss = 0.05843799561262131
Validation loss = 0.058980975300073624
Validation loss = 0.06827174872159958
Validation loss = 0.05690191686153412
Validation loss = 0.05728758126497269
Validation loss = 0.05646393448114395
Validation loss = 0.05804205685853958
Validation loss = 0.06277868151664734
Validation loss = 0.0627821832895279
Validation loss = 0.05809566751122475
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
-----------------------------
| AverageReturn | -2.75e+03 |
| Iteration     | 15        |
| MaximumReturn | -2.06e+03 |
| MinimumReturn | -3.01e+03 |
| TotalSamples  | 68000     |
-----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06381966173648834
Validation loss = 0.060427069664001465
Validation loss = 0.058284107595682144
Validation loss = 0.05947631224989891
Validation loss = 0.06490898132324219
Validation loss = 0.0625699833035469
Validation loss = 0.06566354632377625
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06794208288192749
Validation loss = 0.05928875505924225
Validation loss = 0.055972516536712646
Validation loss = 0.05828395485877991
Validation loss = 0.06215878203511238
Validation loss = 0.05973656103014946
Validation loss = 0.05570134520530701
Validation loss = 0.05466436222195625
Validation loss = 0.05549290403723717
Validation loss = 0.07111381739377975
Validation loss = 0.05786701664328575
Validation loss = 0.05530364438891411
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07096090167760849
Validation loss = 0.059338171035051346
Validation loss = 0.05662092566490173
Validation loss = 0.056516457349061966
Validation loss = 0.06384352594614029
Validation loss = 0.060124825686216354
Validation loss = 0.055502407252788544
Validation loss = 0.05619863048195839
Validation loss = 0.07266797125339508
Validation loss = 0.059673380106687546
Validation loss = 0.055376749485731125
Validation loss = 0.05642329901456833
Validation loss = 0.06497301906347275
Validation loss = 0.05659044533967972
Validation loss = 0.05989597737789154
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06980421394109726
Validation loss = 0.06038416922092438
Validation loss = 0.0589229017496109
Validation loss = 0.05918460711836815
Validation loss = 0.06446142494678497
Validation loss = 0.06830380111932755
Validation loss = 0.06127234175801277
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06322874873876572
Validation loss = 0.055968016386032104
Validation loss = 0.055901940912008286
Validation loss = 0.05520227923989296
Validation loss = 0.0652792677283287
Validation loss = 0.06059478968381882
Validation loss = 0.056514982134103775
Validation loss = 0.054275400936603546
Validation loss = 0.05454791709780693
Validation loss = 0.06375380605459213
Validation loss = 0.06356260180473328
Validation loss = 0.05577815696597099
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 335      |
| Iteration     | 16       |
| MaximumReturn | 1.91e+03 |
| MinimumReturn | -528     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0711081326007843
Validation loss = 0.06778924912214279
Validation loss = 0.061236925423145294
Validation loss = 0.05964253842830658
Validation loss = 0.05831717699766159
Validation loss = 0.06944414228200912
Validation loss = 0.06152082234621048
Validation loss = 0.05862501636147499
Validation loss = 0.05740170180797577
Validation loss = 0.06396190822124481
Validation loss = 0.06569493561983109
Validation loss = 0.05969586595892906
Validation loss = 0.0572151355445385
Validation loss = 0.05991997942328453
Validation loss = 0.06298172473907471
Validation loss = 0.060670897364616394
Validation loss = 0.05674094706773758
Validation loss = 0.05559832602739334
Validation loss = 0.06051662936806679
Validation loss = 0.06636012345552444
Validation loss = 0.05802519619464874
Validation loss = 0.05512472242116928
Validation loss = 0.05565909668803215
Validation loss = 0.05873081460595131
Validation loss = 0.07966449856758118
Validation loss = 0.05576391518115997
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.08019927889108658
Validation loss = 0.06482450664043427
Validation loss = 0.05634002387523651
Validation loss = 0.05628516897559166
Validation loss = 0.05536368489265442
Validation loss = 0.05570297688245773
Validation loss = 0.06514998525381088
Validation loss = 0.06012799218297005
Validation loss = 0.05490803346037865
Validation loss = 0.05497416481375694
Validation loss = 0.061531729996204376
Validation loss = 0.06143902987241745
Validation loss = 0.0561540387570858
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07510218769311905
Validation loss = 0.05775686725974083
Validation loss = 0.05841170996427536
Validation loss = 0.055386122316122055
Validation loss = 0.054871875792741776
Validation loss = 0.0654236376285553
Validation loss = 0.060667768120765686
Validation loss = 0.054570335894823074
Validation loss = 0.05430947244167328
Validation loss = 0.05490696057677269
Validation loss = 0.05901026725769043
Validation loss = 0.061263833194971085
Validation loss = 0.05437378212809563
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.07253909856081009
Validation loss = 0.07053077220916748
Validation loss = 0.059511225670576096
Validation loss = 0.05921430140733719
Validation loss = 0.06003943458199501
Validation loss = 0.06483063101768494
Validation loss = 0.06043850630521774
Validation loss = 0.05936970189213753
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.08379045128822327
Validation loss = 0.05735606700181961
Validation loss = 0.05583282187581062
Validation loss = 0.0628528892993927
Validation loss = 0.05584624037146568
Validation loss = 0.055241867899894714
Validation loss = 0.058856502175331116
Validation loss = 0.06311678886413574
Validation loss = 0.06672999262809753
Validation loss = 0.05568940192461014
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 602      |
| Iteration     | 17       |
| MaximumReturn | 1.5e+03  |
| MinimumReturn | -362     |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06362567096948624
Validation loss = 0.05528147518634796
Validation loss = 0.056062109768390656
Validation loss = 0.05516490340232849
Validation loss = 0.0597846619784832
Validation loss = 0.06575272232294083
Validation loss = 0.05505145713686943
Validation loss = 0.05439852923154831
Validation loss = 0.05559423193335533
Validation loss = 0.06440293788909912
Validation loss = 0.05663536116480827
Validation loss = 0.05399893969297409
Validation loss = 0.05412882938981056
Validation loss = 0.06222524866461754
Validation loss = 0.05606422200798988
Validation loss = 0.05389472842216492
Validation loss = 0.05497067794203758
Validation loss = 0.0640641525387764
Validation loss = 0.05695385858416557
Validation loss = 0.05298027768731117
Validation loss = 0.054504405707120895
Validation loss = 0.05472702905535698
Validation loss = 0.05365728586912155
Validation loss = 0.06584081798791885
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06550826132297516
Validation loss = 0.05561433359980583
Validation loss = 0.054845601320266724
Validation loss = 0.05807153135538101
Validation loss = 0.06333237141370773
Validation loss = 0.06398668885231018
Validation loss = 0.055708497762680054
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06183098629117012
Validation loss = 0.05766997113823891
Validation loss = 0.05448102951049805
Validation loss = 0.055126894265413284
Validation loss = 0.0651751235127449
Validation loss = 0.05394858494400978
Validation loss = 0.05424664542078972
Validation loss = 0.052293065935373306
Validation loss = 0.055556248873472214
Validation loss = 0.06853850930929184
Validation loss = 0.05305081978440285
Validation loss = 0.05272158980369568
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0744527205824852
Validation loss = 0.0697728767991066
Validation loss = 0.06032289192080498
Validation loss = 0.05905614420771599
Validation loss = 0.06150323525071144
Validation loss = 0.06069807708263397
Validation loss = 0.06822646409273148
Validation loss = 0.06378105282783508
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06997662782669067
Validation loss = 0.05866139009594917
Validation loss = 0.055464595556259155
Validation loss = 0.0576862171292305
Validation loss = 0.06331442296504974
Validation loss = 0.05621954798698425
Validation loss = 0.06159837171435356
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 927      |
| Iteration     | 18       |
| MaximumReturn | 1.72e+03 |
| MinimumReturn | -435     |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06262379139661789
Validation loss = 0.05410361289978027
Validation loss = 0.05244245380163193
Validation loss = 0.05483498424291611
Validation loss = 0.06290940195322037
Validation loss = 0.055740416049957275
Validation loss = 0.055177927017211914
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07030279189348221
Validation loss = 0.05686306208372116
Validation loss = 0.05561954900622368
Validation loss = 0.05440698191523552
Validation loss = 0.06267526000738144
Validation loss = 0.059057414531707764
Validation loss = 0.05457227677106857
Validation loss = 0.05978352949023247
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06395827233791351
Validation loss = 0.05840621143579483
Validation loss = 0.05415504053235054
Validation loss = 0.05399158596992493
Validation loss = 0.05709473043680191
Validation loss = 0.05618112161755562
Validation loss = 0.05252514034509659
Validation loss = 0.057895660400390625
Validation loss = 0.053947605192661285
Validation loss = 0.05172188952565193
Validation loss = 0.05113546922802925
Validation loss = 0.06465493142604828
Validation loss = 0.050940074026584625
Validation loss = 0.05020364373922348
Validation loss = 0.05294080823659897
Validation loss = 0.05851559713482857
Validation loss = 0.05093126744031906
Validation loss = 0.049774907529354095
Validation loss = 0.05191176012158394
Validation loss = 0.05757569521665573
Validation loss = 0.05379422754049301
Validation loss = 0.04884844273328781
Validation loss = 0.048772867769002914
Validation loss = 0.056002408266067505
Validation loss = 0.05321761220693588
Validation loss = 0.04880918189883232
Validation loss = 0.05843881890177727
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0760682076215744
Validation loss = 0.061729222536087036
Validation loss = 0.06019807606935501
Validation loss = 0.05694662407040596
Validation loss = 0.06377173960208893
Validation loss = 0.06204051896929741
Validation loss = 0.06871088594198227
Validation loss = 0.05676255375146866
Validation loss = 0.056445617228746414
Validation loss = 0.05972273275256157
Validation loss = 0.06189341098070145
Validation loss = 0.05653364583849907
Validation loss = 0.055848538875579834
Validation loss = 0.05636388063430786
Validation loss = 0.06458574533462524
Validation loss = 0.05566859245300293
Validation loss = 0.055373240262269974
Validation loss = 0.06898046284914017
Validation loss = 0.056512635201215744
Validation loss = 0.05467195436358452
Validation loss = 0.05505790561437607
Validation loss = 0.06640523672103882
Validation loss = 0.05681934952735901
Validation loss = 0.05390089005231857
Validation loss = 0.057382117956876755
Validation loss = 0.07206561416387558
Validation loss = 0.05152691528201103
Validation loss = 0.052547864615917206
Validation loss = 0.05681391805410385
Validation loss = 0.05730419605970383
Validation loss = 0.05380599573254585
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0701797679066658
Validation loss = 0.05656303092837334
Validation loss = 0.05495046451687813
Validation loss = 0.05619894340634346
Validation loss = 0.05428494140505791
Validation loss = 0.06365098804235458
Validation loss = 0.055033475160598755
Validation loss = 0.05389776825904846
Validation loss = 0.058387868106365204
Validation loss = 0.07119186222553253
Validation loss = 0.05451212450861931
Validation loss = 0.05402282625436783
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.12e+03 |
| Iteration     | 19       |
| MaximumReturn | 1.74e+03 |
| MinimumReturn | 569      |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06146491318941116
Validation loss = 0.05176922678947449
Validation loss = 0.05849531665444374
Validation loss = 0.05700460448861122
Validation loss = 0.05066731944680214
Validation loss = 0.0521358922123909
Validation loss = 0.061065323650836945
Validation loss = 0.05109558254480362
Validation loss = 0.052615392953157425
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06858690083026886
Validation loss = 0.05422288551926613
Validation loss = 0.050392281264066696
Validation loss = 0.04997014254331589
Validation loss = 0.05349953845143318
Validation loss = 0.0569092258810997
Validation loss = 0.053023673593997955
Validation loss = 0.05116233974695206
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06289198994636536
Validation loss = 0.05878687649965286
Validation loss = 0.04680114984512329
Validation loss = 0.050632160156965256
Validation loss = 0.053547244518995285
Validation loss = 0.04605889320373535
Validation loss = 0.04655320569872856
Validation loss = 0.049296822398900986
Validation loss = 0.06755802780389786
Validation loss = 0.04777001589536667
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06165206432342529
Validation loss = 0.05133583024144173
Validation loss = 0.050787489861249924
Validation loss = 0.0502307191491127
Validation loss = 0.061404552310705185
Validation loss = 0.05331496521830559
Validation loss = 0.05279991775751114
Validation loss = 0.05088552460074425
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07407613843679428
Validation loss = 0.05399679020047188
Validation loss = 0.05135124549269676
Validation loss = 0.05204096809029579
Validation loss = 0.05312960594892502
Validation loss = 0.05910038203001022
Validation loss = 0.05453180521726608
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.16e+03 |
| Iteration     | 20       |
| MaximumReturn | 1.78e+03 |
| MinimumReturn | 148      |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06790623813867569
Validation loss = 0.058583274483680725
Validation loss = 0.04778064414858818
Validation loss = 0.04737468063831329
Validation loss = 0.05043653026223183
Validation loss = 0.05670798197388649
Validation loss = 0.04736211523413658
Validation loss = 0.046053338795900345
Validation loss = 0.059043824672698975
Validation loss = 0.05094977468252182
Validation loss = 0.048455704003572464
Validation loss = 0.044871263206005096
Validation loss = 0.04738346114754677
Validation loss = 0.0618976466357708
Validation loss = 0.04842603579163551
Validation loss = 0.045563291758298874
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06494057923555374
Validation loss = 0.048769138753414154
Validation loss = 0.047748252749443054
Validation loss = 0.06269598752260208
Validation loss = 0.05712169036269188
Validation loss = 0.048006344586610794
Validation loss = 0.04990034177899361
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.050318945199251175
Validation loss = 0.04695643484592438
Validation loss = 0.045878827571868896
Validation loss = 0.044814322143793106
Validation loss = 0.054177697747945786
Validation loss = 0.04565088823437691
Validation loss = 0.043734677135944366
Validation loss = 0.04412378370761871
Validation loss = 0.04559928551316261
Validation loss = 0.04603360593318939
Validation loss = 0.0430617481470108
Validation loss = 0.04260393604636192
Validation loss = 0.05438270419836044
Validation loss = 0.04533824697136879
Validation loss = 0.043378427624702454
Validation loss = 0.04375888779759407
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05684136971831322
Validation loss = 0.05023856461048126
Validation loss = 0.05059639364480972
Validation loss = 0.052094243466854095
Validation loss = 0.04700106754899025
Validation loss = 0.05095524713397026
Validation loss = 0.05530725419521332
Validation loss = 0.047582343220710754
Validation loss = 0.04592805728316307
Validation loss = 0.04877853766083717
Validation loss = 0.053497496992349625
Validation loss = 0.04620271921157837
Validation loss = 0.04581921175122261
Validation loss = 0.048322971910238266
Validation loss = 0.0518784373998642
Validation loss = 0.047718968242406845
Validation loss = 0.051667287945747375
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05743171274662018
Validation loss = 0.04939425364136696
Validation loss = 0.050523579120635986
Validation loss = 0.049511317163705826
Validation loss = 0.053326770663261414
Validation loss = 0.05693535506725311
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.5e+03  |
| Iteration     | 21       |
| MaximumReturn | 2.2e+03  |
| MinimumReturn | -202     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.054289113730192184
Validation loss = 0.048143673688173294
Validation loss = 0.04429105669260025
Validation loss = 0.047955647110939026
Validation loss = 0.04619264975190163
Validation loss = 0.04344755783677101
Validation loss = 0.044134024530649185
Validation loss = 0.05238478258252144
Validation loss = 0.04420208930969238
Validation loss = 0.04463529959321022
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06331080198287964
Validation loss = 0.04992220550775528
Validation loss = 0.04781733453273773
Validation loss = 0.04562116041779518
Validation loss = 0.05530475080013275
Validation loss = 0.0501711368560791
Validation loss = 0.046858835965394974
Validation loss = 0.04518837109208107
Validation loss = 0.06132512167096138
Validation loss = 0.04660196974873543
Validation loss = 0.04516531154513359
Validation loss = 0.04866768419742584
Validation loss = 0.050967443734407425
Validation loss = 0.04862165078520775
Validation loss = 0.04467892274260521
Validation loss = 0.04597923159599304
Validation loss = 0.05224735289812088
Validation loss = 0.05077538639307022
Validation loss = 0.04423345625400543
Validation loss = 0.053164128214120865
Validation loss = 0.049574609845876694
Validation loss = 0.04512777179479599
Validation loss = 0.0441126823425293
Validation loss = 0.05308689922094345
Validation loss = 0.04448535293340683
Validation loss = 0.04263509437441826
Validation loss = 0.042686592787504196
Validation loss = 0.054028090089559555
Validation loss = 0.04403422027826309
Validation loss = 0.042165204882621765
Validation loss = 0.043976303189992905
Validation loss = 0.05072856321930885
Validation loss = 0.04219846427440643
Validation loss = 0.042217276990413666
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05283878743648529
Validation loss = 0.04353875666856766
Validation loss = 0.04192936792969704
Validation loss = 0.04209733381867409
Validation loss = 0.05025037005543709
Validation loss = 0.042917899787425995
Validation loss = 0.04050329700112343
Validation loss = 0.040227316319942474
Validation loss = 0.04186023026704788
Validation loss = 0.04593956097960472
Validation loss = 0.040624361485242844
Validation loss = 0.041229598224163055
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05984460189938545
Validation loss = 0.04685148969292641
Validation loss = 0.05143746733665466
Validation loss = 0.044042039662599564
Validation loss = 0.04463246464729309
Validation loss = 0.051602549850940704
Validation loss = 0.044428907334804535
Validation loss = 0.04361758381128311
Validation loss = 0.06601881980895996
Validation loss = 0.043224748224020004
Validation loss = 0.04173319786787033
Validation loss = 0.06080939620733261
Validation loss = 0.044448018074035645
Validation loss = 0.045275021344423294
Validation loss = 0.05246457830071449
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05724349617958069
Validation loss = 0.04813198000192642
Validation loss = 0.04662422090768814
Validation loss = 0.04888569191098213
Validation loss = 0.04770803451538086
Validation loss = 0.046321336179971695
Validation loss = 0.056724563241004944
Validation loss = 0.04882686957716942
Validation loss = 0.045931022614240646
Validation loss = 0.05171426013112068
Validation loss = 0.05261869356036186
Validation loss = 0.046871624886989594
Validation loss = 0.046222224831581116
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.21e+03 |
| Iteration     | 22       |
| MaximumReturn | 2.02e+03 |
| MinimumReturn | 235      |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0573294572532177
Validation loss = 0.045016516000032425
Validation loss = 0.0426742285490036
Validation loss = 0.04339165985584259
Validation loss = 0.050304073840379715
Validation loss = 0.04316354915499687
Validation loss = 0.04172812029719353
Validation loss = 0.04384009912610054
Validation loss = 0.05478382110595703
Validation loss = 0.04291422665119171
Validation loss = 0.04234490916132927
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.056809138506650925
Validation loss = 0.04559671878814697
Validation loss = 0.041814666241407394
Validation loss = 0.0403398834168911
Validation loss = 0.044893208891153336
Validation loss = 0.04176954925060272
Validation loss = 0.03997233510017395
Validation loss = 0.041235242038965225
Validation loss = 0.05393312871456146
Validation loss = 0.04184814170002937
Validation loss = 0.03928793966770172
Validation loss = 0.038640279322862625
Validation loss = 0.04082491621375084
Validation loss = 0.052961211651563644
Validation loss = 0.038374971598386765
Validation loss = 0.038998525589704514
Validation loss = 0.048694904893636703
Validation loss = 0.03862191364169121
Validation loss = 0.039634525775909424
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05374782904982567
Validation loss = 0.04272923246026039
Validation loss = 0.04082908108830452
Validation loss = 0.0471319742500782
Validation loss = 0.0411730520427227
Validation loss = 0.039016205817461014
Validation loss = 0.04029027745127678
Validation loss = 0.04651949927210808
Validation loss = 0.03874863684177399
Validation loss = 0.03931954875588417
Validation loss = 0.04023848846554756
Validation loss = 0.0496809296309948
Validation loss = 0.03940172865986824
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.055685531347990036
Validation loss = 0.044769834727048874
Validation loss = 0.042047169059515
Validation loss = 0.04934091866016388
Validation loss = 0.04186415672302246
Validation loss = 0.05063137412071228
Validation loss = 0.044060949236154556
Validation loss = 0.04167425259947777
Validation loss = 0.04337593540549278
Validation loss = 0.05144235119223595
Validation loss = 0.04224536940455437
Validation loss = 0.053231846541166306
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.054035842418670654
Validation loss = 0.05954842269420624
Validation loss = 0.04559260606765747
Validation loss = 0.045458611100912094
Validation loss = 0.04482673481106758
Validation loss = 0.050312042236328125
Validation loss = 0.045507367700338364
Validation loss = 0.0436076819896698
Validation loss = 0.0456637479364872
Validation loss = 0.048582524061203
Validation loss = 0.04409240186214447
Validation loss = 0.059679996222257614
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 576      |
| Iteration     | 23       |
| MaximumReturn | 1.66e+03 |
| MinimumReturn | -69      |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05422115698456764
Validation loss = 0.045609939843416214
Validation loss = 0.04049161076545715
Validation loss = 0.043164558708667755
Validation loss = 0.05695968121290207
Validation loss = 0.04119998961687088
Validation loss = 0.040989793837070465
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04828089103102684
Validation loss = 0.03945572301745415
Validation loss = 0.04119011014699936
Validation loss = 0.046869296580553055
Validation loss = 0.0380227155983448
Validation loss = 0.03696589916944504
Validation loss = 0.04406087473034859
Validation loss = 0.047281619161367416
Validation loss = 0.0391068160533905
Validation loss = 0.03870771825313568
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05031704530119896
Validation loss = 0.04486655071377754
Validation loss = 0.04460110142827034
Validation loss = 0.03933717682957649
Validation loss = 0.05068143457174301
Validation loss = 0.040361545979976654
Validation loss = 0.039241723716259
Validation loss = 0.03821660205721855
Validation loss = 0.050047241151332855
Validation loss = 0.03821924328804016
Validation loss = 0.038601137697696686
Validation loss = 0.037755027413368225
Validation loss = 0.04978443309664726
Validation loss = 0.04155104607343674
Validation loss = 0.038023851811885834
Validation loss = 0.03710571676492691
Validation loss = 0.04424896091222763
Validation loss = 0.04174034297466278
Validation loss = 0.03616739436984062
Validation loss = 0.03612028807401657
Validation loss = 0.041364092379808426
Validation loss = 0.03775676712393761
Validation loss = 0.036390699446201324
Validation loss = 0.06014186888933182
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06256616115570068
Validation loss = 0.044974975287914276
Validation loss = 0.04040394723415375
Validation loss = 0.04194829612970352
Validation loss = 0.04457351565361023
Validation loss = 0.042903508991003036
Validation loss = 0.04077690467238426
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05896029621362686
Validation loss = 0.044393185526132584
Validation loss = 0.04187016561627388
Validation loss = 0.04123617708683014
Validation loss = 0.04642459750175476
Validation loss = 0.04415744170546532
Validation loss = 0.040906280279159546
Validation loss = 0.041650403290987015
Validation loss = 0.046396661549806595
Validation loss = 0.04553091898560524
Validation loss = 0.04079482704401016
Validation loss = 0.040539730340242386
Validation loss = 0.0468338280916214
Validation loss = 0.04155975952744484
Validation loss = 0.03944222256541252
Validation loss = 0.04534602537751198
Validation loss = 0.04313070699572563
Validation loss = 0.03917839378118515
Validation loss = 0.04788492992520332
Validation loss = 0.04110431671142578
Validation loss = 0.04018070176243782
Validation loss = 0.03902367502450943
Validation loss = 0.05488346889615059
Validation loss = 0.04176151752471924
Validation loss = 0.03804350271821022
Validation loss = 0.040277205407619476
Validation loss = 0.042491815984249115
Validation loss = 0.03982515260577202
Validation loss = 0.038060080260038376
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.06e+03 |
| Iteration     | 24       |
| MaximumReturn | 2.14e+03 |
| MinimumReturn | 180      |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04823094978928566
Validation loss = 0.041114337742328644
Validation loss = 0.04626442864537239
Validation loss = 0.04550107568502426
Validation loss = 0.03945942223072052
Validation loss = 0.040659334510564804
Validation loss = 0.04485861957073212
Validation loss = 0.04304305091500282
Validation loss = 0.03809447959065437
Validation loss = 0.03954116255044937
Validation loss = 0.039661068469285965
Validation loss = 0.03852337598800659
Validation loss = 0.04171460494399071
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.058791883289813995
Validation loss = 0.03796736150979996
Validation loss = 0.036821309477090836
Validation loss = 0.04030605033040047
Validation loss = 0.03971482813358307
Validation loss = 0.03787361830472946
Validation loss = 0.03641872853040695
Validation loss = 0.03571883589029312
Validation loss = 0.045358236879110336
Validation loss = 0.03994661197066307
Validation loss = 0.035495858639478683
Validation loss = 0.03587479516863823
Validation loss = 0.038593512028455734
Validation loss = 0.034879229962825775
Validation loss = 0.036544978618621826
Validation loss = 0.04016292467713356
Validation loss = 0.04555005207657814
Validation loss = 0.03572383150458336
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.054603829979896545
Validation loss = 0.03588531166315079
Validation loss = 0.035228364169597626
Validation loss = 0.03991124406456947
Validation loss = 0.03958809748291969
Validation loss = 0.03492741659283638
Validation loss = 0.035407163202762604
Validation loss = 0.041016701608896255
Validation loss = 0.039840519428253174
Validation loss = 0.037206824868917465
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04973187670111656
Validation loss = 0.04251129925251007
Validation loss = 0.040521856397390366
Validation loss = 0.0405099019408226
Validation loss = 0.05135780945420265
Validation loss = 0.041124679148197174
Validation loss = 0.0386553592979908
Validation loss = 0.04016707092523575
Validation loss = 0.03926270827651024
Validation loss = 0.03751624375581741
Validation loss = 0.04942210763692856
Validation loss = 0.048859842121601105
Validation loss = 0.03717333823442459
Validation loss = 0.03743341937661171
Validation loss = 0.04745487868785858
Validation loss = 0.03882315009832382
Validation loss = 0.03789864480495453
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05129692330956459
Validation loss = 0.04015939682722092
Validation loss = 0.03655696287751198
Validation loss = 0.03944512829184532
Validation loss = 0.0435248538851738
Validation loss = 0.038761384785175323
Validation loss = 0.03693097084760666
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.26e+03 |
| Iteration     | 25       |
| MaximumReturn | 1.41e+03 |
| MinimumReturn | 1.15e+03 |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.059562280774116516
Validation loss = 0.03822058066725731
Validation loss = 0.03711061179637909
Validation loss = 0.03685219958424568
Validation loss = 0.04267001897096634
Validation loss = 0.038004226982593536
Validation loss = 0.0360913947224617
Validation loss = 0.03937932848930359
Validation loss = 0.03687431290745735
Validation loss = 0.04135490953922272
Validation loss = 0.036372844129800797
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04119915887713432
Validation loss = 0.0354926623404026
Validation loss = 0.03722427040338516
Validation loss = 0.0343838632106781
Validation loss = 0.04597477614879608
Validation loss = 0.037303730845451355
Validation loss = 0.03312743082642555
Validation loss = 0.03387243673205376
Validation loss = 0.03736420348286629
Validation loss = 0.038182828575372696
Validation loss = 0.03431877866387367
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.041164133697748184
Validation loss = 0.03368504345417023
Validation loss = 0.0334150530397892
Validation loss = 0.05015219748020172
Validation loss = 0.033154264092445374
Validation loss = 0.03258189186453819
Validation loss = 0.04285778850317001
Validation loss = 0.03340298682451248
Validation loss = 0.032851532101631165
Validation loss = 0.03437703102827072
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05155158042907715
Validation loss = 0.03733157366514206
Validation loss = 0.03774772211909294
Validation loss = 0.03731097653508186
Validation loss = 0.03381936997175217
Validation loss = 0.0369076170027256
Validation loss = 0.047863882035017014
Validation loss = 0.03760590776801109
Validation loss = 0.034275490790605545
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05115939676761627
Validation loss = 0.03754011169075966
Validation loss = 0.03585128113627434
Validation loss = 0.03596782684326172
Validation loss = 0.03760475292801857
Validation loss = 0.03493550792336464
Validation loss = 0.037669915705919266
Validation loss = 0.04200642555952072
Validation loss = 0.03435438126325607
Validation loss = 0.03479834273457527
Validation loss = 0.03942292556166649
Validation loss = 0.035233039408922195
Validation loss = 0.03434928134083748
Validation loss = 0.03745248168706894
Validation loss = 0.035440593957901
Validation loss = 0.03462957963347435
Validation loss = 0.047013141214847565
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.6e+03  |
| Iteration     | 26       |
| MaximumReturn | 2.1e+03  |
| MinimumReturn | 488      |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04028475657105446
Validation loss = 0.034865133464336395
Validation loss = 0.03474348038434982
Validation loss = 0.05858074501156807
Validation loss = 0.037001412361860275
Validation loss = 0.034201089292764664
Validation loss = 0.03407710790634155
Validation loss = 0.05316958948969841
Validation loss = 0.03459305316209793
Validation loss = 0.03321849927306175
Validation loss = 0.03705339878797531
Validation loss = 0.03675038367509842
Validation loss = 0.03744712099432945
Validation loss = 0.03282412141561508
Validation loss = 0.03329608216881752
Validation loss = 0.04266330227255821
Validation loss = 0.03264840692281723
Validation loss = 0.032626181840896606
Validation loss = 0.039912786334753036
Validation loss = 0.03363758325576782
Validation loss = 0.03127710893750191
Validation loss = 0.04209662601351738
Validation loss = 0.03388022258877754
Validation loss = 0.03233855217695236
Validation loss = 0.03142738714814186
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04109874740242958
Validation loss = 0.037872880697250366
Validation loss = 0.033236052840948105
Validation loss = 0.031533945351839066
Validation loss = 0.04054361581802368
Validation loss = 0.03345385566353798
Validation loss = 0.033638663589954376
Validation loss = 0.035494089126586914
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.039694275707006454
Validation loss = 0.035834185779094696
Validation loss = 0.0313216857612133
Validation loss = 0.033586256206035614
Validation loss = 0.035842638462781906
Validation loss = 0.04040216654539108
Validation loss = 0.0314236655831337
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.041944362223148346
Validation loss = 0.034882962703704834
Validation loss = 0.036648161709308624
Validation loss = 0.032932888716459274
Validation loss = 0.0395553782582283
Validation loss = 0.04553655534982681
Validation loss = 0.03335129842162132
Validation loss = 0.04309055581688881
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03930027037858963
Validation loss = 0.033944521099328995
Validation loss = 0.03352061286568642
Validation loss = 0.03633933514356613
Validation loss = 0.03374455124139786
Validation loss = 0.032558951526880264
Validation loss = 0.03799126297235489
Validation loss = 0.05129266157746315
Validation loss = 0.03224016726016998
Validation loss = 0.03190390020608902
Validation loss = 0.03613275662064552
Validation loss = 0.03402705863118172
Validation loss = 0.03180525824427605
Validation loss = 0.03742523863911629
Validation loss = 0.03270455077290535
Validation loss = 0.03125761076807976
Validation loss = 0.03687816113233566
Validation loss = 0.03437890112400055
Validation loss = 0.03593336418271065
Validation loss = 0.03809598460793495
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 851      |
| Iteration     | 27       |
| MaximumReturn | 1.66e+03 |
| MinimumReturn | -352     |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04331032186746597
Validation loss = 0.03233339637517929
Validation loss = 0.030851541087031364
Validation loss = 0.03253912553191185
Validation loss = 0.04380391538143158
Validation loss = 0.03311290219426155
Validation loss = 0.03058687411248684
Validation loss = 0.03283831477165222
Validation loss = 0.03485732898116112
Validation loss = 0.0304240845143795
Validation loss = 0.031577445566654205
Validation loss = 0.03343212604522705
Validation loss = 0.029900647699832916
Validation loss = 0.029298879206180573
Validation loss = 0.036753471940755844
Validation loss = 0.03153049573302269
Validation loss = 0.02963450364768505
Validation loss = 0.034982830286026
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04003527760505676
Validation loss = 0.032449737191200256
Validation loss = 0.04017682746052742
Validation loss = 0.031952135264873505
Validation loss = 0.03172881156206131
Validation loss = 0.03692755475640297
Validation loss = 0.043122753500938416
Validation loss = 0.030246375128626823
Validation loss = 0.031614985316991806
Validation loss = 0.03108585625886917
Validation loss = 0.040137384086847305
Validation loss = 0.03117228113114834
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03381853923201561
Validation loss = 0.031830571591854095
Validation loss = 0.03991033136844635
Validation loss = 0.03414923697710037
Validation loss = 0.03160408139228821
Validation loss = 0.03320921212434769
Validation loss = 0.030732177197933197
Validation loss = 0.03259582445025444
Validation loss = 0.030520835891366005
Validation loss = 0.03380230814218521
Validation loss = 0.03054960072040558
Validation loss = 0.029375998303294182
Validation loss = 0.03237124904990196
Validation loss = 0.03259469196200371
Validation loss = 0.032108161598443985
Validation loss = 0.0327579565346241
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0388614647090435
Validation loss = 0.03295445814728737
Validation loss = 0.03329789638519287
Validation loss = 0.031462810933589935
Validation loss = 0.03800110146403313
Validation loss = 0.03348059952259064
Validation loss = 0.03095223754644394
Validation loss = 0.03860294073820114
Validation loss = 0.03609873354434967
Validation loss = 0.03119354136288166
Validation loss = 0.03149176016449928
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.037374503910541534
Validation loss = 0.03205626457929611
Validation loss = 0.031909264624118805
Validation loss = 0.03739417344331741
Validation loss = 0.032938819378614426
Validation loss = 0.030018089339137077
Validation loss = 0.038275204598903656
Validation loss = 0.030763328075408936
Validation loss = 0.030363086611032486
Validation loss = 0.034826524555683136
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.22e+03 |
| Iteration     | 28       |
| MaximumReturn | 2.26e+03 |
| MinimumReturn | 392      |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03954978287220001
Validation loss = 0.029981279745697975
Validation loss = 0.02774408832192421
Validation loss = 0.03072419948875904
Validation loss = 0.028872467577457428
Validation loss = 0.028160996735095978
Validation loss = 0.029088608920574188
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.036296237260103226
Validation loss = 0.030815036967396736
Validation loss = 0.029872598126530647
Validation loss = 0.03272752836346626
Validation loss = 0.03227059543132782
Validation loss = 0.031497109681367874
Validation loss = 0.03528124839067459
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.036012597382068634
Validation loss = 0.029185818508267403
Validation loss = 0.02778366394340992
Validation loss = 0.0315186083316803
Validation loss = 0.032926786690950394
Validation loss = 0.027545342221856117
Validation loss = 0.029069596901535988
Validation loss = 0.028414392843842506
Validation loss = 0.029012972488999367
Validation loss = 0.029490860179066658
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04124272242188454
Validation loss = 0.031843945384025574
Validation loss = 0.02935199998319149
Validation loss = 0.029644018039107323
Validation loss = 0.03610309585928917
Validation loss = 0.031233450397849083
Validation loss = 0.03317360579967499
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03971518203616142
Validation loss = 0.02879432961344719
Validation loss = 0.02966095693409443
Validation loss = 0.030451444908976555
Validation loss = 0.03257254511117935
Validation loss = 0.02892259880900383
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.15e+03 |
| Iteration     | 29       |
| MaximumReturn | 1.41e+03 |
| MinimumReturn | 966      |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03414285182952881
Validation loss = 0.030742580071091652
Validation loss = 0.027448730543255806
Validation loss = 0.03849022835493088
Validation loss = 0.02652796544134617
Validation loss = 0.026572858914732933
Validation loss = 0.03101896122097969
Validation loss = 0.02763843908905983
Validation loss = 0.026308786123991013
Validation loss = 0.03222157061100006
Validation loss = 0.025702737271785736
Validation loss = 0.02861929126083851
Validation loss = 0.028389882296323776
Validation loss = 0.025506891310214996
Validation loss = 0.03129901364445686
Validation loss = 0.025917749851942062
Validation loss = 0.0259789377450943
Validation loss = 0.026119861751794815
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03181036189198494
Validation loss = 0.028815900906920433
Validation loss = 0.03443174809217453
Validation loss = 0.030861079692840576
Validation loss = 0.03184257075190544
Validation loss = 0.028023986145853996
Validation loss = 0.02817494235932827
Validation loss = 0.029248632490634918
Validation loss = 0.030384574085474014
Validation loss = 0.028030026704072952
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03262433409690857
Validation loss = 0.027350610122084618
Validation loss = 0.02687862142920494
Validation loss = 0.02910495735704899
Validation loss = 0.028894418850541115
Validation loss = 0.028121082112193108
Validation loss = 0.02921304851770401
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.033021729439496994
Validation loss = 0.031555622816085815
Validation loss = 0.0317755825817585
Validation loss = 0.02821885421872139
Validation loss = 0.03038320131599903
Validation loss = 0.03009711764752865
Validation loss = 0.028965478762984276
Validation loss = 0.028856324031949043
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03724062815308571
Validation loss = 0.03289666399359703
Validation loss = 0.028572212904691696
Validation loss = 0.028602655977010727
Validation loss = 0.0360308513045311
Validation loss = 0.027785135433077812
Validation loss = 0.02724277228116989
Validation loss = 0.02853523939847946
Validation loss = 0.03856496140360832
Validation loss = 0.027022946625947952
Validation loss = 0.02769145928323269
Validation loss = 0.034743502736091614
Validation loss = 0.029497595503926277
Validation loss = 0.027262773364782333
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.58e+03 |
| Iteration     | 30       |
| MaximumReturn | 2.21e+03 |
| MinimumReturn | 1.02e+03 |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029871895909309387
Validation loss = 0.028407994657754898
Validation loss = 0.025452036410570145
Validation loss = 0.02488790825009346
Validation loss = 0.027109133079648018
Validation loss = 0.025717683136463165
Validation loss = 0.028044739738106728
Validation loss = 0.024155359715223312
Validation loss = 0.029593156650662422
Validation loss = 0.02502877451479435
Validation loss = 0.02388017624616623
Validation loss = 0.026125473901629448
Validation loss = 0.0292121060192585
Validation loss = 0.02976832538843155
Validation loss = 0.02413906529545784
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04230139032006264
Validation loss = 0.030100025236606598
Validation loss = 0.02817710116505623
Validation loss = 0.03011263906955719
Validation loss = 0.030002791434526443
Validation loss = 0.03515585511922836
Validation loss = 0.02786070853471756
Validation loss = 0.02661733329296112
Validation loss = 0.02824738435447216
Validation loss = 0.029561465606093407
Validation loss = 0.026224873960018158
Validation loss = 0.038003090769052505
Validation loss = 0.02804289013147354
Validation loss = 0.026190202683210373
Validation loss = 0.026648079976439476
Validation loss = 0.02754201740026474
Validation loss = 0.02718675136566162
Validation loss = 0.027015825733542442
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03454437106847763
Validation loss = 0.027750791981816292
Validation loss = 0.027124658226966858
Validation loss = 0.029065750539302826
Validation loss = 0.027813829481601715
Validation loss = 0.035489536821842194
Validation loss = 0.026354607194662094
Validation loss = 0.027802474796772003
Validation loss = 0.02888059988617897
Validation loss = 0.025112617760896683
Validation loss = 0.027660362422466278
Validation loss = 0.02935798466205597
Validation loss = 0.025300316512584686
Validation loss = 0.027611689642071724
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04759382829070091
Validation loss = 0.02877805382013321
Validation loss = 0.029224980622529984
Validation loss = 0.027337346225976944
Validation loss = 0.030248112976551056
Validation loss = 0.02669381909072399
Validation loss = 0.03686550259590149
Validation loss = 0.029186610132455826
Validation loss = 0.026459749788045883
Validation loss = 0.028396612033247948
Validation loss = 0.032112911343574524
Validation loss = 0.02631372958421707
Validation loss = 0.02635667659342289
Validation loss = 0.027359846979379654
Validation loss = 0.025305330753326416
Validation loss = 0.02796715684235096
Validation loss = 0.02783471718430519
Validation loss = 0.026780717074871063
Validation loss = 0.03488893806934357
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03446592018008232
Validation loss = 0.02693016268312931
Validation loss = 0.027198651805520058
Validation loss = 0.028954774141311646
Validation loss = 0.027715008705854416
Validation loss = 0.03420977294445038
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.47e+03 |
| Iteration     | 31       |
| MaximumReturn | 2.28e+03 |
| MinimumReturn | 891      |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.026413308456540108
Validation loss = 0.026189450174570084
Validation loss = 0.024190591648221016
Validation loss = 0.027340078726410866
Validation loss = 0.022558733820915222
Validation loss = 0.02347012795507908
Validation loss = 0.02716599963605404
Validation loss = 0.023010259494185448
Validation loss = 0.023559246212244034
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.035493701696395874
Validation loss = 0.026568515226244926
Validation loss = 0.025006134063005447
Validation loss = 0.02909783087670803
Validation loss = 0.02531525306403637
Validation loss = 0.025304676964879036
Validation loss = 0.0318765752017498
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04317045584321022
Validation loss = 0.024688035249710083
Validation loss = 0.024128254503011703
Validation loss = 0.02978461980819702
Validation loss = 0.025582902133464813
Validation loss = 0.02465600147843361
Validation loss = 0.026933100074529648
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03159031271934509
Validation loss = 0.025141516700387
Validation loss = 0.0331052765250206
Validation loss = 0.024912739172577858
Validation loss = 0.030419422313570976
Validation loss = 0.02561524137854576
Validation loss = 0.024637801572680473
Validation loss = 0.031741853803396225
Validation loss = 0.024350905790925026
Validation loss = 0.0250958651304245
Validation loss = 0.031753405928611755
Validation loss = 0.025963755324482918
Validation loss = 0.02426386997103691
Validation loss = 0.03182166814804077
Validation loss = 0.0246586874127388
Validation loss = 0.024949580430984497
Validation loss = 0.03238793835043907
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03403666988015175
Validation loss = 0.027694670483469963
Validation loss = 0.025243226438760757
Validation loss = 0.027784788981080055
Validation loss = 0.033794939517974854
Validation loss = 0.025294899940490723
Validation loss = 0.02624230831861496
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 941      |
| Iteration     | 32       |
| MaximumReturn | 1.28e+03 |
| MinimumReturn | 608      |
| TotalSamples  | 136000   |
----------------------------
