Logging to experiments/half_cheetah/test-exp-dir-2/test-exp2_seed2314
Print configuration .....
{'max_val_data': 100000, 'dynamics': {'kfac_params': {'damping': 0.001, 'cov_ema_decay': 0.99, 'momentum': 0.9, 'learning_rate': 0.1, 'kl_clip': 0.0001}, 'intrinsic_reward_only': False, 'enable_particle_ensemble': True, 'external_reward_evaluation_interval': 5, 'mode': 'random', 'batch_size': 1000, 'ensemble_model_count': 5, 'particles': 5, 'activation': 'relu', 'n_layers': 4, 'val': True, 'ensemble': True, 'hidden_size': 1000, 'intrinsic_reward_coeff': 1.0, 'learning_rate': 0.001, 'epochs': 200, 'ita': 1.0, 'pre_training': {'policy_itr': 20, 'mode': 'intrinsic_reward', 'itr': 0}, 'model': 'nn', 'obs_var': 1.0}, 'random_seeds': [4321, 2314, 2341, 3421], 'max_train_data': 200000, 'env_horizon': 1000, 'num_path_random': 6, 'discard_ratio': 0.0, 'start_onpol_iter': 0, 'num_path_onpol': 6, 'onpol_iters': 33, 'env_name': 'half_cheetah', 'trpo': {'gae': 0.95, 'batch_size': 50000, 'iterations': 40, 'step_size': 0.01, 'horizon': 1000, 'gamma': 0.99}, 'algo': 'trpo', 'policy': {'init_logstd': 0.0, 'reinitialize_every_itr': False, 'activation': 'tanh', 'network_shape': [32, 32]}, 'trpo_ext_reward': {'gae': 0.95, 'batch_size': 50000, 'iterations': 20, 'step_size': 0.01, 'horizon': 1000, 'gamma': 0.99}, 'save_variables': False, 'restore_variables': False, 'model_save_dir': '/tmp/half_cheetah_models/'}
Generating random rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating random rollouts.
Creating normalization for training data.
Done creating normalization for training data.
Particle ensemble enabled? True
An ensemble of 5 dynamics model <class 'model.dynamics.NNDynamicsModel'> initialized
Train dynamics model with intrinsic reward only? False
Pre-training enabled. Using only intrinsic reward.
Pre-training dynamics model for 0 iterations...
Done pre-training dynamics model.
Using external reward only.
itr #0 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.5433571934700012
Validation loss = 0.1441236138343811
Validation loss = 0.09628530591726303
Validation loss = 0.07900580018758774
Validation loss = 0.07102126628160477
Validation loss = 0.06628434360027313
Validation loss = 0.06360699981451035
Validation loss = 0.061419300734996796
Validation loss = 0.06579656153917313
Validation loss = 0.06925447285175323
Validation loss = 0.05648695304989815
Validation loss = 0.054596319794654846
Validation loss = 0.056163884699344635
Validation loss = 0.05349064618349075
Validation loss = 0.06887628138065338
Validation loss = 0.05183379724621773
Validation loss = 0.05189177766442299
Validation loss = 0.05284704640507698
Validation loss = 0.053969137370586395
Validation loss = 0.052103519439697266
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.39637821912765503
Validation loss = 0.14502835273742676
Validation loss = 0.0946582555770874
Validation loss = 0.07741739600896835
Validation loss = 0.07577686756849289
Validation loss = 0.06614120304584503
Validation loss = 0.05981322377920151
Validation loss = 0.06147284433245659
Validation loss = 0.06094355881214142
Validation loss = 0.056563012301921844
Validation loss = 0.06127358227968216
Validation loss = 0.06314919888973236
Validation loss = 0.05423051863908768
Validation loss = 0.09349751472473145
Validation loss = 0.053760990500450134
Validation loss = 0.05536209046840668
Validation loss = 0.05305986851453781
Validation loss = 0.06668916344642639
Validation loss = 0.05358181893825531
Validation loss = 0.049944646656513214
Validation loss = 0.05193377286195755
Validation loss = 0.05093679949641228
Validation loss = 0.06084524467587471
Validation loss = 0.050057873129844666
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.4469841718673706
Validation loss = 0.15842324495315552
Validation loss = 0.10474202036857605
Validation loss = 0.0826781690120697
Validation loss = 0.07301361113786697
Validation loss = 0.06589147448539734
Validation loss = 0.0630275085568428
Validation loss = 0.06143660470843315
Validation loss = 0.05947544053196907
Validation loss = 0.060364268720149994
Validation loss = 0.06936918944120407
Validation loss = 0.056094083935022354
Validation loss = 0.052061066031455994
Validation loss = 0.052297815680503845
Validation loss = 0.05343897268176079
Validation loss = 0.0519426129758358
Validation loss = 0.054274287074804306
Validation loss = 0.0503317192196846
Validation loss = 0.051870882511138916
Validation loss = 0.051648423075675964
Validation loss = 0.05360398441553116
Validation loss = 0.05190180614590645
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.43430230021476746
Validation loss = 0.14270730316638947
Validation loss = 0.09246823191642761
Validation loss = 0.07721602916717529
Validation loss = 0.07057900726795197
Validation loss = 0.06992314755916595
Validation loss = 0.06301800906658173
Validation loss = 0.061415791511535645
Validation loss = 0.05934041738510132
Validation loss = 0.06536629796028137
Validation loss = 0.057510003447532654
Validation loss = 0.05367518216371536
Validation loss = 0.056262772530317307
Validation loss = 0.060934245586395264
Validation loss = 0.0510900616645813
Validation loss = 0.05126577615737915
Validation loss = 0.06142006441950798
Validation loss = 0.05138135701417923
Validation loss = 0.05739418417215347
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.47849661111831665
Validation loss = 0.1453450620174408
Validation loss = 0.0955837219953537
Validation loss = 0.07954490184783936
Validation loss = 0.07849188148975372
Validation loss = 0.06533772498369217
Validation loss = 0.06173901632428169
Validation loss = 0.06266017258167267
Validation loss = 0.060062579810619354
Validation loss = 0.05567613244056702
Validation loss = 0.062382861971855164
Validation loss = 0.05710216611623764
Validation loss = 0.053759150207042694
Validation loss = 0.08772377669811249
Validation loss = 0.05612066760659218
Validation loss = 0.052173737436532974
Validation loss = 0.05511714518070221
Validation loss = 0.05080608278512955
Validation loss = 0.06693019717931747
Validation loss = 0.050654858350753784
Validation loss = 0.049097754061222076
Validation loss = 0.051264576613903046
Validation loss = 0.04953089728951454
Validation loss = 0.05870632827281952
Validation loss = 0.05078992620110512
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -339     |
| Iteration     | 0        |
| MaximumReturn | -287     |
| MinimumReturn | -430     |
| TotalSamples  | 8000     |
----------------------------
itr #1 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1299269199371338
Validation loss = 0.07580731064081192
Validation loss = 0.07278771698474884
Validation loss = 0.0677109956741333
Validation loss = 0.06411904096603394
Validation loss = 0.06530290842056274
Validation loss = 0.05914229899644852
Validation loss = 0.0596776157617569
Validation loss = 0.060511209070682526
Validation loss = 0.057063013315200806
Validation loss = 0.06353354454040527
Validation loss = 0.05744116008281708
Validation loss = 0.05830197036266327
Validation loss = 0.057009853422641754
Validation loss = 0.05623096972703934
Validation loss = 0.058186471462249756
Validation loss = 0.05532464757561684
Validation loss = 0.05536753684282303
Validation loss = 0.05697998031973839
Validation loss = 0.055290915071964264
Validation loss = 0.058389052748680115
Validation loss = 0.05469643324613571
Validation loss = 0.05487726256251335
Validation loss = 0.0610194206237793
Validation loss = 0.056776151061058044
Validation loss = 0.057762742042541504
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.12886902689933777
Validation loss = 0.07706877589225769
Validation loss = 0.07014855742454529
Validation loss = 0.06671729683876038
Validation loss = 0.06722806394100189
Validation loss = 0.06698289513587952
Validation loss = 0.06122853606939316
Validation loss = 0.0585760623216629
Validation loss = 0.060166917741298676
Validation loss = 0.059371381998062134
Validation loss = 0.05717400088906288
Validation loss = 0.05758242309093475
Validation loss = 0.05767466872930527
Validation loss = 0.05577307939529419
Validation loss = 0.060798726975917816
Validation loss = 0.055744342505931854
Validation loss = 0.05714384466409683
Validation loss = 0.05505000054836273
Validation loss = 0.05705486983060837
Validation loss = 0.054475851356983185
Validation loss = 0.057333312928676605
Validation loss = 0.055721163749694824
Validation loss = 0.055915772914886475
Validation loss = 0.05818675458431244
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.12149648368358612
Validation loss = 0.07577764987945557
Validation loss = 0.06993871927261353
Validation loss = 0.06833466142416
Validation loss = 0.06316806375980377
Validation loss = 0.07536791265010834
Validation loss = 0.06858818978071213
Validation loss = 0.058519147336483
Validation loss = 0.05839212238788605
Validation loss = 0.05757814645767212
Validation loss = 0.05717109888792038
Validation loss = 0.05712320655584335
Validation loss = 0.05882063880562782
Validation loss = 0.05839383229613304
Validation loss = 0.05799376964569092
Validation loss = 0.058611057698726654
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.1212044209241867
Validation loss = 0.0781640037894249
Validation loss = 0.06884437799453735
Validation loss = 0.06979107856750488
Validation loss = 0.062312982976436615
Validation loss = 0.0691465511918068
Validation loss = 0.06028396263718605
Validation loss = 0.05762632191181183
Validation loss = 0.06118933856487274
Validation loss = 0.06012886017560959
Validation loss = 0.05915221571922302
Validation loss = 0.05824809521436691
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.12377184629440308
Validation loss = 0.07699742913246155
Validation loss = 0.07082546502351761
Validation loss = 0.07106538116931915
Validation loss = 0.06542675197124481
Validation loss = 0.0638025552034378
Validation loss = 0.06241516023874283
Validation loss = 0.06121275573968887
Validation loss = 0.05873265117406845
Validation loss = 0.05829189717769623
Validation loss = 0.06248881295323372
Validation loss = 0.0592009611427784
Validation loss = 0.05674272030591965
Validation loss = 0.059329256415367126
Validation loss = 0.05627420172095299
Validation loss = 0.056883081793785095
Validation loss = 0.053666215389966965
Validation loss = 0.05481169372797012
Validation loss = 0.05834963172674179
Validation loss = 0.05354835093021393
Validation loss = 0.058283381164073944
Validation loss = 0.05357600748538971
Validation loss = 0.05575046315789223
Validation loss = 0.05736803263425827
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -358     |
| Iteration     | 1        |
| MaximumReturn | -170     |
| MinimumReturn | -498     |
| TotalSamples  | 12000    |
----------------------------
itr #2 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.1281418800354004
Validation loss = 0.07755910605192184
Validation loss = 0.07142709940671921
Validation loss = 0.06895995885133743
Validation loss = 0.06927605718374252
Validation loss = 0.06685320287942886
Validation loss = 0.06566418707370758
Validation loss = 0.06636642664670944
Validation loss = 0.06326869130134583
Validation loss = 0.06801575422286987
Validation loss = 0.0660795345902443
Validation loss = 0.06443636864423752
Validation loss = 0.0632467195391655
Validation loss = 0.0634041428565979
Validation loss = 0.06464030593633652
Validation loss = 0.06474801152944565
Validation loss = 0.06730359047651291
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.15425579249858856
Validation loss = 0.07611539959907532
Validation loss = 0.07107088714838028
Validation loss = 0.06705484539270401
Validation loss = 0.06510689854621887
Validation loss = 0.06592495739459991
Validation loss = 0.06431903690099716
Validation loss = 0.07456747442483902
Validation loss = 0.06465855985879898
Validation loss = 0.06286150962114334
Validation loss = 0.06354600191116333
Validation loss = 0.06315978616476059
Validation loss = 0.0628691092133522
Validation loss = 0.06369362026453018
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.13541461527347565
Validation loss = 0.07687951624393463
Validation loss = 0.07147929817438126
Validation loss = 0.06902062892913818
Validation loss = 0.07200300693511963
Validation loss = 0.06694381684064865
Validation loss = 0.0663745179772377
Validation loss = 0.06507211178541183
Validation loss = 0.06579664349555969
Validation loss = 0.06642849743366241
Validation loss = 0.06540783494710922
Validation loss = 0.0660974308848381
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.12501616775989532
Validation loss = 0.07925958186388016
Validation loss = 0.07359426468610764
Validation loss = 0.07194309681653976
Validation loss = 0.07063701003789902
Validation loss = 0.06994938850402832
Validation loss = 0.06623903661966324
Validation loss = 0.0670802965760231
Validation loss = 0.06470990926027298
Validation loss = 0.06797288358211517
Validation loss = 0.06464987248182297
Validation loss = 0.06368344277143478
Validation loss = 0.06762945652008057
Validation loss = 0.06489093601703644
Validation loss = 0.06248531863093376
Validation loss = 0.06463833898305893
Validation loss = 0.06224249303340912
Validation loss = 0.06409641355276108
Validation loss = 0.06848902255296707
Validation loss = 0.062129173427820206
Validation loss = 0.06464574486017227
Validation loss = 0.06406918913125992
Validation loss = 0.060989636927843094
Validation loss = 0.060282737016677856
Validation loss = 0.060809895396232605
Validation loss = 0.06113690137863159
Validation loss = 0.06402134895324707
Validation loss = 0.06418780982494354
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.1332048624753952
Validation loss = 0.07316216081380844
Validation loss = 0.07135066390037537
Validation loss = 0.06769309192895889
Validation loss = 0.06849014759063721
Validation loss = 0.06761735677719116
Validation loss = 0.06664068251848221
Validation loss = 0.07191652804613113
Validation loss = 0.06651587039232254
Validation loss = 0.06478641927242279
Validation loss = 0.0649934783577919
Validation loss = 0.06171464920043945
Validation loss = 0.060782164335250854
Validation loss = 0.06348485499620438
Validation loss = 0.062136854976415634
Validation loss = 0.06155391409993172
Validation loss = 0.06202131509780884
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 47       |
| Iteration     | 2        |
| MaximumReturn | 626      |
| MinimumReturn | -402     |
| TotalSamples  | 16000    |
----------------------------
itr #3 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.08989295363426208
Validation loss = 0.06619244813919067
Validation loss = 0.06831485778093338
Validation loss = 0.06501501798629761
Validation loss = 0.06326085329055786
Validation loss = 0.06483085453510284
Validation loss = 0.06253833323717117
Validation loss = 0.06599891185760498
Validation loss = 0.060722075402736664
Validation loss = 0.062253642827272415
Validation loss = 0.06000341847538948
Validation loss = 0.062130387872457504
Validation loss = 0.06454499810934067
Validation loss = 0.06712400913238525
Validation loss = 0.06008821725845337
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.1076020747423172
Validation loss = 0.06567119807004929
Validation loss = 0.06298303604125977
Validation loss = 0.06399763375520706
Validation loss = 0.06249084696173668
Validation loss = 0.06116648018360138
Validation loss = 0.06107927858829498
Validation loss = 0.05992872640490532
Validation loss = 0.05888807401061058
Validation loss = 0.060353733599185944
Validation loss = 0.05880488082766533
Validation loss = 0.06010562926530838
Validation loss = 0.06013616919517517
Validation loss = 0.05971455201506615
Validation loss = 0.057259779423475266
Validation loss = 0.05929515138268471
Validation loss = 0.0591055192053318
Validation loss = 0.058695148676633835
Validation loss = 0.05924810469150543
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09503042697906494
Validation loss = 0.0699184462428093
Validation loss = 0.06896670907735825
Validation loss = 0.06862165033817291
Validation loss = 0.06411378085613251
Validation loss = 0.06502193212509155
Validation loss = 0.0705786868929863
Validation loss = 0.06505191326141357
Validation loss = 0.06301087141036987
Validation loss = 0.061873823404312134
Validation loss = 0.06167159974575043
Validation loss = 0.06392373144626617
Validation loss = 0.060114145278930664
Validation loss = 0.06174212321639061
Validation loss = 0.060788895934820175
Validation loss = 0.060536518692970276
Validation loss = 0.06014300882816315
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.09930387139320374
Validation loss = 0.06516513228416443
Validation loss = 0.06398332864046097
Validation loss = 0.06116676330566406
Validation loss = 0.060577768832445145
Validation loss = 0.061778631061315536
Validation loss = 0.06201168894767761
Validation loss = 0.06332018226385117
Validation loss = 0.06319505721330643
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0967472568154335
Validation loss = 0.06739221513271332
Validation loss = 0.062628373503685
Validation loss = 0.062282219529151917
Validation loss = 0.062201790511608124
Validation loss = 0.061717063188552856
Validation loss = 0.06107458472251892
Validation loss = 0.07133430987596512
Validation loss = 0.06049426645040512
Validation loss = 0.058941952884197235
Validation loss = 0.05958313122391701
Validation loss = 0.06202626973390579
Validation loss = 0.05866360664367676
Validation loss = 0.060937173664569855
Validation loss = 0.06376336514949799
Validation loss = 0.05854112654924393
Validation loss = 0.05726488679647446
Validation loss = 0.05707644671201706
Validation loss = 0.05853467434644699
Validation loss = 0.05876874923706055
Validation loss = 0.06012129783630371
Validation loss = 0.056667715311050415
Validation loss = 0.0620841458439827
Validation loss = 0.06334789097309113
Validation loss = 0.058545708656311035
Validation loss = 0.05833006650209427
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -105     |
| Iteration     | 3        |
| MaximumReturn | 210      |
| MinimumReturn | -301     |
| TotalSamples  | 20000    |
----------------------------
itr #4 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.09318830072879791
Validation loss = 0.07102536410093307
Validation loss = 0.0713372677564621
Validation loss = 0.06998799741268158
Validation loss = 0.06953974813222885
Validation loss = 0.06784620881080627
Validation loss = 0.06587357074022293
Validation loss = 0.0690419003367424
Validation loss = 0.06504782289266586
Validation loss = 0.0658273845911026
Validation loss = 0.06554727256298065
Validation loss = 0.06696613878011703
Validation loss = 0.0747893825173378
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.10880079120397568
Validation loss = 0.07078317552804947
Validation loss = 0.06769372522830963
Validation loss = 0.06660814583301544
Validation loss = 0.06613173335790634
Validation loss = 0.06372834742069244
Validation loss = 0.06501252949237823
Validation loss = 0.06524216383695602
Validation loss = 0.06496046483516693
Validation loss = 0.06680955737829208
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.09567884355783463
Validation loss = 0.07188335061073303
Validation loss = 0.07176513969898224
Validation loss = 0.0706993043422699
Validation loss = 0.07363814860582352
Validation loss = 0.06892849504947662
Validation loss = 0.06703872978687286
Validation loss = 0.06929521262645721
Validation loss = 0.06960991770029068
Validation loss = 0.0658050924539566
Validation loss = 0.06719635426998138
Validation loss = 0.07187965512275696
Validation loss = 0.06728798896074295
Validation loss = 0.07026780396699905
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.10220052301883698
Validation loss = 0.07392288744449615
Validation loss = 0.06813108921051025
Validation loss = 0.0675613209605217
Validation loss = 0.0704612284898758
Validation loss = 0.06577280163764954
Validation loss = 0.06543312221765518
Validation loss = 0.06628038734197617
Validation loss = 0.06716323643922806
Validation loss = 0.06511269509792328
Validation loss = 0.06597337126731873
Validation loss = 0.06403646618127823
Validation loss = 0.06847703456878662
Validation loss = 0.06354524195194244
Validation loss = 0.06608984619379044
Validation loss = 0.06421501934528351
Validation loss = 0.06414671242237091
Validation loss = 0.06654491275548935
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.10394074767827988
Validation loss = 0.06955035030841827
Validation loss = 0.06608773022890091
Validation loss = 0.06586116552352905
Validation loss = 0.06764589250087738
Validation loss = 0.06503140181303024
Validation loss = 0.064705029129982
Validation loss = 0.07273249328136444
Validation loss = 0.06436918675899506
Validation loss = 0.06889911741018295
Validation loss = 0.06403376162052155
Validation loss = 0.06328240782022476
Validation loss = 0.06408639252185822
Validation loss = 0.06818994879722595
Validation loss = 0.06412079185247421
Validation loss = 0.06503189355134964
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 1.88     |
| Iteration     | 4        |
| MaximumReturn | 268      |
| MinimumReturn | -304     |
| TotalSamples  | 24000    |
----------------------------
itr #5 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07287786900997162
Validation loss = 0.0628557875752449
Validation loss = 0.06138103827834129
Validation loss = 0.06201918423175812
Validation loss = 0.06268656253814697
Validation loss = 0.06279907375574112
Validation loss = 0.058984655886888504
Validation loss = 0.06179216504096985
Validation loss = 0.0612032376229763
Validation loss = 0.06723037362098694
Validation loss = 0.06380685418844223
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.07686731964349747
Validation loss = 0.0608786903321743
Validation loss = 0.05991530790925026
Validation loss = 0.05949753522872925
Validation loss = 0.0593603290617466
Validation loss = 0.05964319407939911
Validation loss = 0.05872895196080208
Validation loss = 0.05915362760424614
Validation loss = 0.05885729193687439
Validation loss = 0.06220186874270439
Validation loss = 0.05740167573094368
Validation loss = 0.0578639954328537
Validation loss = 0.059167951345443726
Validation loss = 0.057682767510414124
Validation loss = 0.057763710618019104
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07912590354681015
Validation loss = 0.06358662992715836
Validation loss = 0.06258076429367065
Validation loss = 0.06257819384336472
Validation loss = 0.061822324991226196
Validation loss = 0.061750929802656174
Validation loss = 0.06280968338251114
Validation loss = 0.06558693200349808
Validation loss = 0.06227665767073631
Validation loss = 0.060522403568029404
Validation loss = 0.06391087919473648
Validation loss = 0.060699332505464554
Validation loss = 0.06207912787795067
Validation loss = 0.05993323028087616
Validation loss = 0.06193739175796509
Validation loss = 0.06480704993009567
Validation loss = 0.06096496060490608
Validation loss = 0.059719327837228775
Validation loss = 0.0619734525680542
Validation loss = 0.06194762513041496
Validation loss = 0.06282137334346771
Validation loss = 0.061183344572782516
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06824570894241333
Validation loss = 0.06174471601843834
Validation loss = 0.060809895396232605
Validation loss = 0.05990809202194214
Validation loss = 0.05921097472310066
Validation loss = 0.06136424466967583
Validation loss = 0.06094896420836449
Validation loss = 0.059503838419914246
Validation loss = 0.05876636877655983
Validation loss = 0.05865543708205223
Validation loss = 0.05941594019532204
Validation loss = 0.0590001605451107
Validation loss = 0.059413403272628784
Validation loss = 0.05847814306616783
Validation loss = 0.05968456342816353
Validation loss = 0.060247331857681274
Validation loss = 0.06406161934137344
Validation loss = 0.05950181186199188
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.07185620069503784
Validation loss = 0.06345673650503159
Validation loss = 0.06172964349389076
Validation loss = 0.05942186713218689
Validation loss = 0.05951525270938873
Validation loss = 0.06090264394879341
Validation loss = 0.05892409384250641
Validation loss = 0.057912811636924744
Validation loss = 0.05833117291331291
Validation loss = 0.05901254713535309
Validation loss = 0.05767069756984711
Validation loss = 0.06290627270936966
Validation loss = 0.06175253912806511
Validation loss = 0.05998820438981056
Validation loss = 0.059520263224840164
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 50.1     |
| Iteration     | 5        |
| MaximumReturn | 577      |
| MinimumReturn | -228     |
| TotalSamples  | 28000    |
----------------------------
itr #6 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.06958577781915665
Validation loss = 0.0633748397231102
Validation loss = 0.06130004674196243
Validation loss = 0.060857661068439484
Validation loss = 0.0618266686797142
Validation loss = 0.05925685167312622
Validation loss = 0.060127515345811844
Validation loss = 0.05960123986005783
Validation loss = 0.062350381165742874
Validation loss = 0.061729319393634796
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.0767521858215332
Validation loss = 0.060260873287916183
Validation loss = 0.05992847681045532
Validation loss = 0.06163563206791878
Validation loss = 0.059834737330675125
Validation loss = 0.0619872584939003
Validation loss = 0.05854317545890808
Validation loss = 0.059496935456991196
Validation loss = 0.06024199351668358
Validation loss = 0.058144159615039825
Validation loss = 0.06222197785973549
Validation loss = 0.05748065561056137
Validation loss = 0.0580010712146759
Validation loss = 0.060464318841695786
Validation loss = 0.05761867016553879
Validation loss = 0.058825600892305374
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.07827668637037277
Validation loss = 0.06311777979135513
Validation loss = 0.0611579492688179
Validation loss = 0.06182102486491203
Validation loss = 0.061414532363414764
Validation loss = 0.06417268514633179
Validation loss = 0.06549634039402008
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0754399374127388
Validation loss = 0.06505479663610458
Validation loss = 0.06114979460835457
Validation loss = 0.061362139880657196
Validation loss = 0.06096024438738823
Validation loss = 0.06139926239848137
Validation loss = 0.059801314026117325
Validation loss = 0.06123200058937073
Validation loss = 0.06049184128642082
Validation loss = 0.06169826537370682
Validation loss = 0.060126833617687225
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06767834722995758
Validation loss = 0.06029896065592766
Validation loss = 0.061715345829725266
Validation loss = 0.05947611480951309
Validation loss = 0.05984002724289894
Validation loss = 0.06031090021133423
Validation loss = 0.059727199375629425
Validation loss = 0.06148047000169754
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 414      |
| Iteration     | 6        |
| MaximumReturn | 1.55e+03 |
| MinimumReturn | -59.2    |
| TotalSamples  | 32000    |
----------------------------
itr #7 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.07346481084823608
Validation loss = 0.056889262050390244
Validation loss = 0.05542880669236183
Validation loss = 0.05423744022846222
Validation loss = 0.05575786530971527
Validation loss = 0.05479254573583603
Validation loss = 0.05430549010634422
Validation loss = 0.05596054345369339
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06931003928184509
Validation loss = 0.05551539361476898
Validation loss = 0.05472317710518837
Validation loss = 0.054464008659124374
Validation loss = 0.05313417315483093
Validation loss = 0.054910808801651
Validation loss = 0.056324806064367294
Validation loss = 0.05317426472902298
Validation loss = 0.05499483644962311
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06914015114307404
Validation loss = 0.05722486972808838
Validation loss = 0.056034669280052185
Validation loss = 0.05691542476415634
Validation loss = 0.05693504214286804
Validation loss = 0.055691227316856384
Validation loss = 0.055923864245414734
Validation loss = 0.05802565813064575
Validation loss = 0.0547662153840065
Validation loss = 0.054931603372097015
Validation loss = 0.05582696944475174
Validation loss = 0.05581633001565933
Validation loss = 0.056371524930000305
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06877557188272476
Validation loss = 0.057150449603796005
Validation loss = 0.05602138489484787
Validation loss = 0.05731572210788727
Validation loss = 0.05538090318441391
Validation loss = 0.05501074343919754
Validation loss = 0.05516894906759262
Validation loss = 0.05382827669382095
Validation loss = 0.05376902222633362
Validation loss = 0.05327384173870087
Validation loss = 0.055955059826374054
Validation loss = 0.057886622846126556
Validation loss = 0.053785763680934906
Validation loss = 0.05452984943985939
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.06659282743930817
Validation loss = 0.057024650275707245
Validation loss = 0.05801508575677872
Validation loss = 0.05594087764620781
Validation loss = 0.054776787757873535
Validation loss = 0.05529971048235893
Validation loss = 0.054314374923706055
Validation loss = 0.05589805543422699
Validation loss = 0.05393766611814499
Validation loss = 0.05590793117880821
Validation loss = 0.054941870272159576
Validation loss = 0.05349055677652359
Validation loss = 0.05422321707010269
Validation loss = 0.05532171204686165
Validation loss = 0.053992703557014465
Validation loss = 0.05433642491698265
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 269      |
| Iteration     | 7        |
| MaximumReturn | 496      |
| MinimumReturn | -94.5    |
| TotalSamples  | 36000    |
----------------------------
itr #8 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.061795394867658615
Validation loss = 0.05361860245466232
Validation loss = 0.05180950462818146
Validation loss = 0.055078260600566864
Validation loss = 0.05395309627056122
Validation loss = 0.05177084356546402
Validation loss = 0.05313750356435776
Validation loss = 0.054649438709020615
Validation loss = 0.0518115758895874
Validation loss = 0.05247874557971954
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.06210923194885254
Validation loss = 0.05388541519641876
Validation loss = 0.051579609513282776
Validation loss = 0.05157877132296562
Validation loss = 0.05079264938831329
Validation loss = 0.05135684087872505
Validation loss = 0.05031069740653038
Validation loss = 0.05046355351805687
Validation loss = 0.05218042805790901
Validation loss = 0.051715318113565445
Validation loss = 0.051830630749464035
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06424184888601303
Validation loss = 0.053433340042829514
Validation loss = 0.054526183754205704
Validation loss = 0.05246256664395332
Validation loss = 0.05251595750451088
Validation loss = 0.053732361644506454
Validation loss = 0.055369481444358826
Validation loss = 0.0516863577067852
Validation loss = 0.053050994873046875
Validation loss = 0.05160140618681908
Validation loss = 0.05322270840406418
Validation loss = 0.051498640328645706
Validation loss = 0.052199043333530426
Validation loss = 0.052272722125053406
Validation loss = 0.05183091014623642
Validation loss = 0.05130411684513092
Validation loss = 0.049042943865060806
Validation loss = 0.05392332375049591
Validation loss = 0.05047890171408653
Validation loss = 0.05031181126832962
Validation loss = 0.051351334899663925
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06778749078512192
Validation loss = 0.052631277590990067
Validation loss = 0.05202137678861618
Validation loss = 0.0548129677772522
Validation loss = 0.05153341591358185
Validation loss = 0.053143203258514404
Validation loss = 0.05279996618628502
Validation loss = 0.05156797170639038
Validation loss = 0.05319710075855255
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0649196207523346
Validation loss = 0.0517946220934391
Validation loss = 0.052072156220674515
Validation loss = 0.05301837623119354
Validation loss = 0.05131448805332184
Validation loss = 0.05090508237481117
Validation loss = 0.052656516432762146
Validation loss = 0.052748601883649826
Validation loss = 0.05078558623790741
Validation loss = 0.05550860986113548
Validation loss = 0.0510190911591053
Validation loss = 0.05005688592791557
Validation loss = 0.051517270505428314
Validation loss = 0.052376698702573776
Validation loss = 0.05005909502506256
Validation loss = 0.05086700618267059
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 378      |
| Iteration     | 8        |
| MaximumReturn | 908      |
| MinimumReturn | -494     |
| TotalSamples  | 40000    |
----------------------------
itr #9 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05701228231191635
Validation loss = 0.050548337399959564
Validation loss = 0.04794328659772873
Validation loss = 0.049181900918483734
Validation loss = 0.04777832701802254
Validation loss = 0.04932789131999016
Validation loss = 0.04898451641201973
Validation loss = 0.049407295882701874
Validation loss = 0.048203978687524796
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.060079287737607956
Validation loss = 0.04857869818806648
Validation loss = 0.04864281043410301
Validation loss = 0.048028357326984406
Validation loss = 0.0483078695833683
Validation loss = 0.04945974051952362
Validation loss = 0.047449298202991486
Validation loss = 0.047671522945165634
Validation loss = 0.047352008521556854
Validation loss = 0.04981658607721329
Validation loss = 0.04636729508638382
Validation loss = 0.04657546430826187
Validation loss = 0.04679443687200546
Validation loss = 0.04944361373782158
Validation loss = 0.04672476276755333
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.06247299164533615
Validation loss = 0.04802113026380539
Validation loss = 0.047355905175209045
Validation loss = 0.04783462733030319
Validation loss = 0.04767964035272598
Validation loss = 0.047225795686244965
Validation loss = 0.04732360318303108
Validation loss = 0.046056635677814484
Validation loss = 0.0466955229640007
Validation loss = 0.046837061643600464
Validation loss = 0.04530531167984009
Validation loss = 0.04829024896025658
Validation loss = 0.04470000043511391
Validation loss = 0.045862212777137756
Validation loss = 0.04492967203259468
Validation loss = 0.0449291467666626
Validation loss = 0.04623689502477646
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.06299027800559998
Validation loss = 0.05038623884320259
Validation loss = 0.04880426824092865
Validation loss = 0.04824268817901611
Validation loss = 0.04853479191660881
Validation loss = 0.04812419041991234
Validation loss = 0.05173414200544357
Validation loss = 0.04875368997454643
Validation loss = 0.046873606741428375
Validation loss = 0.04884060099720955
Validation loss = 0.04763569310307503
Validation loss = 0.048256002366542816
Validation loss = 0.04698168486356735
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.058667730540037155
Validation loss = 0.048018306493759155
Validation loss = 0.04710829257965088
Validation loss = 0.04814988374710083
Validation loss = 0.04734575003385544
Validation loss = 0.0474325567483902
Validation loss = 0.046290166676044464
Validation loss = 0.04859229922294617
Validation loss = 0.04607295244932175
Validation loss = 0.04634793847799301
Validation loss = 0.04883714020252228
Validation loss = 0.04672902077436447
Validation loss = 0.04705774039030075
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -156     |
| Iteration     | 9        |
| MaximumReturn | 16.1     |
| MinimumReturn | -345     |
| TotalSamples  | 44000    |
----------------------------
itr #10 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05881620571017265
Validation loss = 0.048228442668914795
Validation loss = 0.048695389181375504
Validation loss = 0.04787348583340645
Validation loss = 0.048542749136686325
Validation loss = 0.04729418456554413
Validation loss = 0.04744013398885727
Validation loss = 0.0464685894548893
Validation loss = 0.050020597875118256
Validation loss = 0.0446801595389843
Validation loss = 0.046603333204984665
Validation loss = 0.04601716622710228
Validation loss = 0.04722778499126434
Validation loss = 0.047820258885622025
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.057781096547842026
Validation loss = 0.04535872861742973
Validation loss = 0.044656168669462204
Validation loss = 0.04579494893550873
Validation loss = 0.044548023492097855
Validation loss = 0.04608359560370445
Validation loss = 0.04503979906439781
Validation loss = 0.04398205131292343
Validation loss = 0.04535527527332306
Validation loss = 0.04641531780362129
Validation loss = 0.04420778900384903
Validation loss = 0.046782828867435455
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05561235174536705
Validation loss = 0.04470597952604294
Validation loss = 0.04350278899073601
Validation loss = 0.04326903447508812
Validation loss = 0.04333042353391647
Validation loss = 0.045041490346193314
Validation loss = 0.04541320726275444
Validation loss = 0.041668228805065155
Validation loss = 0.04240324720740318
Validation loss = 0.04308159276843071
Validation loss = 0.041794553399086
Validation loss = 0.04309311881661415
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05520040914416313
Validation loss = 0.046982694417238235
Validation loss = 0.045755013823509216
Validation loss = 0.04548865929245949
Validation loss = 0.046465013176202774
Validation loss = 0.04624730348587036
Validation loss = 0.04928850010037422
Validation loss = 0.04528229311108589
Validation loss = 0.053008805960416794
Validation loss = 0.04353160411119461
Validation loss = 0.04410536587238312
Validation loss = 0.044142093509435654
Validation loss = 0.04514243081212044
Validation loss = 0.044776033610105515
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05776815488934517
Validation loss = 0.04662108048796654
Validation loss = 0.0444362610578537
Validation loss = 0.046094056218862534
Validation loss = 0.04459734261035919
Validation loss = 0.044943034648895264
Validation loss = 0.0483841672539711
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 22.6     |
| Iteration     | 10       |
| MaximumReturn | 703      |
| MinimumReturn | -419     |
| TotalSamples  | 48000    |
----------------------------
itr #11 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.0615723542869091
Validation loss = 0.044459354132413864
Validation loss = 0.043141502887010574
Validation loss = 0.04455322027206421
Validation loss = 0.04556366801261902
Validation loss = 0.044957857578992844
Validation loss = 0.043647974729537964
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.052774179726839066
Validation loss = 0.043825309723615646
Validation loss = 0.0436166375875473
Validation loss = 0.04419034346938133
Validation loss = 0.04428596794605255
Validation loss = 0.042875949293375015
Validation loss = 0.04291711375117302
Validation loss = 0.042647283524274826
Validation loss = 0.04255519434809685
Validation loss = 0.04260430857539177
Validation loss = 0.042358387261629105
Validation loss = 0.04293040931224823
Validation loss = 0.04374558851122856
Validation loss = 0.041280347853899
Validation loss = 0.04194680228829384
Validation loss = 0.045195166021585464
Validation loss = 0.039980873465538025
Validation loss = 0.04208654165267944
Validation loss = 0.042397186160087585
Validation loss = 0.040283262729644775
Validation loss = 0.0439860075712204
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.053217340260744095
Validation loss = 0.04137112572789192
Validation loss = 0.04098730906844139
Validation loss = 0.04199356213212013
Validation loss = 0.043303072452545166
Validation loss = 0.04258464649319649
Validation loss = 0.04032248258590698
Validation loss = 0.04140595719218254
Validation loss = 0.04451841115951538
Validation loss = 0.04164605215191841
Validation loss = 0.03897830471396446
Validation loss = 0.04097241908311844
Validation loss = 0.04089380428195
Validation loss = 0.04028908163309097
Validation loss = 0.041552718728780746
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.051812876015901566
Validation loss = 0.04312295839190483
Validation loss = 0.04375748336315155
Validation loss = 0.042735595256090164
Validation loss = 0.04353887960314751
Validation loss = 0.042702656239271164
Validation loss = 0.04253804311156273
Validation loss = 0.04332892224192619
Validation loss = 0.040998898446559906
Validation loss = 0.0412103496491909
Validation loss = 0.043814074248075485
Validation loss = 0.042681772261857986
Validation loss = 0.04034494236111641
Validation loss = 0.04119545593857765
Validation loss = 0.04212983325123787
Validation loss = 0.04683597758412361
Validation loss = 0.03957168012857437
Validation loss = 0.04062683507800102
Validation loss = 0.04029195010662079
Validation loss = 0.041934531182050705
Validation loss = 0.03956782445311546
Validation loss = 0.0424349308013916
Validation loss = 0.040719229727983475
Validation loss = 0.03965943306684494
Validation loss = 0.04074221849441528
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.051830414682626724
Validation loss = 0.04503488168120384
Validation loss = 0.043325018137693405
Validation loss = 0.04423627257347107
Validation loss = 0.04235932230949402
Validation loss = 0.045654505491256714
Validation loss = 0.04427489638328552
Validation loss = 0.04282623529434204
Validation loss = 0.04546363279223442
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -343     |
| Iteration     | 11       |
| MaximumReturn | -202     |
| MinimumReturn | -504     |
| TotalSamples  | 52000    |
----------------------------
itr #12 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.05126505717635155
Validation loss = 0.04313652217388153
Validation loss = 0.041666850447654724
Validation loss = 0.04406522214412689
Validation loss = 0.04481882229447365
Validation loss = 0.04170234501361847
Validation loss = 0.04204098880290985
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.05085569620132446
Validation loss = 0.04170028865337372
Validation loss = 0.039671801030635834
Validation loss = 0.044187966734170914
Validation loss = 0.04208358749747276
Validation loss = 0.03934507071971893
Validation loss = 0.04429793357849121
Validation loss = 0.0395195297896862
Validation loss = 0.03928768262267113
Validation loss = 0.045554157346487045
Validation loss = 0.03820532560348511
Validation loss = 0.039145078510046005
Validation loss = 0.0422278456389904
Validation loss = 0.038143184036016464
Validation loss = 0.03795706108212471
Validation loss = 0.04230739548802376
Validation loss = 0.0381711944937706
Validation loss = 0.03914312273263931
Validation loss = 0.04315945878624916
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.05271851271390915
Validation loss = 0.03914334997534752
Validation loss = 0.039184026420116425
Validation loss = 0.040839605033397675
Validation loss = 0.03915301337838173
Validation loss = 0.04016806557774544
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.05368213355541229
Validation loss = 0.041122399270534515
Validation loss = 0.03814062848687172
Validation loss = 0.04150456562638283
Validation loss = 0.04029832035303116
Validation loss = 0.03909001499414444
Validation loss = 0.04160481318831444
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.05065621808171272
Validation loss = 0.0429241843521595
Validation loss = 0.04171498492360115
Validation loss = 0.044364504516124725
Validation loss = 0.0445915050804615
Validation loss = 0.041318003088235855
Validation loss = 0.04415202885866165
Validation loss = 0.04280145838856697
Validation loss = 0.04179292172193527
Validation loss = 0.04376640543341637
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -121     |
| Iteration     | 12       |
| MaximumReturn | 297      |
| MinimumReturn | -384     |
| TotalSamples  | 56000    |
----------------------------
itr #13 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.052752383053302765
Validation loss = 0.04081765189766884
Validation loss = 0.042413000017404556
Validation loss = 0.04087400436401367
Validation loss = 0.04089951142668724
Validation loss = 0.04041910171508789
Validation loss = 0.04111407324671745
Validation loss = 0.0389348566532135
Validation loss = 0.04159637913107872
Validation loss = 0.03847942128777504
Validation loss = 0.038038261234760284
Validation loss = 0.0403699092566967
Validation loss = 0.04013385996222496
Validation loss = 0.037671830505132675
Validation loss = 0.04238724336028099
Validation loss = 0.03749579191207886
Validation loss = 0.03774357959628105
Validation loss = 0.04055812954902649
Validation loss = 0.0387040451169014
Validation loss = 0.03726312518119812
Validation loss = 0.04143953323364258
Validation loss = 0.03669902682304382
Validation loss = 0.03970308229327202
Validation loss = 0.037957314401865005
Validation loss = 0.0371544249355793
Validation loss = 0.043507177382707596
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.04451267048716545
Validation loss = 0.03643972799181938
Validation loss = 0.036653101444244385
Validation loss = 0.0388195738196373
Validation loss = 0.03587217256426811
Validation loss = 0.03880852088332176
Validation loss = 0.03614369034767151
Validation loss = 0.03610886260867119
Validation loss = 0.03667276352643967
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04790365695953369
Validation loss = 0.038645874708890915
Validation loss = 0.03748897835612297
Validation loss = 0.03713906928896904
Validation loss = 0.03826135769486427
Validation loss = 0.04370696097612381
Validation loss = 0.035710409283638
Validation loss = 0.03687078878283501
Validation loss = 0.037488412111997604
Validation loss = 0.03759797662496567
Validation loss = 0.03720274195075035
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04846115782856941
Validation loss = 0.038238774985075
Validation loss = 0.036496248096227646
Validation loss = 0.037721745669841766
Validation loss = 0.03978725150227547
Validation loss = 0.03656943514943123
Validation loss = 0.03588289022445679
Validation loss = 0.04034910351037979
Validation loss = 0.03616150841116905
Validation loss = 0.03627028316259384
Validation loss = 0.04133247211575508
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.049235470592975616
Validation loss = 0.03949164226651192
Validation loss = 0.03864491358399391
Validation loss = 0.039920490235090256
Validation loss = 0.040093936026096344
Validation loss = 0.041974958032369614
Validation loss = 0.038289133459329605
Validation loss = 0.038412272930145264
Validation loss = 0.040814079344272614
Validation loss = 0.03781361132860184
Validation loss = 0.03937836363911629
Validation loss = 0.040731556713581085
Validation loss = 0.03642449527978897
Validation loss = 0.044289372861385345
Validation loss = 0.03727034851908684
Validation loss = 0.037309445440769196
Validation loss = 0.039376407861709595
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 342      |
| Iteration     | 13       |
| MaximumReturn | 1.47e+03 |
| MinimumReturn | -426     |
| TotalSamples  | 60000    |
----------------------------
itr #14 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04380763694643974
Validation loss = 0.03639555722475052
Validation loss = 0.035161860287189484
Validation loss = 0.03594619780778885
Validation loss = 0.03661995753645897
Validation loss = 0.035485707223415375
Validation loss = 0.036884356290102005
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.042934756726026535
Validation loss = 0.03527326136827469
Validation loss = 0.03490494564175606
Validation loss = 0.03567255288362503
Validation loss = 0.034244898706674576
Validation loss = 0.03527448698878288
Validation loss = 0.03530330955982208
Validation loss = 0.034178536385297775
Validation loss = 0.03738883510231972
Validation loss = 0.032845038920640945
Validation loss = 0.03273710981011391
Validation loss = 0.0353272520005703
Validation loss = 0.033531833440065384
Validation loss = 0.03235499560832977
Validation loss = 0.03527722507715225
Validation loss = 0.03236853703856468
Validation loss = 0.03365646302700043
Validation loss = 0.035467613488435745
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.044064320623874664
Validation loss = 0.03487927094101906
Validation loss = 0.035291869193315506
Validation loss = 0.03503870218992233
Validation loss = 0.03469468280673027
Validation loss = 0.040989652276039124
Validation loss = 0.03361194208264351
Validation loss = 0.035023849457502365
Validation loss = 0.03497178852558136
Validation loss = 0.034675296396017075
Validation loss = 0.03972896188497543
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04291244223713875
Validation loss = 0.03562137857079506
Validation loss = 0.034598227590322495
Validation loss = 0.03790642321109772
Validation loss = 0.035746339708566666
Validation loss = 0.035901062190532684
Validation loss = 0.0346265509724617
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04401589557528496
Validation loss = 0.03611798211932182
Validation loss = 0.035930339246988297
Validation loss = 0.03888705000281334
Validation loss = 0.036711059510707855
Validation loss = 0.035673193633556366
Validation loss = 0.038360897451639175
Validation loss = 0.034867774695158005
Validation loss = 0.037034109234809875
Validation loss = 0.03496265038847923
Validation loss = 0.034426286816596985
Validation loss = 0.03842882066965103
Validation loss = 0.03412000462412834
Validation loss = 0.03507494181394577
Validation loss = 0.03850819543004036
Validation loss = 0.03322746604681015
Validation loss = 0.0335853286087513
Validation loss = 0.040023453533649445
Validation loss = 0.03304819390177727
Validation loss = 0.0344119630753994
Validation loss = 0.037449490278959274
Validation loss = 0.0328848734498024
Validation loss = 0.0362161248922348
Validation loss = 0.034293364733457565
Validation loss = 0.032204609364271164
Validation loss = 0.035624682903289795
Validation loss = 0.03403749316930771
Validation loss = 0.032108988612890244
Validation loss = 0.03445044532418251
Validation loss = 0.03288688138127327
Validation loss = 0.03232063725590706
Validation loss = 0.035164568573236465
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -32.9    |
| Iteration     | 14       |
| MaximumReturn | 551      |
| MinimumReturn | -619     |
| TotalSamples  | 64000    |
----------------------------
itr #15 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04473240673542023
Validation loss = 0.03598052263259888
Validation loss = 0.03609283268451691
Validation loss = 0.03547310456633568
Validation loss = 0.03584027662873268
Validation loss = 0.03484175726771355
Validation loss = 0.037981610745191574
Validation loss = 0.03446897119283676
Validation loss = 0.03499528393149376
Validation loss = 0.03742308169603348
Validation loss = 0.033585235476493835
Validation loss = 0.036943092942237854
Validation loss = 0.03719022870063782
Validation loss = 0.03279219940304756
Validation loss = 0.035912103950977325
Validation loss = 0.03283306956291199
Validation loss = 0.0330209843814373
Validation loss = 0.03717593476176262
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03966892510652542
Validation loss = 0.033418960869312286
Validation loss = 0.03208783641457558
Validation loss = 0.0358862429857254
Validation loss = 0.03272285312414169
Validation loss = 0.03277973085641861
Validation loss = 0.034761179238557816
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.04514914005994797
Validation loss = 0.03580273687839508
Validation loss = 0.035112034529447556
Validation loss = 0.037267740815877914
Validation loss = 0.03584694117307663
Validation loss = 0.03463468700647354
Validation loss = 0.034564901143312454
Validation loss = 0.03469104319810867
Validation loss = 0.03327216953039169
Validation loss = 0.0385865792632103
Validation loss = 0.0336289182305336
Validation loss = 0.03277923911809921
Validation loss = 0.04083190858364105
Validation loss = 0.032573819160461426
Validation loss = 0.035097576677799225
Validation loss = 0.0376998707652092
Validation loss = 0.0345941036939621
Validation loss = 0.03233930841088295
Validation loss = 0.0367119163274765
Validation loss = 0.03298913687467575
Validation loss = 0.03225947916507721
Validation loss = 0.03998897969722748
Validation loss = 0.031838949769735336
Validation loss = 0.0333457887172699
Validation loss = 0.03327784687280655
Validation loss = 0.037502534687519073
Validation loss = 0.0319594144821167
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.04540001228451729
Validation loss = 0.03630901500582695
Validation loss = 0.03446124121546745
Validation loss = 0.03642468526959419
Validation loss = 0.039912737905979156
Validation loss = 0.03360959142446518
Validation loss = 0.03443071246147156
Validation loss = 0.037088602781295776
Validation loss = 0.033504024147987366
Validation loss = 0.03509531915187836
Validation loss = 0.03933492302894592
Validation loss = 0.0328390896320343
Validation loss = 0.03425024077296257
Validation loss = 0.035440415143966675
Validation loss = 0.0330120250582695
Validation loss = 0.03705187141895294
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.04184669256210327
Validation loss = 0.03365827351808548
Validation loss = 0.032735005021095276
Validation loss = 0.03453395888209343
Validation loss = 0.03459484502673149
Validation loss = 0.03249248117208481
Validation loss = 0.03569142520427704
Validation loss = 0.03278886526823044
Validation loss = 0.031409263610839844
Validation loss = 0.037190698087215424
Validation loss = 0.03138760104775429
Validation loss = 0.033680260181427
Validation loss = 0.03551097214221954
Validation loss = 0.030930669978260994
Validation loss = 0.03208126500248909
Validation loss = 0.03387083485722542
Validation loss = 0.030615732073783875
Validation loss = 0.03404247760772705
Validation loss = 0.03356068581342697
Validation loss = 0.03055914118885994
Validation loss = 0.03287632018327713
Validation loss = 0.03199335187673569
Validation loss = 0.03019176609814167
Validation loss = 0.03114638477563858
Validation loss = 0.03557747229933739
Validation loss = 0.029548585414886475
Validation loss = 0.031274497509002686
Validation loss = 0.031788039952516556
Validation loss = 0.02935892529785633
Validation loss = 0.03117513656616211
Validation loss = 0.03090456873178482
Validation loss = 0.029269332066178322
Validation loss = 0.03079938143491745
Validation loss = 0.030497537925839424
Validation loss = 0.02913598157465458
Validation loss = 0.03327881917357445
Validation loss = 0.02959606982767582
Validation loss = 0.028569817543029785
Validation loss = 0.03038814105093479
Validation loss = 0.030619841068983078
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 18.9     |
| Iteration     | 15       |
| MaximumReturn | 430      |
| MinimumReturn | -257     |
| TotalSamples  | 68000    |
----------------------------
itr #16 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.04199758917093277
Validation loss = 0.03467230126261711
Validation loss = 0.03214266896247864
Validation loss = 0.034226685762405396
Validation loss = 0.03235538676381111
Validation loss = 0.03244445100426674
Validation loss = 0.03664587065577507
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.040104907006025314
Validation loss = 0.032286886125802994
Validation loss = 0.03230294585227966
Validation loss = 0.033045582473278046
Validation loss = 0.03439803048968315
Validation loss = 0.031371891498565674
Validation loss = 0.03096138685941696
Validation loss = 0.03643624112010002
Validation loss = 0.02996699884533882
Validation loss = 0.031025270000100136
Validation loss = 0.03273764252662659
Validation loss = 0.02991360053420067
Validation loss = 0.0305938683450222
Validation loss = 0.032618287950754166
Validation loss = 0.029371777549386024
Validation loss = 0.03117087297141552
Validation loss = 0.03407469764351845
Validation loss = 0.028990482911467552
Validation loss = 0.02963772974908352
Validation loss = 0.032224882394075394
Validation loss = 0.028547493740916252
Validation loss = 0.031019028276205063
Validation loss = 0.030824054032564163
Validation loss = 0.028919832780957222
Validation loss = 0.03284040838479996
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03877643123269081
Validation loss = 0.033075571060180664
Validation loss = 0.031512778252363205
Validation loss = 0.03495128080248833
Validation loss = 0.03139277175068855
Validation loss = 0.03211166337132454
Validation loss = 0.036089032888412476
Validation loss = 0.030696284025907516
Validation loss = 0.032090991735458374
Validation loss = 0.035363804548978806
Validation loss = 0.03060411661863327
Validation loss = 0.03591027110815048
Validation loss = 0.031026527285575867
Validation loss = 0.030755039304494858
Validation loss = 0.03488476946949959
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.041080426424741745
Validation loss = 0.033655837178230286
Validation loss = 0.03250512480735779
Validation loss = 0.03640797734260559
Validation loss = 0.032254625111818314
Validation loss = 0.03295888751745224
Validation loss = 0.03549712151288986
Validation loss = 0.03127451241016388
Validation loss = 0.0348215252161026
Validation loss = 0.03226754441857338
Validation loss = 0.031106026843190193
Validation loss = 0.03486332669854164
Validation loss = 0.03118286468088627
Validation loss = 0.032908178865909576
Validation loss = 0.0377146378159523
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03397803008556366
Validation loss = 0.029631730169057846
Validation loss = 0.02866211161017418
Validation loss = 0.031631797552108765
Validation loss = 0.028730014339089394
Validation loss = 0.02844063565135002
Validation loss = 0.03208816424012184
Validation loss = 0.02829519286751747
Validation loss = 0.0278609786182642
Validation loss = 0.02935856766998768
Validation loss = 0.029745634645223618
Validation loss = 0.027576273307204247
Validation loss = 0.03242598474025726
Validation loss = 0.027523372322320938
Validation loss = 0.027908269315958023
Validation loss = 0.031337182968854904
Validation loss = 0.027652723714709282
Validation loss = 0.026889672502875328
Validation loss = 0.03122858516871929
Validation loss = 0.026586294174194336
Validation loss = 0.027236333116889
Validation loss = 0.03229047730565071
Validation loss = 0.026629619300365448
Validation loss = 0.02700502797961235
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 252      |
| Iteration     | 16       |
| MaximumReturn | 927      |
| MinimumReturn | -498     |
| TotalSamples  | 72000    |
----------------------------
itr #17 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03965019807219505
Validation loss = 0.03271117061376572
Validation loss = 0.03339596092700958
Validation loss = 0.03393807262182236
Validation loss = 0.03199993446469307
Validation loss = 0.033285729587078094
Validation loss = 0.033120766282081604
Validation loss = 0.031100818887352943
Validation loss = 0.03610013425350189
Validation loss = 0.030862292274832726
Validation loss = 0.03177579119801521
Validation loss = 0.032209623605012894
Validation loss = 0.030292175710201263
Validation loss = 0.03410503268241882
Validation loss = 0.03183411806821823
Validation loss = 0.0304034985601902
Validation loss = 0.03306296095252037
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03585617244243622
Validation loss = 0.029446134343743324
Validation loss = 0.029309673234820366
Validation loss = 0.03258495032787323
Validation loss = 0.028678754344582558
Validation loss = 0.030096927657723427
Validation loss = 0.02933318167924881
Validation loss = 0.02876610867679119
Validation loss = 0.032702717930078506
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03949151933193207
Validation loss = 0.03159267082810402
Validation loss = 0.031484078615903854
Validation loss = 0.0351068489253521
Validation loss = 0.03090846538543701
Validation loss = 0.03191138803958893
Validation loss = 0.03355802968144417
Validation loss = 0.030365265905857086
Validation loss = 0.03214721009135246
Validation loss = 0.032236404716968536
Validation loss = 0.0301361046731472
Validation loss = 0.03704991564154625
Validation loss = 0.030156562104821205
Validation loss = 0.030172983184456825
Validation loss = 0.03507867828011513
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.037056148052215576
Validation loss = 0.03165296092629433
Validation loss = 0.031713493168354034
Validation loss = 0.035283975303173065
Validation loss = 0.031364355236291885
Validation loss = 0.03070296160876751
Validation loss = 0.035620708018541336
Validation loss = 0.030506126582622528
Validation loss = 0.030681021511554718
Validation loss = 0.03715534508228302
Validation loss = 0.030376706272363663
Validation loss = 0.031193044036626816
Validation loss = 0.034543540328741074
Validation loss = 0.029539385810494423
Validation loss = 0.029749764129519463
Validation loss = 0.03364460915327072
Validation loss = 0.029481645673513412
Validation loss = 0.033340934664011
Validation loss = 0.029773814603686333
Validation loss = 0.02883382886648178
Validation loss = 0.03514320030808449
Validation loss = 0.028794484212994576
Validation loss = 0.02879195474088192
Validation loss = 0.03190530836582184
Validation loss = 0.02857258915901184
Validation loss = 0.036620281636714935
Validation loss = 0.028422292321920395
Validation loss = 0.028784357011318207
Validation loss = 0.032800812274217606
Validation loss = 0.028263546526432037
Validation loss = 0.028599752113223076
Validation loss = 0.03136584535241127
Validation loss = 0.027599429711699486
Validation loss = 0.029280662536621094
Validation loss = 0.030401594936847687
Validation loss = 0.027146626263856888
Validation loss = 0.03031720407307148
Validation loss = 0.028082920238375664
Validation loss = 0.027479596436023712
Validation loss = 0.03106553666293621
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03459109365940094
Validation loss = 0.026491528376936913
Validation loss = 0.027099747210741043
Validation loss = 0.029090140014886856
Validation loss = 0.026022054255008698
Validation loss = 0.02927357703447342
Validation loss = 0.02697930857539177
Validation loss = 0.02579130046069622
Validation loss = 0.02919785864651203
Validation loss = 0.026406459510326385
Validation loss = 0.02607504092156887
Validation loss = 0.02947390079498291
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 160      |
| Iteration     | 17       |
| MaximumReturn | 653      |
| MinimumReturn | -261     |
| TotalSamples  | 76000    |
----------------------------
itr #18 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03492863476276398
Validation loss = 0.030371204018592834
Validation loss = 0.031858932226896286
Validation loss = 0.031563807278871536
Validation loss = 0.030036550015211105
Validation loss = 0.035621121525764465
Validation loss = 0.0294988714158535
Validation loss = 0.030108602717518806
Validation loss = 0.03395271673798561
Validation loss = 0.028847143054008484
Validation loss = 0.031094038859009743
Validation loss = 0.03149110823869705
Validation loss = 0.028527773916721344
Validation loss = 0.032132118940353394
Validation loss = 0.028797708451747894
Validation loss = 0.028995519503951073
Validation loss = 0.031646717339754105
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03458603844046593
Validation loss = 0.02841285988688469
Validation loss = 0.028828905895352364
Validation loss = 0.03262242674827576
Validation loss = 0.02813079208135605
Validation loss = 0.029768310487270355
Validation loss = 0.029390297830104828
Validation loss = 0.02763824351131916
Validation loss = 0.03366130217909813
Validation loss = 0.027399884536862373
Validation loss = 0.027976108714938164
Validation loss = 0.0297201219946146
Validation loss = 0.027402663603425026
Validation loss = 0.03230435028672218
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03536472097039223
Validation loss = 0.030684446915984154
Validation loss = 0.030760113149881363
Validation loss = 0.032045044004917145
Validation loss = 0.03019563853740692
Validation loss = 0.0324440635740757
Validation loss = 0.03099873475730419
Validation loss = 0.02938785031437874
Validation loss = 0.03142248094081879
Validation loss = 0.029072269797325134
Validation loss = 0.029052043333649635
Validation loss = 0.03248080611228943
Validation loss = 0.028710879385471344
Validation loss = 0.03044186532497406
Validation loss = 0.03173195570707321
Validation loss = 0.029257282614707947
Validation loss = 0.03128167986869812
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.034408289939165115
Validation loss = 0.027863427996635437
Validation loss = 0.02749503031373024
Validation loss = 0.029255643486976624
Validation loss = 0.029175588861107826
Validation loss = 0.027625134214758873
Validation loss = 0.029116345569491386
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.03343624249100685
Validation loss = 0.025964675471186638
Validation loss = 0.02642146311700344
Validation loss = 0.02878591977059841
Validation loss = 0.02594102919101715
Validation loss = 0.026181967929005623
Validation loss = 0.027450254186987877
Validation loss = 0.024807630106806755
Validation loss = 0.02630557119846344
Validation loss = 0.02588511072099209
Validation loss = 0.025060569867491722
Validation loss = 0.029035385698080063
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | -127     |
| Iteration     | 18       |
| MaximumReturn | 752      |
| MinimumReturn | -507     |
| TotalSamples  | 80000    |
----------------------------
itr #19 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03287089988589287
Validation loss = 0.028718626126646996
Validation loss = 0.02907506190240383
Validation loss = 0.02996504306793213
Validation loss = 0.02807137370109558
Validation loss = 0.03137734532356262
Validation loss = 0.02774423360824585
Validation loss = 0.03127385303378105
Validation loss = 0.027828287333250046
Validation loss = 0.028857845813035965
Validation loss = 0.028618356212973595
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.034434638917446136
Validation loss = 0.028486032038927078
Validation loss = 0.02818894013762474
Validation loss = 0.0296357162296772
Validation loss = 0.02681305631995201
Validation loss = 0.02762696146965027
Validation loss = 0.02918732538819313
Validation loss = 0.026139864698052406
Validation loss = 0.027352657169103622
Validation loss = 0.0287191029638052
Validation loss = 0.025919565930962563
Validation loss = 0.02821500040590763
Validation loss = 0.027592414990067482
Validation loss = 0.026066649705171585
Validation loss = 0.030377810820937157
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.035442788153886795
Validation loss = 0.0294013861566782
Validation loss = 0.02885914407670498
Validation loss = 0.03147929906845093
Validation loss = 0.028690647333860397
Validation loss = 0.029897475615143776
Validation loss = 0.03092922270298004
Validation loss = 0.028905102983117104
Validation loss = 0.027819300070405006
Validation loss = 0.03554249554872513
Validation loss = 0.027196455746889114
Validation loss = 0.029069039970636368
Validation loss = 0.028071671724319458
Validation loss = 0.027807112783193588
Validation loss = 0.030711567029356956
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03339344635605812
Validation loss = 0.027074739336967468
Validation loss = 0.02753007970750332
Validation loss = 0.029723871499300003
Validation loss = 0.028384849429130554
Validation loss = 0.026368314400315285
Validation loss = 0.029469508677721024
Validation loss = 0.02659226953983307
Validation loss = 0.02626696228981018
Validation loss = 0.03136970475316048
Validation loss = 0.025185275822877884
Validation loss = 0.026456153020262718
Validation loss = 0.027623098343610764
Validation loss = 0.025728266686201096
Validation loss = 0.026999443769454956
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029521584510803223
Validation loss = 0.024355057626962662
Validation loss = 0.024502458050847054
Validation loss = 0.03107299469411373
Validation loss = 0.024492565542459488
Validation loss = 0.024654369801282883
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 349      |
| Iteration     | 19       |
| MaximumReturn | 1.21e+03 |
| MinimumReturn | -685     |
| TotalSamples  | 84000    |
----------------------------
itr #20 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03432244807481766
Validation loss = 0.030065063387155533
Validation loss = 0.029860708862543106
Validation loss = 0.03210541605949402
Validation loss = 0.028997860848903656
Validation loss = 0.033391937613487244
Validation loss = 0.028584225103259087
Validation loss = 0.02967410907149315
Validation loss = 0.028917593881487846
Validation loss = 0.028159216046333313
Validation loss = 0.03190397098660469
Validation loss = 0.02816486917436123
Validation loss = 0.028792472556233406
Validation loss = 0.030502453446388245
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.033517591655254364
Validation loss = 0.02843533456325531
Validation loss = 0.028224050998687744
Validation loss = 0.03190385550260544
Validation loss = 0.027311166748404503
Validation loss = 0.029439816251397133
Validation loss = 0.03063369356095791
Validation loss = 0.02744183875620365
Validation loss = 0.029533276334404945
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03383435681462288
Validation loss = 0.028423337265849113
Validation loss = 0.028644481673836708
Validation loss = 0.032411545515060425
Validation loss = 0.028316369280219078
Validation loss = 0.031580325216054916
Validation loss = 0.028455069288611412
Validation loss = 0.02838129736483097
Validation loss = 0.030661199241876602
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.03578215092420578
Validation loss = 0.027911826968193054
Validation loss = 0.027176205068826675
Validation loss = 0.030299196019768715
Validation loss = 0.026962820440530777
Validation loss = 0.028713693842291832
Validation loss = 0.02722308225929737
Validation loss = 0.027078362181782722
Validation loss = 0.02920827455818653
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.032623130828142166
Validation loss = 0.026709118857979774
Validation loss = 0.025912024080753326
Validation loss = 0.026713397353887558
Validation loss = 0.02644346095621586
Validation loss = 0.02819167450070381
Validation loss = 0.025503013283014297
Validation loss = 0.025550570338964462
Validation loss = 0.0295439250767231
Validation loss = 0.02445172145962715
Validation loss = 0.02531312219798565
Validation loss = 0.02843374013900757
Validation loss = 0.024457670748233795
Validation loss = 0.024838197976350784
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 83.1     |
| Iteration     | 20       |
| MaximumReturn | 560      |
| MinimumReturn | -243     |
| TotalSamples  | 88000    |
----------------------------
itr #21 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03273076191544533
Validation loss = 0.028472237288951874
Validation loss = 0.032722171396017075
Validation loss = 0.028031719848513603
Validation loss = 0.030136603862047195
Validation loss = 0.027038151398301125
Validation loss = 0.02904200740158558
Validation loss = 0.02906118705868721
Validation loss = 0.026986323297023773
Validation loss = 0.02949725277721882
Validation loss = 0.026842348277568817
Validation loss = 0.029625527560710907
Validation loss = 0.027394337579607964
Validation loss = 0.02683059126138687
Validation loss = 0.028841739520430565
Validation loss = 0.02573392353951931
Validation loss = 0.028175877407193184
Validation loss = 0.026118643581867218
Validation loss = 0.026631198823451996
Validation loss = 0.02990797720849514
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03435380756855011
Validation loss = 0.028697935864329338
Validation loss = 0.028950659558176994
Validation loss = 0.03040137141942978
Validation loss = 0.027762912213802338
Validation loss = 0.0326385535299778
Validation loss = 0.027959752827882767
Validation loss = 0.029023755341768265
Validation loss = 0.02932935394346714
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03412756696343422
Validation loss = 0.02935224026441574
Validation loss = 0.028475690633058548
Validation loss = 0.03269759193062782
Validation loss = 0.02838584966957569
Validation loss = 0.03059866465628147
Validation loss = 0.028997361660003662
Validation loss = 0.027620943263173103
Validation loss = 0.03262846916913986
Validation loss = 0.027138354256749153
Validation loss = 0.02798338420689106
Validation loss = 0.029430877417325974
Validation loss = 0.027277382090687752
Validation loss = 0.03420163318514824
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.033655475825071335
Validation loss = 0.027406197041273117
Validation loss = 0.027440553531050682
Validation loss = 0.028244392946362495
Validation loss = 0.026091450825333595
Validation loss = 0.03269839286804199
Validation loss = 0.026130307465791702
Validation loss = 0.02846350148320198
Validation loss = 0.027751147747039795
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.029800139367580414
Validation loss = 0.025974275544285774
Validation loss = 0.02449585124850273
Validation loss = 0.028329627588391304
Validation loss = 0.0245943833142519
Validation loss = 0.02643621526658535
Validation loss = 0.024694478139281273
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 128      |
| Iteration     | 21       |
| MaximumReturn | 629      |
| MinimumReturn | -440     |
| TotalSamples  | 92000    |
----------------------------
itr #22 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.03095495142042637
Validation loss = 0.02469758875668049
Validation loss = 0.02530958317220211
Validation loss = 0.02514195628464222
Validation loss = 0.024151774123311043
Validation loss = 0.02727973647415638
Validation loss = 0.023930614814162254
Validation loss = 0.023239055648446083
Validation loss = 0.02704739011824131
Validation loss = 0.02363511547446251
Validation loss = 0.025884855538606644
Validation loss = 0.02497270330786705
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.03056708537042141
Validation loss = 0.02648983895778656
Validation loss = 0.027918631210923195
Validation loss = 0.026537630707025528
Validation loss = 0.02562308870255947
Validation loss = 0.029530657455325127
Validation loss = 0.02531576342880726
Validation loss = 0.025594517588615417
Validation loss = 0.026611439883708954
Validation loss = 0.024837177246809006
Validation loss = 0.02693222090601921
Validation loss = 0.024252302944660187
Validation loss = 0.025064049288630486
Validation loss = 0.028557974845170975
Validation loss = 0.02349708043038845
Validation loss = 0.025816624984145164
Validation loss = 0.02419492043554783
Validation loss = 0.024707918986678123
Validation loss = 0.024226918816566467
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.03088429756462574
Validation loss = 0.026742955669760704
Validation loss = 0.026504233479499817
Validation loss = 0.02859591878950596
Validation loss = 0.025686610490083694
Validation loss = 0.02730271779000759
Validation loss = 0.026043767109513283
Validation loss = 0.028681376948952675
Validation loss = 0.026130830869078636
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028306325897574425
Validation loss = 0.025325100868940353
Validation loss = 0.026573307812213898
Validation loss = 0.02503475733101368
Validation loss = 0.024611134082078934
Validation loss = 0.026441680267453194
Validation loss = 0.02367568202316761
Validation loss = 0.02550666406750679
Validation loss = 0.0240748543292284
Validation loss = 0.02773381397128105
Validation loss = 0.02307598479092121
Validation loss = 0.024122444912791252
Validation loss = 0.02385219745337963
Validation loss = 0.02292967587709427
Validation loss = 0.025651903823018074
Validation loss = 0.022572537884116173
Validation loss = 0.025009693577885628
Validation loss = 0.02323533594608307
Validation loss = 0.022714493796229362
Validation loss = 0.02502167783677578
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025296390056610107
Validation loss = 0.023589536547660828
Validation loss = 0.023368429392576218
Validation loss = 0.026694120839238167
Validation loss = 0.02280084416270256
Validation loss = 0.02818704955279827
Validation loss = 0.02262936159968376
Validation loss = 0.02284158393740654
Validation loss = 0.024593083187937737
Validation loss = 0.022387145087122917
Validation loss = 0.023255515843629837
Validation loss = 0.02393452264368534
Validation loss = 0.021701809018850327
Validation loss = 0.02433338388800621
Validation loss = 0.02198825776576996
Validation loss = 0.022036690264940262
Validation loss = 0.026257963851094246
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 419      |
| Iteration     | 22       |
| MaximumReturn | 1.21e+03 |
| MinimumReturn | -249     |
| TotalSamples  | 96000    |
----------------------------
itr #23 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.029723316431045532
Validation loss = 0.023720795288681984
Validation loss = 0.02381006069481373
Validation loss = 0.025611385703086853
Validation loss = 0.022617915645241737
Validation loss = 0.024703403934836388
Validation loss = 0.023401355370879173
Validation loss = 0.022717414423823357
Validation loss = 0.025248998776078224
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026871008798480034
Validation loss = 0.024433350190520287
Validation loss = 0.028823593631386757
Validation loss = 0.02357473224401474
Validation loss = 0.025381380692124367
Validation loss = 0.02351875603199005
Validation loss = 0.025086669251322746
Validation loss = 0.022750498726963997
Validation loss = 0.028023414313793182
Validation loss = 0.022566035389900208
Validation loss = 0.022857487201690674
Validation loss = 0.02418549358844757
Validation loss = 0.02207251451909542
Validation loss = 0.02569323219358921
Validation loss = 0.02201968990266323
Validation loss = 0.021959060803055763
Validation loss = 0.025977352634072304
Validation loss = 0.022152043879032135
Validation loss = 0.02166103385388851
Validation loss = 0.02346550114452839
Validation loss = 0.02179834060370922
Validation loss = 0.022720985114574432
Validation loss = 0.021992141380906105
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02921566367149353
Validation loss = 0.02581227570772171
Validation loss = 0.0270596444606781
Validation loss = 0.025568628683686256
Validation loss = 0.025806019082665443
Validation loss = 0.028488995507359505
Validation loss = 0.024716710671782494
Validation loss = 0.03050142526626587
Validation loss = 0.02445083297789097
Validation loss = 0.0253799706697464
Validation loss = 0.025807151570916176
Validation loss = 0.023909682407975197
Validation loss = 0.028890026733279228
Validation loss = 0.024494150653481483
Validation loss = 0.02544722519814968
Validation loss = 0.025014275684952736
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.028576204553246498
Validation loss = 0.022716330364346504
Validation loss = 0.02212371863424778
Validation loss = 0.023751044645905495
Validation loss = 0.022785449400544167
Validation loss = 0.02209903858602047
Validation loss = 0.02303369902074337
Validation loss = 0.023754099383950233
Validation loss = 0.021180590614676476
Validation loss = 0.02444332279264927
Validation loss = 0.02122342772781849
Validation loss = 0.022136598825454712
Validation loss = 0.02272573858499527
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.026420122012495995
Validation loss = 0.021636247634887695
Validation loss = 0.02198967896401882
Validation loss = 0.0227460116147995
Validation loss = 0.021940067410469055
Validation loss = 0.021062517538666725
Validation loss = 0.024417055770754814
Validation loss = 0.021317878738045692
Validation loss = 0.021291682496666908
Validation loss = 0.021322863176465034
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 550      |
| Iteration     | 23       |
| MaximumReturn | 1.61e+03 |
| MinimumReturn | 33.7     |
| TotalSamples  | 100000   |
----------------------------
itr #24 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.025842122733592987
Validation loss = 0.02215307019650936
Validation loss = 0.02313055843114853
Validation loss = 0.022434677928686142
Validation loss = 0.02257203869521618
Validation loss = 0.02325817383825779
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02545197494328022
Validation loss = 0.020810933783650398
Validation loss = 0.021726027131080627
Validation loss = 0.02246205136179924
Validation loss = 0.021381070837378502
Validation loss = 0.02537582442164421
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02942536398768425
Validation loss = 0.023586295545101166
Validation loss = 0.025835342705249786
Validation loss = 0.02431839145720005
Validation loss = 0.023357100784778595
Validation loss = 0.026585103943943977
Validation loss = 0.022965850308537483
Validation loss = 0.025318406522274017
Validation loss = 0.02235715091228485
Validation loss = 0.023511851206421852
Validation loss = 0.026093238964676857
Validation loss = 0.02189985103905201
Validation loss = 0.023619748651981354
Validation loss = 0.02213066630065441
Validation loss = 0.0219902154058218
Validation loss = 0.024546198546886444
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02443564683198929
Validation loss = 0.02071954309940338
Validation loss = 0.023238131776452065
Validation loss = 0.02048053964972496
Validation loss = 0.02152005210518837
Validation loss = 0.022191930562257767
Validation loss = 0.02057776227593422
Validation loss = 0.0242027398198843
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.022984836250543594
Validation loss = 0.020586993545293808
Validation loss = 0.022380691021680832
Validation loss = 0.020828377455472946
Validation loss = 0.02194553054869175
Validation loss = 0.020830199122428894
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 496      |
| Iteration     | 24       |
| MaximumReturn | 1.24e+03 |
| MinimumReturn | -193     |
| TotalSamples  | 104000   |
----------------------------
itr #25 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.023560725152492523
Validation loss = 0.021126501262187958
Validation loss = 0.02319912426173687
Validation loss = 0.020793845877051353
Validation loss = 0.026670753955841064
Validation loss = 0.02063068374991417
Validation loss = 0.02284284494817257
Validation loss = 0.021853001788258553
Validation loss = 0.021351289004087448
Validation loss = 0.0213856790214777
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02335651032626629
Validation loss = 0.020617663860321045
Validation loss = 0.02264176867902279
Validation loss = 0.020280400291085243
Validation loss = 0.021538985893130302
Validation loss = 0.02108113281428814
Validation loss = 0.019815821200609207
Validation loss = 0.021605877205729485
Validation loss = 0.01942622661590576
Validation loss = 0.022282104939222336
Validation loss = 0.019239794462919235
Validation loss = 0.022203044965863228
Validation loss = 0.018987758085131645
Validation loss = 0.02131170779466629
Validation loss = 0.02127230353653431
Validation loss = 0.0186624638736248
Validation loss = 0.021457325667142868
Validation loss = 0.018812857568264008
Validation loss = 0.01904447190463543
Validation loss = 0.020592451095581055
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02389737218618393
Validation loss = 0.021204950287938118
Validation loss = 0.02375687099993229
Validation loss = 0.021295009180903435
Validation loss = 0.021578950807452202
Validation loss = 0.023212365806102753
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.023869682103395462
Validation loss = 0.019747678190469742
Validation loss = 0.020469605922698975
Validation loss = 0.02092481404542923
Validation loss = 0.02071922831237316
Validation loss = 0.019946502521634102
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0216890349984169
Validation loss = 0.019961627200245857
Validation loss = 0.021750517189502716
Validation loss = 0.01937985233962536
Validation loss = 0.020815828815102577
Validation loss = 0.019380778074264526
Validation loss = 0.02263314463198185
Validation loss = 0.0192275270819664
Validation loss = 0.022203190252184868
Validation loss = 0.019687553867697716
Validation loss = 0.019069988280534744
Validation loss = 0.020465610548853874
Validation loss = 0.018957026302814484
Validation loss = 0.023453952744603157
Validation loss = 0.01850961521267891
Validation loss = 0.020562373101711273
Validation loss = 0.018556511029601097
Validation loss = 0.018390368670225143
Validation loss = 0.02215060219168663
Validation loss = 0.018388282507658005
Validation loss = 0.0203857384622097
Validation loss = 0.020690612494945526
Validation loss = 0.01785431243479252
Validation loss = 0.02285904623568058
Validation loss = 0.017680242657661438
Validation loss = 0.019421877339482307
Validation loss = 0.01959029585123062
Validation loss = 0.01799633353948593
Validation loss = 0.02104920893907547
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 482      |
| Iteration     | 25       |
| MaximumReturn | 1.01e+03 |
| MinimumReturn | 102      |
| TotalSamples  | 108000   |
----------------------------
itr #26 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.024654578417539597
Validation loss = 0.021001290529966354
Validation loss = 0.022655146196484566
Validation loss = 0.021205665543675423
Validation loss = 0.020679673179984093
Validation loss = 0.022092297673225403
Validation loss = 0.020246662199497223
Validation loss = 0.023252610117197037
Validation loss = 0.0200492050498724
Validation loss = 0.020766517147421837
Validation loss = 0.019735367968678474
Validation loss = 0.021516215056180954
Validation loss = 0.019745517522096634
Validation loss = 0.02049681916832924
Validation loss = 0.02101006545126438
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023238012567162514
Validation loss = 0.018835077062249184
Validation loss = 0.02080344222486019
Validation loss = 0.021363606676459312
Validation loss = 0.019308626651763916
Validation loss = 0.020597543567419052
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.02407257817685604
Validation loss = 0.02134779654443264
Validation loss = 0.023185590282082558
Validation loss = 0.021546801552176476
Validation loss = 0.02155124768614769
Validation loss = 0.022170916199684143
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.024579688906669617
Validation loss = 0.02014598809182644
Validation loss = 0.023850781843066216
Validation loss = 0.019791914150118828
Validation loss = 0.020219584926962852
Validation loss = 0.020902244374155998
Validation loss = 0.01951616071164608
Validation loss = 0.022767046466469765
Validation loss = 0.019958725199103355
Validation loss = 0.02351674996316433
Validation loss = 0.019558390602469444
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021791258826851845
Validation loss = 0.018053201958537102
Validation loss = 0.020402276888489723
Validation loss = 0.018192367628216743
Validation loss = 0.023235300555825233
Validation loss = 0.017949365079402924
Validation loss = 0.0187054593116045
Validation loss = 0.018450580537319183
Validation loss = 0.018754569813609123
Validation loss = 0.02013777382671833
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 167      |
| Iteration     | 26       |
| MaximumReturn | 1.19e+03 |
| MinimumReturn | -536     |
| TotalSamples  | 112000   |
----------------------------
itr #27 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02373061515390873
Validation loss = 0.020314501598477364
Validation loss = 0.020996063947677612
Validation loss = 0.02024482563138008
Validation loss = 0.019408375024795532
Validation loss = 0.02140565775334835
Validation loss = 0.01931760087609291
Validation loss = 0.020638683810830116
Validation loss = 0.01892298087477684
Validation loss = 0.021939892321825027
Validation loss = 0.018882248550653458
Validation loss = 0.024852903559803963
Validation loss = 0.018719444051384926
Validation loss = 0.021335741505026817
Validation loss = 0.018604854121804237
Validation loss = 0.02230142243206501
Validation loss = 0.018501073122024536
Validation loss = 0.020131155848503113
Validation loss = 0.01951122097671032
Validation loss = 0.02228347770869732
Validation loss = 0.017905524000525475
Validation loss = 0.019700782373547554
Validation loss = 0.018198980018496513
Validation loss = 0.019346097484230995
Validation loss = 0.019062144681811333
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.025669489055871964
Validation loss = 0.018979281187057495
Validation loss = 0.01941581256687641
Validation loss = 0.01976926438510418
Validation loss = 0.019066235050559044
Validation loss = 0.02043125033378601
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.023631613701581955
Validation loss = 0.021110696718096733
Validation loss = 0.023983512073755264
Validation loss = 0.020715925842523575
Validation loss = 0.02381310984492302
Validation loss = 0.020524267107248306
Validation loss = 0.021064775064587593
Validation loss = 0.02101295255124569
Validation loss = 0.020584655925631523
Validation loss = 0.02046610601246357
Validation loss = 0.02204655110836029
Validation loss = 0.019957084208726883
Validation loss = 0.022260401397943497
Validation loss = 0.019865592941641808
Validation loss = 0.02415384352207184
Validation loss = 0.01932593807578087
Validation loss = 0.022457854822278023
Validation loss = 0.019536178559064865
Validation loss = 0.01988143101334572
Validation loss = 0.021095862612128258
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.022708913311362267
Validation loss = 0.020508231595158577
Validation loss = 0.021765628829598427
Validation loss = 0.019072476774454117
Validation loss = 0.021157333627343178
Validation loss = 0.02025526389479637
Validation loss = 0.019036104902625084
Validation loss = 0.023867672309279442
Validation loss = 0.018541792407631874
Validation loss = 0.02245732769370079
Validation loss = 0.019284797832369804
Validation loss = 0.020337512716650963
Validation loss = 0.01889568381011486
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.021641960367560387
Validation loss = 0.018626490607857704
Validation loss = 0.018977243453264236
Validation loss = 0.018837986513972282
Validation loss = 0.01856047287583351
Validation loss = 0.019671322777867317
Validation loss = 0.017985360696911812
Validation loss = 0.01972995512187481
Validation loss = 0.018907684832811356
Validation loss = 0.017991572618484497
Validation loss = 0.01975175365805626
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 517      |
| Iteration     | 27       |
| MaximumReturn | 1.08e+03 |
| MinimumReturn | -2.51    |
| TotalSamples  | 116000   |
----------------------------
itr #28 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.020689142867922783
Validation loss = 0.01844361238181591
Validation loss = 0.02062971517443657
Validation loss = 0.019592944532632828
Validation loss = 0.01803104393184185
Validation loss = 0.019667532294988632
Validation loss = 0.019229941070079803
Validation loss = 0.01872418448328972
Validation loss = 0.018171146512031555
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02183898538351059
Validation loss = 0.01860501989722252
Validation loss = 0.01882096379995346
Validation loss = 0.021123841404914856
Validation loss = 0.01812920719385147
Validation loss = 0.02150062844157219
Validation loss = 0.01803717203438282
Validation loss = 0.019674362614750862
Validation loss = 0.018644513562321663
Validation loss = 0.01835341937839985
Validation loss = 0.01857380010187626
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.021425487473607063
Validation loss = 0.01930585317313671
Validation loss = 0.020670531317591667
Validation loss = 0.01918482594192028
Validation loss = 0.022365283221006393
Validation loss = 0.018900984898209572
Validation loss = 0.022732242941856384
Validation loss = 0.01889869198203087
Validation loss = 0.02304825372993946
Validation loss = 0.018685035407543182
Validation loss = 0.02158745564520359
Validation loss = 0.020402615889906883
Validation loss = 0.019232943654060364
Validation loss = 0.020790239796042442
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.025175156071782112
Validation loss = 0.019389895722270012
Validation loss = 0.02328798733651638
Validation loss = 0.019054774194955826
Validation loss = 0.025463873520493507
Validation loss = 0.01813598908483982
Validation loss = 0.02120770327746868
Validation loss = 0.01839286834001541
Validation loss = 0.022793369367718697
Validation loss = 0.018143650144338608
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019078576937317848
Validation loss = 0.017392165958881378
Validation loss = 0.019884547218680382
Validation loss = 0.017509300261735916
Validation loss = 0.01845947466790676
Validation loss = 0.01784181222319603
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 490      |
| Iteration     | 28       |
| MaximumReturn | 1.07e+03 |
| MinimumReturn | -151     |
| TotalSamples  | 120000   |
----------------------------
itr #29 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.019198451191186905
Validation loss = 0.018404848873615265
Validation loss = 0.019104501232504845
Validation loss = 0.018048850819468498
Validation loss = 0.021571796387434006
Validation loss = 0.01748615689575672
Validation loss = 0.022034713998436928
Validation loss = 0.01727372594177723
Validation loss = 0.018611490726470947
Validation loss = 0.01800406165421009
Validation loss = 0.01767868921160698
Validation loss = 0.021452385932207108
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.020756620913743973
Validation loss = 0.018146617338061333
Validation loss = 0.020844871178269386
Validation loss = 0.01818803697824478
Validation loss = 0.021081173792481422
Validation loss = 0.017539653927087784
Validation loss = 0.019899290055036545
Validation loss = 0.017302071675658226
Validation loss = 0.02010818012058735
Validation loss = 0.017790289595723152
Validation loss = 0.020068306475877762
Validation loss = 0.017316073179244995
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.022144315764307976
Validation loss = 0.01944972202181816
Validation loss = 0.02119988389313221
Validation loss = 0.018673086538910866
Validation loss = 0.021852008998394012
Validation loss = 0.019020384177565575
Validation loss = 0.02003561332821846
Validation loss = 0.018981317058205605
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.02173829823732376
Validation loss = 0.019415313377976418
Validation loss = 0.020457498729228973
Validation loss = 0.018326779827475548
Validation loss = 0.021257342770695686
Validation loss = 0.018342383205890656
Validation loss = 0.01887582801282406
Validation loss = 0.01969972625374794
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.019793162122368813
Validation loss = 0.018020303919911385
Validation loss = 0.020575495436787605
Validation loss = 0.017593801021575928
Validation loss = 0.019888974726200104
Validation loss = 0.016973918303847313
Validation loss = 0.020011305809020996
Validation loss = 0.017640026286244392
Validation loss = 0.02003898285329342
Validation loss = 0.017685161903500557
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 555      |
| Iteration     | 29       |
| MaximumReturn | 919      |
| MinimumReturn | -147     |
| TotalSamples  | 124000   |
----------------------------
itr #30 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.022856099531054497
Validation loss = 0.020553382113575935
Validation loss = 0.021969102323055267
Validation loss = 0.02190210297703743
Validation loss = 0.021204210817813873
Validation loss = 0.022511372342705727
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.026041416451334953
Validation loss = 0.022210121154785156
Validation loss = 0.02579403482377529
Validation loss = 0.021592851728200912
Validation loss = 0.02483569271862507
Validation loss = 0.022204644978046417
Validation loss = 0.023888448253273964
Validation loss = 0.021809086203575134
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.028031524270772934
Validation loss = 0.023628637194633484
Validation loss = 0.02436726540327072
Validation loss = 0.02399553917348385
Validation loss = 0.02480912208557129
Validation loss = 0.02280195988714695
Validation loss = 0.02332673780620098
Validation loss = 0.023640260100364685
Validation loss = 0.024593990296125412
Validation loss = 0.02228519134223461
Validation loss = 0.02487126924097538
Validation loss = 0.023618580773472786
Validation loss = 0.02446885034441948
Validation loss = 0.026611540466547012
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.030361663550138474
Validation loss = 0.023997541517019272
Validation loss = 0.025822972878813744
Validation loss = 0.023166341707110405
Validation loss = 0.025193456560373306
Validation loss = 0.02436692640185356
Validation loss = 0.025829104706645012
Validation loss = 0.023932956159114838
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.0242928434163332
Validation loss = 0.025227203965187073
Validation loss = 0.0255229901522398
Validation loss = 0.022060329094529152
Validation loss = 0.026321176439523697
Validation loss = 0.022816311568021774
Validation loss = 0.025945132598280907
Validation loss = 0.023351086303591728
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 449      |
| Iteration     | 30       |
| MaximumReturn | 1.36e+03 |
| MinimumReturn | -251     |
| TotalSamples  | 128000   |
----------------------------
itr #31 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.021749261766672134
Validation loss = 0.022189553827047348
Validation loss = 0.020873311907052994
Validation loss = 0.021051958203315735
Validation loss = 0.020667430013418198
Validation loss = 0.02298716828227043
Validation loss = 0.019942276179790497
Validation loss = 0.021558862179517746
Validation loss = 0.021015457808971405
Validation loss = 0.01971542462706566
Validation loss = 0.02093946933746338
Validation loss = 0.020037386566400528
Validation loss = 0.02069002389907837
Validation loss = 0.020433422178030014
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.023274218663573265
Validation loss = 0.022335005924105644
Validation loss = 0.023462872952222824
Validation loss = 0.02192113921046257
Validation loss = 0.024741308763623238
Validation loss = 0.021299634128808975
Validation loss = 0.021988555788993835
Validation loss = 0.020777568221092224
Validation loss = 0.022847238928079605
Validation loss = 0.02095431461930275
Validation loss = 0.02126372419297695
Validation loss = 0.021047458052635193
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025571612641215324
Validation loss = 0.022144541144371033
Validation loss = 0.02449062280356884
Validation loss = 0.022254351526498795
Validation loss = 0.02733396738767624
Validation loss = 0.021808985620737076
Validation loss = 0.02413194440305233
Validation loss = 0.023297205567359924
Validation loss = 0.02257736399769783
Validation loss = 0.022876866161823273
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.026920568197965622
Validation loss = 0.0233878493309021
Validation loss = 0.023959491401910782
Validation loss = 0.022236717864871025
Validation loss = 0.023774510249495506
Validation loss = 0.025808468461036682
Validation loss = 0.024560118094086647
Validation loss = 0.024565525352954865
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.025444619357585907
Validation loss = 0.02715490013360977
Validation loss = 0.02768927998840809
Validation loss = 0.024703089147806168
Validation loss = 0.026659071445465088
Validation loss = 0.022645363584160805
Validation loss = 0.026855112984776497
Validation loss = 0.026393571868538857
Validation loss = 0.025876039639115334
Validation loss = 0.02643747255206108
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 606      |
| Iteration     | 31       |
| MaximumReturn | 2.14e+03 |
| MinimumReturn | -254     |
| TotalSamples  | 132000   |
----------------------------
itr #32 | 
Fitting dynamics.
Fitting model 0 (0-based) in the ensemble of 5 models
Validation loss = 0.02108975499868393
Validation loss = 0.02000255696475506
Validation loss = 0.024094460532069206
Validation loss = 0.020125379785895348
Validation loss = 0.02062140591442585
Validation loss = 0.02116427570581436
Fitting model 1 (0-based) in the ensemble of 5 models
Validation loss = 0.02568891830742359
Validation loss = 0.022118166089057922
Validation loss = 0.022529086098074913
Validation loss = 0.020413072779774666
Validation loss = 0.021721072494983673
Validation loss = 0.021168535575270653
Validation loss = 0.02250378206372261
Validation loss = 0.021129371598362923
Fitting model 2 (0-based) in the ensemble of 5 models
Validation loss = 0.025380898267030716
Validation loss = 0.020936332643032074
Validation loss = 0.02267037145793438
Validation loss = 0.022213328629732132
Validation loss = 0.02253328077495098
Validation loss = 0.022719990462064743
Fitting model 3 (0-based) in the ensemble of 5 models
Validation loss = 0.0254270788282156
Validation loss = 0.021473538130521774
Validation loss = 0.02434568852186203
Validation loss = 0.02268850989639759
Validation loss = 0.026023000478744507
Validation loss = 0.02249612659215927
Fitting model 4 (0-based) in the ensemble of 5 models
Validation loss = 0.028059182688593864
Validation loss = 0.023936418816447258
Validation loss = 0.026324015110731125
Validation loss = 0.025133272632956505
Validation loss = 0.024865375831723213
Validation loss = 0.02347516641020775
Validation loss = 0.029666900634765625
Validation loss = 0.027737759053707123
Validation loss = 0.023182278499007225
Validation loss = 0.026181666180491447
Validation loss = 0.02897966466844082
Validation loss = 0.027112899348139763
Validation loss = 0.028108928352594376
Done fitting dynamics.
Updating randomness.
Done updating randomness.
Training policy using TRPO.
Re-initialize init_std.
Obtaining samples for iteration 0...
Obtaining samples for iteration 1...
Obtaining samples for iteration 2...
Obtaining samples for iteration 3...
Obtaining samples for iteration 4...
Obtaining samples for iteration 5...
Obtaining samples for iteration 6...
Obtaining samples for iteration 7...
Obtaining samples for iteration 8...
Obtaining samples for iteration 9...
Obtaining samples for iteration 10...
Obtaining samples for iteration 11...
Obtaining samples for iteration 12...
Obtaining samples for iteration 13...
Obtaining samples for iteration 14...
Obtaining samples for iteration 15...
Obtaining samples for iteration 16...
Obtaining samples for iteration 17...
Obtaining samples for iteration 18...
Obtaining samples for iteration 19...
Obtaining samples for iteration 20...
Obtaining samples for iteration 21...
Obtaining samples for iteration 22...
Obtaining samples for iteration 23...
Obtaining samples for iteration 24...
Obtaining samples for iteration 25...
Obtaining samples for iteration 26...
Obtaining samples for iteration 27...
Obtaining samples for iteration 28...
Obtaining samples for iteration 29...
Obtaining samples for iteration 30...
Obtaining samples for iteration 31...
Obtaining samples for iteration 32...
Obtaining samples for iteration 33...
Obtaining samples for iteration 34...
Obtaining samples for iteration 35...
Obtaining samples for iteration 36...
Obtaining samples for iteration 37...
Obtaining samples for iteration 38...
Obtaining samples for iteration 39...
Done training policy.
Generating on-policy rollouts.
Path 0 | total_timesteps 0.
Path 1 | total_timesteps 1000.
Path 2 | total_timesteps 2000.
Path 3 | total_timesteps 3000.
Path 4 | total_timesteps 4000.
Path 5 | total_timesteps 5000.
Done generating on-policy rollouts.
Updating normalization.
Done updating normalization.
----------------------------
| AverageReturn | 967      |
| Iteration     | 32       |
| MaximumReturn | 2.14e+03 |
| MinimumReturn | 494      |
| TotalSamples  | 136000   |
----------------------------
